Tricking the JVM JIT into speed

On the Clojurians Slack, PEZ brought some delicious Fibonacci performance tuning from the “languages” repo. While I don’t think that benchmark is as useful as mesmerizing the moving circles are in the animated graphs, I had a few very confusing and interesting discoveries.

Step 1: watch Alex Miller’s talk about Clojure’s interop performance. Step 2: learn how to actually do all that stuff. Step 3: notice that recursion does things. Step 4: magic.

Doing “all that stuff” involved first learning how to get the compiler to use direct linking. Since the benchmark build script was using clj -e, what it took was wrapping the relevant call in the appropriate binding: (binding [*compiler-options* {:direct-linking true}] (compile 'code)).

Next was learning to decompile compiled class files back into “Java” using the CFR decompiler. Sadly class files aren’t exactly human readable, so this is necessary to look into the internals of what Clojure code ends up like. Based on Alex Miller’s talk and the Java implementation in the repo, I assumed that the ideal implementation would be something like this:

public final class fibonacci extends AFunction implements IFn.LL {
  public static long invokeStatic(long a) {
    if (a < 2) return a;
    return invokeStatic(a - 1) + invokeStatic(a - 2);
  }
}

Then up getting the Clojure compiler to emit the static and primitive based method for the Fibonacci function. However, this just refused to work. I haven’t researched the involved compiler internals, but I assume that having a recursive call in there prevents optimizations from kicking in. So the following function just wouldn’t get emit the kind of (theoretically optimal) call I wanted.

(defn fibonacci
  ^long [^long n]
  (if (> n 1)
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))
    n))

Because of the function calling itself, it seems that the compiler insisted on keeping a var-based indirection (in spite of :direct-linking true) which proved pretty costly. Instead of calling itself, the recursive Fibonacci call would still end up being a getRawRoot call. But poking around I noticed that if it’s not recursive but uses a different function (class) then it gets compiled to the primitive static call as expected.

I thought that maybe I could make the implementation then zigzag back and forth between two implementations? And lo, yes. Because Clojure’s declare didn’t fulfill the requirements (it’s still a simple var instead of a static call), I hacked around the issue like below:

(defn fibB
  ^long [^long n])

(defn fibonacci
  ^long [^long n]
  (if (> n 1)
    (+ (fibB (- n 1)) (fibB (- n 2)))
    n))

(defn fibB
  ^long [^long n]
  (if (> n 1)
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))
    n))

Surprisingly this implementation ended up about 30% faster than the naive Java implementation in the benchmark. Turns out this speedup is apparently because the JIT would inline the fibB side (= half of the calls) and also unroll multiple levels of recursion. Using the same trick in the Java implementation resulted in a similar speedup, so it’s a JVM thing, not something specific to Clojure.

While this solution was way too hacky for my taste, it’s interesting that with a code like below, it’s possible to trick the Clojure compiler to emit the recursive primitive static call I expected (which results in bytecode roughly equivalent to the Java impl in the bench). I wonder if this is intentional or just some weird side-effect somewhere.

(defn fibonacci
  ^long [^long n])

(defn fibonacci
  ^long [^long n]
  (if (> n 1)
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))
    n))

valeblog

Tricking the JVM JIT into speed

Leave a Reply Cancel reply

Recent Posts

Recent Posts

七大陸最高峰チャレンジ

valeblog

Tricking the JVM JIT into speed

Leave a Reply Cancel reply

Recent Posts

Tags

Recent Posts

Tags

七大陸最高峰チャレンジ