Experimenting with LLVM
In the case of Mongoose, writing my own VM (or, thus far, four of them) was fun, but it's not viable in the long term. I'd like to target the language to an existing VM, but earlier attempts with the JVM and CLR bore little fruit. (Neither VM is well-suited to dynamic languages with serious metaprogramming facilities.)
In the case of Cesta, I need a way to speedily evaluate filter expressions, and a way to "cross-breed" decoders for parallel fuzzy decoding.
Research on both topics kept leading me back to LLVM, which is many things:
- It's a compiler, with frontends for C, C++, and several other languages;
- It's a library for designing and implementing transformations on code;
- It's a framework for building the backends of new compilers;
- It's a well-defined low-level virtual machine;
- It's an embeddable code generator, suitable for use in Just-In-Time compilers.
Initially, I was scared of LLVM. I did some hacking on the GCC family of compilers a few years ago, and the code is abjectly terrifying — not to mention the deliberately obfuscated intermediate representations. However, every research lead I followed kept pointing back to LLVM. Eventually, I got up the gumption to download it and try it out.
In short: I am floored. LLVM is almost absurdly easy to work with, thanks to the best-designed C++ API I've ever seen outside of Qt. Within minutes of downloading it, I had built and tested some of my C code using llvm-gcc, their drop-in replacement for GCC. An hour, two source examples, and 120 lines of code later, I had written a program that generates a Fibonacci function in RAM, JITs it to native x86, and executes it in-process, printing the results. (It might have taken longer if the LLVM guys hadn't provided an example that does exactly that, which I was able to learn from.)
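For a flavor of what that program looks like, here's a minimal sketch in the spirit of LLVM's examples/Fibonacci. Note that this is written against a much newer LLVM C++ API (IRBuilder plus MCJIT) than the release I was using, so treat the class names and JIT setup as assumptions rather than a transcript of my 120 lines:

// Sketch only: builds the recursive fib() as LLVM IR at runtime, JITs it,
// and calls the resulting native code. Targets a modern LLVM C++ API.
#include <cstdio>
#include <memory>
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;

int main() {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();

    LLVMContext Ctx;
    auto Mod = std::make_unique<Module>("fib_module", Ctx);
    IRBuilder<> B(Ctx);

    // Declare: int fib(int x)
    FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, false);
    Function *Fib = Function::Create(FT, Function::ExternalLinkage, "fib", Mod.get());
    Value *X = &*Fib->arg_begin();

    BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", Fib);
    BasicBlock *Base  = BasicBlock::Create(Ctx, "base", Fib);
    BasicBlock *Recur = BasicBlock::Create(Ctx, "recur", Fib);

    // if (x <= 2) return 1;
    B.SetInsertPoint(Entry);
    B.CreateCondBr(B.CreateICmpSLE(X, B.getInt32(2)), Base, Recur);
    B.SetInsertPoint(Base);
    B.CreateRet(B.getInt32(1));

    // return fib(x - 1) + fib(x - 2);
    B.SetInsertPoint(Recur);
    Value *L = B.CreateCall(Fib, {B.CreateSub(X, B.getInt32(1))});
    Value *R = B.CreateCall(Fib, {B.CreateSub(X, B.getInt32(2))});
    B.CreateRet(B.CreateAdd(L, R));
    verifyFunction(*Fib);

    // JIT the module and call the freshly generated native code.
    ExecutionEngine *EE = EngineBuilder(std::move(Mod)).create();
    EE->finalizeObject();
    auto FibPtr = reinterpret_cast<int (*)(int)>(EE->getFunctionAddress("fib"));
    printf("fib(24) = %d\n", FibPtr(24));
    return 0;
}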
This is when I started to notice something: the JIT is impressively fast, considering that I had not activated any of LLVM's extensive suite of optimizations. I coded up two versions of the Fibonacci function (naïve recursive and hand-optimized iterative) and ran some benchmarks.
The first is the naïve recursive version:
int fib(int x) {
    if(x <= 2) return 1;
    return fib(x - 1) + fib(x - 2);
}
Compiled in a test harness, the times are as follows (suitably averaged and scrubbed of outliers):
fib(24)
- GCC 4.0.1, -O0: 1136 µs
- GCC 4.0.1, -O3: 397 µs
- LLVM JIT: 771 µs
fib(40) (note that units are now milliseconds)
- GCC 4.0.1, -O0: 1866 ms
- GCC 4.0.1, -O3: 618 ms
- LLVM JIT: 1578 ms
Generating this recursive Fibonacci function at runtime (which takes some milliseconds at startup, not included in this benchmark) produces code that runs faster than GCC at -O0. Neat.
Next, the iterative transform of the function:
int iterative_fibonacci(int x) {
    if(x <= 2) return 1;
    int a = 1, b = 1, c;
    while(x > 2) {
        c = a + b;
        b = a;
        a = c;
        x--;
    }
    return c;
}
(The code to generate this at runtime is substantially more complex, since I'm assembling the SSA intermediate representation by hand in C++.)
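To give a sense of what "by hand" means, here's a sketch of the IR construction for the loop, where every loop-carried variable (a, b, x) becomes an explicit phi node. Again, this is written against a newer IRBuilder API than the one I used, and the helper name buildIterativeFib is made up for illustration:

// Sketch only: emits the iterative fib as SSA form using explicit phi nodes.
// Targets a modern LLVM C++ API; plugs into the same JIT driver as above.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"

using namespace llvm;

Function *buildIterativeFib(Module *M, LLVMContext &Ctx) {
    IRBuilder<> B(Ctx);
    FunctionType *FT = FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, false);
    Function *F = Function::Create(FT, Function::ExternalLinkage, "iter_fib", M);
    Value *X0 = &*F->arg_begin();

    BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
    BasicBlock *Base  = BasicBlock::Create(Ctx, "base", F);
    BasicBlock *Loop  = BasicBlock::Create(Ctx, "loop", F);
    BasicBlock *Exit  = BasicBlock::Create(Ctx, "exit", F);

    // if (x <= 2) return 1;
    B.SetInsertPoint(Entry);
    B.CreateCondBr(B.CreateICmpSLE(X0, B.getInt32(2)), Base, Loop);
    B.SetInsertPoint(Base);
    B.CreateRet(B.getInt32(1));

    // Loop header: each phi merges the value from the entry block with the
    // value produced by the previous iteration (the SSA bookkeeping the C
    // compiler normally does for you).
    B.SetInsertPoint(Loop);
    PHINode *A  = B.CreatePHI(B.getInt32Ty(), 2, "a");
    PHINode *Bv = B.CreatePHI(B.getInt32Ty(), 2, "b");
    PHINode *Xv = B.CreatePHI(B.getInt32Ty(), 2, "x");
    A->addIncoming(B.getInt32(1), Entry);
    Bv->addIncoming(B.getInt32(1), Entry);
    Xv->addIncoming(X0, Entry);

    // c = a + b; b = a; a = c; x--;
    Value *C     = B.CreateAdd(A, Bv, "c");
    Value *XNext = B.CreateSub(Xv, B.getInt32(1), "x.next");
    A->addIncoming(C, Loop);
    Bv->addIncoming(A, Loop);
    Xv->addIncoming(XNext, Loop);

    // while (x > 2): test the decremented counter to decide whether to loop.
    B.CreateCondBr(B.CreateICmpSGT(XNext, B.getInt32(2)), Loop, Exit);

    B.SetInsertPoint(Exit);
    B.CreateRet(C);
    return F;
}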
This algorithm is vastly faster than the previous version; note that the units below are now nanoseconds.
fib(24)
- GCC 4.0.1, -O0: 170 ns
- GCC 4.0.1, -O3: 30 ns
- LLVM JIT: 51 ns
fib(40)
- GCC 4.0.1, -O0: 299 ns
- GCC 4.0.1, -O3: 51 ns
- LLVM JIT: 93 ns
On this version, LLVM shows an impressive performance gain over GCC at -O0, and is within a factor of 2 of GCC's output at -O3. (Edit: As noted in the comments, running GCC at -O0 is effectively tying both of its hands behind its back. I understand this; I'm comparing to -O0 because I haven't aggressively optimized my LLVM code and wanted to compare against a non-JIT compiler with its optimizations turned off. It's still faster than the M3VM threaded interpreter.)
To try this for yourself, here's where to find the sources:
- The LLVM naïve Fibonacci code is in the LLVM distribution at examples/Fibonacci.
- The iterative LLVM Fibonacci code.
- The recursive C Fibonacci code (for GCC).
- The iterative C Fibonacci code (for GCC).