Cliff Hacks Things.

Saturday, March 31, 2007

Experimenting with LLVM

My oldest live projects are Mongoose and Cesta, both in intermittent development since 2003. I'm doing a bit of hacking on both right now.

In the case of Mongoose, writing my own VM (or, thus far, four of them) was fun, but it's not viable in the long term. I'd like to target the language to an existing VM, but earlier attempts with the JVM and CLR bore little fruit. (Neither VM is well-suited to dynamic languages with serious metaprogramming facilities.)

In the case of Cesta, I need a way to speedily evaluate filter expressions, and a way to "cross-breed" decoders for parallel fuzzy decoding.

Research on both topics kept leading me back to LLVM, which is many things:

  1. It's a compiler, with frontends for C, C++, and several other languages;

  2. It's a library for designing and implementing transformations on code;

  3. It's a framework for building the backends of new compilers;

  4. It's a well-defined low-level virtual machine;

  5. It's an embeddable code generator, suitable for use in Just-In-Time compilers.



Initially, I was scared of LLVM. I did some hacking on the GCC family of compilers a few years ago, and the code is abjectly terrifying — not to mention the deliberately obfuscated intermediate representations. However, every research lead I followed kept pointing back to LLVM. Eventually, I got up the gumption to download it and try it out.

In short: I am floored. LLVM is almost absurdly easy to work with, thanks to the best-designed C++ API I've ever seen outside of Qt. Within minutes of downloading it, I had built and tested some of my C code using llvm-gcc, their drop-in replacement for GCC.

An hour, two source examples, and 120 lines of code later, I had written a program that generates a Fibonacci function in RAM, JITs it to native x86, and executes it in process, printing the results. (It might have taken longer if the LLVM guys hadn't provided an example that does exactly that, which I was able to learn from.)

This is when I started to notice something: the JIT is impressively fast, considering that I had not activated any of LLVM's impressive suite of optimizations. I coded up two version of the Fibonacci sequence (naïve and hand-optimized) and ran some benchmarks.

The first is the naïve recursive version:

int fibonacci(int x) {
if(x <= 2) return 1;
return fib(x - 1) + fib(x - 2);
}


Compiled in a test harness, the times are as follows (suitably averaged and scrubbed of outliers):
fib(24)

  • GCC 4.0.1, -O0: 1136us

  • GCC 4.0.1, -O3: 397us

  • LLVM JIT: 771us


fib(40) (note that units are now milliseconds)

  • GCC 4.0.1, -O0: 1866ms

  • GCC 4.0.1, -O3: 618ms

  • LLVM JIT: 1578ms



Generating this recursive Fibonacci function at runtime (which takes some milliseconds at startup, not included in this benchmark) produces code that runs faster than GCC at -O0. Neat.

Next, the iterative transform of the function:

int iterative_fibonacci(int x) {
if(x <= 2) return 1;
int a = 1, b = 1, c;
while(x > 2) {
c = a + b;
b = a;
a = c;
x--;
}
return c;
}


(The code to generate this at runtime is substantially more complex, since I'm assembling the SSA intermediate representation by hand in C++.)

This algorithm is vastly faster than the previous version, so note that our units below are now nanoseconds.
fib(24)

  • GCC 4.0.1, -O0: 170 ns

  • GCC 4.0.1, -O3: 30 ns

  • LLVM JIT: 51 ns


fib(40)

  • GCC 4.0.1, -O0: 299 ns

  • GCC 4.0.1, -O3: 51 ns

  • LLVM JIT: 93 ns



On this version, LLVM shows an impressive performance gain over GCC at -O0, and is within a factor of 2 of GCC's output at -O3. (Edit: As noted in the comments, running GCC at -O0 is effectively tying both of its hands behind its back. I understand this; I'm comparing to -O0 here because I've not aggressively optimized my LLVM code here and I wanted to try a non-JIT compiler with its optimizations turned off. It's still faster than the M3VM threaded interpreter.)

To try this for yourself, here's where to find the sources:

Labels: , ,

Saturday, February 24, 2007

The wheel of hobbies spins on.

This is a sort of omnibus update for all my various projects, in case anyone's watching. :-)

My spare-time projects tend to be fueled by interest from others. If I'm doing them just for me, I'll hack on it to solve some interesting set of problems, and then leave them unfinished. The partially-finished systems are useful to me, and solve my problems; packaging and publishing them is only worth it if they'll be useful to others. I do enough software-polishing in my day job that I have a hard time making it a hobby unless I'm helping people in the process.

Speaking of the day job, my workload has increased dramatically since about October. My team has shrunk, but our responsibilities haven't. I'm enjoying it, but it cuts into my spare time.


PropellerForth has stalled due to profound lack of interest from the Propeller community, though I've suddenly (past week or so) gotten a bunch of emails about it. Thus, I may pick it back up, or at least publish the sources so others can carry the torch. (I never got around to publishing before.) I still don't know of anyone using propasm, which is ironic, since I actually packaged it -- the sources are available, and have near-100% test coverage.

I've done a lot of rethinking of Mongoose. I've been working with some of the new batch of next-generation multi-paradigmatic languages, like Scala, but none of them are heading in the direction I want. I'm still using Mongoose primarily to prove some points, and not intending it to be the "next big thing" -- but at the same time, I've been looking at targeting it to existing VMs, with the hopes of achieving interoperability with some last-gen languages. (And not having to maintain my own VM. I hate C++.)

The robots have been dormant lately, but DPR still boots and works (imagine, a Linux system that still works after several months of inattention). I've been learning a lot about machine learning lately in my day job, and I'm hoping to rewrite some of DPR's personality using new techniques like fuzzy associative memories. I'll post here if I get around to it.

So, what have I been hacking on? Strangely enough, Cesta -- my network analyzer, which I've left untouched for nearly a year. My brain keeps coming back to it, because it lets me do the sort of stuff I don't get to do at work (like UI and interaction design), while still presenting a lot of juicy hard problems (like my fuzzy protocol decoding ideas). I'm applying what I learned from the Mongoose compiler to reworking Cesta's protocol decoding layer, enabling it to synthesize new decoders on the fly.

For example, take TCP. TCP connections contain no indication of the protocol they're carrying -- sure, the connection is to port 80, but that's not a guarantee that it's carrying HTTP.

Previously, my solution was brute-force: try to decode every known TCP protocol, eliminating them one by one as they reject the data as malformed. Two years ago I was already looking at more efficient ways to do this, but got caught up in the fun stuff like graph rendering and UI design.

In my reworking of the protocol decoding, I came across a better way. Protocols are defined declaratively using a protocol definition file, which Cesta compiles into decoder code -- and by synthesizing the constraints and layouts of all protocols known to be carried in TCP connections, it can generate a state machine that allows quick and efficient payload-type detection. (And, thanks to the dynamic code generation techniques I learned for Mongoose, I think I can compile the code on the fly without e.g. calling out to gcc.)

If this sounds like massive overkill, you're right -- which is where the machine learning comes in. Using the fuzzy associative memory techniques I've learned recently, Cesta can learn what protocols are most likely to be sent over a given port, and optimize the decoder appropriately. For example, a TCP connection to port 80 is pretty likely to be HTTP, so we can try decoding it as such until proven wrong -- a technique I call speculative decoding. If the guess proves wrong, we can try the next-most-likely protocol, or fall back to the synthetic every-protocol decoder I described above.

As a side effect, we get service-type identification. If Cesta learns that a port on a particular host seems to be speaking HTTP or SSHv2, it records this in the fuzzy memory, and we can present this to the user.

Multiplexed ports or changes in service don't present a problem for this either: it has no difficulty describing a port as "80% HTTP, 20% SSH" if something weird is going on (such as dynamic port mappings on a NAT).

This is complicated -- rather more complicated, in fact, than any network analyzer I've ever used -- but I understand all the techniques involved, and I think I can pull it off. Once there's code, I'll post it here.

Labels: , ,