Cliff Hacks Things.

Sunday, February 19, 2006

Virtual machine generators are a fun way to spend Saturday night!

Last night, I started work on a new version of the Mongoose VM. The previous implementation (M1) was a testbed for a bunch of competing ideas, many of questionable value — so the code was pretty hairy.

The new implementation (M2) uses a simplified bytecode, has a Java-like classfile format for separate compilation (M1 was image-based), and is starting life as a coalescing threaded interpreter.

Once I'd made it a bit into the implementation, I realized how much boilerplate code was involved in writing a VM. So, I assembled some tools (in Ruby, of all languages) to help me out. I now have a domain-specific language for describing VMs, which includes
- Describing the logical register set;
- Describing bytecode functions and formats;
- Providing C implementations for operations;
- Describing the calling conventions for invoking the VM itself from C code.

Short example:

// Bring in some C headers for the operation bodies
include <stdio.h>;
include <stdint.h>;

// Declare us a register
register uint32_t $accumulator;

// Bytecode 0x00 with a single 8-bit argument
op increment(00, amount:8) {
$accumulator += amount;
}

// Bytecode 0x01 with no arguments
op print(01) {
printf("%lu\n", $accumulator);
}

// Current implementations can stop execution with return
op halt(FF) {
return;
}


The initial tools generated a dead-simple switched VM from my bytecode descriptions, as well as some header files for manipulating the instruction set.

Newer revisions generate direct-threaded code, including functions to translate the bytecode format to threaded form on load. I'm working on very simple stream-rewrite capabilities, for use in instruction coalescing (combining several sequential ops into one) and specialization (replacing an op with a variant depending on context).

A few examples of each:

Self sends: by coalescing send-to-self sequences into a virtual self-send instruction, we can take advantage of the lack of polymorphism. In particular, self-send has an inline parameter that points to the body of the method to call — reducing the method dispatch process to a GOTO.
Tail sends: by coalescing a send-return sequence into a special tail-send instruction — unavailable in the raw bytecode — we can perform tailcall elimination.
Tail recursion: by further specializing the instruction when (a) the receiver is self and (b) the message is the same that invoked the current method, we can perform tail recursion elimination using a particularly lightweight sequence. (M1 optimized both tail sends and tail recursion through dedicated bytecodes.)
Hot sends: by specializing sends into a virtual cached-send instruction — which has a 3-deep polymorphic inline cache — we can drastically reduce method lookups for the general case.

You've probably noticed that those examples all involve message sends, and there's a reason for that: Smalltalk-style languages spend most of their time sending messages around, and it tends to be a slow process. It's low-hanging fruit for optimization.

One more:
Opening closures (ha ha): the relatively expensive block instruction, which creates a literal Block, can be specialized to a contained-block instruction. This requires that the block be entirely self-contained: no references to the enclosing context's variables, and no non-local return instructions. In this case, the enclosing activation records need not be reified onto the heap.

Some of these virtual-instruction transformations are working now; I hope to have the rest in place shortly.

(Incidentally, the entire system is under 1000 lines of Ruby, plus a 134-line description of the virtual machine.)

0 Comments:

Post a Comment

<< Home