Virtual machine generators are a fun way to spend Saturday night!
Last night, I started work on a new version of the Mongoose VM. The previous implementation (M1) was a testbed for a bunch of competing ideas, many of questionable value — so the code was pretty hairy.
The new implementation (M2) uses a simplified bytecode, has a Java-like classfile format for separate compilation (M1 was image-based), and is starting life as a coalescing threaded interpreter.
Once I'd made it a bit into the implementation, I realized how much boilerplate code was involved in writing a VM. So, I assembled some tools (in Ruby, of all languages) to help me out. I now have a domain-specific language for describing VMs, which includes
- Describing the logical register set;
- Describing bytecode functions and formats;
- Providing C implementations for operations;
- Describing the calling conventions for invoking the VM itself from C code.
Short example:
The initial tools generated a dead-simple
Newer revisions generate direct-threaded code, including functions to translate the bytecode format to threaded form on load. I'm working on very simple stream-rewrite capabilities, for use in instruction coalescing (combining several sequential ops into one) and specialization (replacing an op with a variant depending on context).
A few examples of each:
Self sends: by coalescing send-to-self sequences into a virtual
Tail sends: by coalescing a send-return sequence into a special
Tail recursion: by further specializing the instruction when (a) the receiver is
Hot sends: by specializing sends into a virtual
You've probably noticed that those examples all involve message sends, and there's a reason for that: Smalltalk-style languages spend most of their time sending messages around, and it tends to be a slow process. It's low-hanging fruit for optimization.
One more:
Opening closures (ha ha): the relatively expensive
Some of these virtual-instruction transformations are working now; I hope to have the rest in place shortly.
(Incidentally, the entire system is under 1000 lines of Ruby, plus a 134-line description of the virtual machine.)
The new implementation (M2) uses a simplified bytecode, has a Java-like classfile format for separate compilation (M1 was image-based), and is starting life as a coalescing threaded interpreter.
Once I'd made it a bit into the implementation, I realized how much boilerplate code was involved in writing a VM. So, I assembled some tools (in Ruby, of all languages) to help me out. I now have a domain-specific language for describing VMs, which includes
- Describing the logical register set;
- Describing bytecode functions and formats;
- Providing C implementations for operations;
- Describing the calling conventions for invoking the VM itself from C code.
Short example:
// Bring in some C headers for the operation bodies
include <stdio.h>;
include <stdint.h>;
// Declare us a register
register uint32_t $accumulator;
// Bytecode 0x00 with a single 8-bit argument
op increment(00, amount:8) {
$accumulator += amount;
}
// Bytecode 0x01 with no arguments
op print(01) {
printf("%lu\n", $accumulator);
}
// Current implementations can stop execution with return
op halt(FF) {
return;
}
The initial tools generated a dead-simple
switch
ed VM from my bytecode descriptions, as well as some header files for manipulating the instruction set.Newer revisions generate direct-threaded code, including functions to translate the bytecode format to threaded form on load. I'm working on very simple stream-rewrite capabilities, for use in instruction coalescing (combining several sequential ops into one) and specialization (replacing an op with a variant depending on context).
A few examples of each:
Self sends: by coalescing send-to-self sequences into a virtual
self-send
instruction, we can take advantage of the lack of polymorphism. In particular, self-send
has an inline parameter that points to the body of the method to call — reducing the method dispatch process to a GOTO.Tail sends: by coalescing a send-return sequence into a special
tail-send
instruction — unavailable in the raw bytecode — we can perform tailcall elimination.Tail recursion: by further specializing the instruction when (a) the receiver is
self
and (b) the message is the same that invoked the current method, we can perform tail recursion elimination using a particularly lightweight sequence. (M1 optimized both tail sends and tail recursion through dedicated bytecodes.)Hot sends: by specializing sends into a virtual
cached-send
instruction — which has a 3-deep polymorphic inline cache — we can drastically reduce method lookups for the general case.You've probably noticed that those examples all involve message sends, and there's a reason for that: Smalltalk-style languages spend most of their time sending messages around, and it tends to be a slow process. It's low-hanging fruit for optimization.
One more:
Opening closures (ha ha): the relatively expensive
block
instruction, which creates a literal Block, can be specialized to a contained-block
instruction. This requires that the block be entirely self-contained: no references to the enclosing context's variables, and no non-local return instructions. In this case, the enclosing activation records need not be reified onto the heap.Some of these virtual-instruction transformations are working now; I hope to have the rest in place shortly.
(Incidentally, the entire system is under 1000 lines of Ruby, plus a 134-line description of the virtual machine.)
0 Comments:
Post a Comment
<< Home