Cliff Hacks Things.

Tuesday, December 09, 2008

Linear-scan register allocator adapted

I've adapted the SSA Linear-Scan allocator, originally designed by Mössenböck and Pfeiffer for Hotspot, for my Propeller Java compiler.

It has some advantages:

  1. It's simple.

  2. It's fast.

  3. It does a pretty good job.

  4. It can deal with destructive arithmetic ops (as in the Propeller) and pre-colored registers (as in calling conventions) easily.



Handling pre-colored registers well is particularly important to me, since I'm trying to do all lowering and allocation without leaving SSA. This includes the calling conventions. With this algorithm, it's a matter of

  • Before each call, insert extra movs to clone each argument value. Pre-color these clone values with the appropriate registers in the calling convention.

  • After each call, rewrite the return value to go into a temporary, which is pre-colored with the returned-result register, and which gets immediately mov-d to the right value.

  • Rely on the phi-equivalence coalescing algorithm (still adapted from Pereira) to merge the non-interfering values and eliminate the copies.



I've altered Pereira's algorithm to deal with the "join families" that Mössenb&oml;/Pfeiffer generate. (Arguably in doing so I've recreated their coalesce algorithm, but hey.)

Here's a new test function with some interfering values:

public int function(int x) {
int y = 1;
int z = 0;

while (y < x) {
z = y;
y++;
}

return z;
}


And here's the new output with the new register allocator and dramatically better interference analysis. Note that I don't actually build an interference graph at any point; not sure if this is going to come back to bite me.

Remember that the code runs backwards.

method$:L27312399
IF_ALWAYS JMPRET prim_return_ret, #prim_return

long @method$:L27312399
long @method$:L11631043
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L10840700
IF_ALWAYS CMPS lmmr10, lmmr1

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS ADD lmmr10, #1
IF_ALWAYS MOV lmmr10, lmmr10
method$:L11631043
IF_ALWAYS MOV lmmr0, lmmr10

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS MOV lmmr0, #0
method
method$:L29315749
IF_ALWAYS MOV lmmr10, #1


lmmr0 and lmmr1 are the two function arguments (x and this). We smash this almost immediately, reusing lmmr0 to hold z, since it's the register used to return values from functions.

One of these days it would be fun to take a class in this stuff, I think. These are neat problems.

Labels: , ,

Sunday, December 07, 2008

Adapting propasm for the Pilgrim LMM

Last night, I replaced my Java compiler's hacked code generator with one based around propasm.

Unfortunately, propasm had some baked-in assumptions that gave me some trouble, so I had to rewrite it. The rewrite will go into SVN as soon as it reaches feature-parity. It's a lot simpler now and provides an API for extensible image generation -- which is the feature I needed today.

You see, I'm targeting Phil Pilgrim's adaptation of the LMM, which is the only one with 16-cycle constant execution for register-register instructions. (I developed it independently at Hobee's one night, but it looks like Phil did it first, so as far as I'm concerned it's the Pilgrim LMM.)

The Pilgrim LMM has an unusual property that lets it achieve its speed: it lays code out backwards, from high addresses to low, and uses djnz to update the native and LMM PCs simultaneously. This completely broke poor propasm's brain -- it was designed for single-pass assembly, which of course means that instructions get laid out in the order they're input.

To support this, propasm now generates images in two passes. First, it builds a tree of object describing the image's contents. Second, it walks the tree and generates output. This has the side-effect of allowing optimizations on machine code without requiring a separate object model -- I've implemented a couple simple scalar optimizations over propasm's image objects in my compiler.

My Java compiler plugs a class called rlmm.CodeSegment into propasm's image generation code, which lets it generate the code in execution order (by a CFG walk) but lay it out backwards in memory. Thus, for the sample input from my last post, we now get the following output:

' Start of segment rlmm.CodeSegment@1004901
IF_ALWAYS JMPRET prim_return_ret, #prim_return
method$:L10840700
IF_ALWAYS MOV r0, r2

long @method$:L10840700
long @method$:L15980197
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L29181730
IF_ALWAYS CMPS r2, r1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15980197
IF_ALWAYS ADDS r2, #1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15006066
method
IF_ALWAYS MOV r2, #0


Remember to read bottom-to-top! (Pardon the excessive IF_ALWAYS and expansion of calls, this output was disassembled with a naïve algorithm.)

Labels: , ,

Saturday, December 06, 2008

Compiling Java for the Propeller

My spare time is pretty limited these days -- work takes up a lot of it. On the up-side, I got promoted.

Over the past few months, some of my spare time has been spent designing a Java compiler that targets the Parallax Propeller. A few hand-coded mockups have convinced me that it's feasible -- the only thing between that and making it a reality are the past 50 years of compiler research, which I never learned. :-)

So, I've spent time surfing Citeseer and gathering data. Between what I've learned and my Propeller experience, I've set some parameters for the first version: it will compile to Large Memory Model code rather than a custom interpreter, and will consume Java class files.

As of this evening I've got a prototype working, and have compiled/assembled/uploaded my first binary. The bullet points are as follows:

  • Converts Java 6 JVML to n-address linear IR using abstract evaluation. Assumes input is verifiable, which lets me skip some checking. (Why Java 6? Because it generates nice frame attributes and doesn't use jsr.)

  • Operates almost exclusively in SSA (actually CSSA).

  • Performs register allocation in SSA form, using a crippled-but-developing version of Pereira's algorithm. I've modified it to deal more gracefully with the Propeller's CISC-like two-address architecture.

  • Provides limited scalar optimizations -- the ones I've gotten around to writing. Copy folding, constant propagation, jump-to-flow-control elimination, etc.

  • Fantastically crappy code generator for now -- literally printf-based with no intelligent layout or peephole optimizations, as you'll see in the samples below.

  • None of the fun stuff, like garbage collection, works yet.



Here's sample input and output from one of my test suites

public class Input0 {
public int function(int x) {
int y = 0;
while (y < x) {
y ++;
}
return y;
}
}


And the output, in propasm syntax (a superset of Parallax):

method_function
L12893404
mov r2, #0
rdlong PC, PC ' inlined jmp
long @@L23063136
L26210109
add r2, #1
rdlong PC, PC ' inlined jmp
long @@L23063136
L23063136
' [int r2:1] <- phi ssa.BasicBlock@c4bcdc:[int c0:0] ssa.BasicBlock@18fef3d:[int r2:2]
cmps r2, r1 wc wz
call #prim_lt
long @@L26210109
long @@L25763215
L25763215
mov r0, r2
call #prim_return


The "call #prim_foo" instructions are calls to LMM kernel traps; plenty of other people have explained Propeller LMM in detail elsewhere, I won't repeat it. My code generator inlines LMM primitives in some cases, as it's done with jump here.

Some details of the calling conventions are apparent in the output: incoming arguments start at r0, and the returned value comes back in r0. These registers are real-live Propeller registers inside the LMM kernel, and I suspect I'll have room for a lot of them -- 32 or more -- so I'm using a RISC-like register calling convention, with link register.

Assuming work doesn't interfere, I'll post more as it develops.

Labels: , ,