Cliff Hacks Things.

Tuesday, December 09, 2008

Linear-scan register allocator adapted

I've adapted the SSA linear-scan allocator, originally designed by Mössenböck and Pfeiffer for HotSpot, for my Propeller Java compiler.

It has some advantages:

  1. It's simple.

  2. It's fast.

  3. It does a pretty good job.

  4. It can deal with destructive arithmetic ops (as in the Propeller) and pre-colored registers (as in calling conventions) easily.



Handling pre-colored registers well is particularly important to me, since I'm trying to do all lowering and allocation without leaving SSA. This includes the calling conventions. With this algorithm, it's a matter of:

  • Before each call, insert extra movs to clone each argument value. Pre-color these clone values with the appropriate registers in the calling convention.

  • After each call, rewrite the return value to go into a temporary, which is pre-colored with the returned-result register, and which gets immediately mov-d to the right value.

  • Rely on the phi-equivalence coalescing algorithm (still adapted from Pereira) to merge the non-interfering values and eliminate the copies.
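
The pre-call rewrite described above might be sketched roughly as follows. Everything here -- Value, cloneArgs, CC_ARG_REGS -- is an invented name for illustration, not the compiler's actual API:

```java
// Hypothetical sketch of pre-call argument cloning with pre-colored values.
// Class and field names are illustrative, not the compiler's real IR.
import java.util.ArrayList;
import java.util.List;

class PrecolorSketch {
    static final int[] CC_ARG_REGS = {0, 1, 2, 3}; // calling-convention argument registers

    static class Value {
        final String name;
        int color = -1;  // -1 means "allocator's choice"
        Value(String name) { this.name = name; }
    }

    // Before a call, clone each argument into a fresh value pre-colored with
    // the convention's register (a mov per argument). The coalescer later
    // merges the clones that don't interfere, eliminating the copies.
    static List<Value> cloneArgs(List<Value> args) {
        List<Value> clones = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            Value c = new Value(args.get(i).name + "'");
            c.color = CC_ARG_REGS[i];  // pre-color: mov clone, arg
            clones.add(c);
        }
        return clones;
    }

    public static void main(String[] unused) {
        List<Value> args = List.of(new Value("x"), new Value("y"));
        List<Value> clones = cloneArgs(args);
        for (Value c : clones) System.out.println(c.name + " -> r" + c.color);
    }
}
```

The point of the clones is that the original values keep their freedom; only the short-lived copies are pinned to specific registers.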



I've altered Pereira's algorithm to deal with the "join families" that Mössenböck/Pfeiffer generate. (Arguably in doing so I've recreated their coalescing algorithm, but hey.)

Here's a new test function with some interfering values:

public int function(int x) {
  int y = 1;
  int z = 0;

  while (y < x) {
    z = y;
    y++;
  }

  return z;
}


And here's the output from the new register allocator, with dramatically better interference analysis. Note that I don't actually build an interference graph at any point; not sure if this is going to come back to bite me.

Remember that the code runs backwards.

method$:L27312399
IF_ALWAYS JMPRET prim_return_ret, #prim_return

long @method$:L27312399
long @method$:L11631043
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L10840700
IF_ALWAYS CMPS lmmr10, lmmr1

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS ADD lmmr10, #1
IF_ALWAYS MOV lmmr10, lmmr10
method$:L11631043
IF_ALWAYS MOV lmmr0, lmmr10

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS MOV lmmr0, #0
method
method$:L29315749
IF_ALWAYS MOV lmmr10, #1


lmmr0 and lmmr1 are the two function arguments (x and this). We smash this almost immediately, reusing lmmr0 to hold z, since it's the register used to return values from functions.

One of these days it would be fun to take a class in this stuff, I think. These are neat problems.


Sunday, December 07, 2008

Adapting propasm for the Pilgrim LMM

Last night, I replaced my Java compiler's hacked code generator with one based around propasm.

Unfortunately, propasm had some baked-in assumptions that gave me some trouble, so I had to rewrite it. The rewrite will go into SVN as soon as it reaches feature-parity. It's a lot simpler now and provides an API for extensible image generation -- which is the feature I needed today.

You see, I'm targeting Phil Pilgrim's adaptation of the LMM, which is the only one with 16-cycle constant execution for register-register instructions. (I developed it independently at Hobee's one night, but it looks like Phil did it first, so as far as I'm concerned it's the Pilgrim LMM.)

The Pilgrim LMM has an unusual property that lets it achieve its speed: it lays code out backwards, from high addresses to low, and uses djnz to update the native and LMM PCs simultaneously. This completely broke poor propasm's brain -- it was designed for single-pass assembly, which of course means that instructions get laid out in the order they're input.

To support this, propasm now generates images in two passes. First, it builds a tree of objects describing the image's contents. Second, it walks the tree and generates output. This has the side effect of allowing optimizations on machine code without requiring a separate object model -- I've implemented a couple of simple scalar optimizations over propasm's image objects in my compiler.
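
The two-pass scheme can be sketched in miniature like this. The class names (Node, Segment, Instruction) are invented for illustration and are not propasm's real API; the prepend-on-add trick stands in for the backwards layout a CodeSegment-style subclass would do:

```java
// Illustrative sketch of two-pass image generation: pass 1 builds a tree
// of objects, pass 2 walks the tree and emits output. Names are hypothetical.
import java.util.ArrayList;
import java.util.List;

class ImageSketch {
    interface Node { void emit(StringBuilder out); }

    static class Instruction implements Node {
        final String text;
        Instruction(String text) { this.text = text; }
        public void emit(StringBuilder out) { out.append(text).append('\n'); }
    }

    static class Segment implements Node {
        final List<Node> children = new ArrayList<>();
        // Pass 1: accumulate nodes in execution order, but prepend each one,
        // so the final layout runs from high addresses to low.
        void add(Node n) { children.add(0, n); }
        // Pass 2: walk the finished tree and generate output.
        public void emit(StringBuilder out) {
            for (Node n : children) n.emit(out);
        }
    }

    public static void main(String[] unused) {
        Segment seg = new Segment();
        seg.add(new Instruction("mov r2, #0"));  // generated first...
        seg.add(new Instruction("add r2, #1"));  // ...but laid out before it
        StringBuilder out = new StringBuilder();
        seg.emit(out);
        System.out.print(out);
    }
}
```

Because layout decisions are deferred to the tree walk, optimizations can rewrite the tree between the two passes without any separate object model.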

My Java compiler plugs a class called rlmm.CodeSegment into propasm's image generation code, which lets it generate the code in execution order (by a CFG walk) but lay it out backwards in memory. Thus, for the sample input from my last post, we now get the following output:

' Start of segment rlmm.CodeSegment@1004901
IF_ALWAYS JMPRET prim_return_ret, #prim_return
method$:L10840700
IF_ALWAYS MOV r0, r2

long @method$:L10840700
long @method$:L15980197
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L29181730
IF_ALWAYS CMPS r2, r1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15980197
IF_ALWAYS ADDS r2, #1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15006066
method
IF_ALWAYS MOV r2, #0


Remember to read bottom-to-top! (Pardon the excessive IF_ALWAYS and the expansion of calls; this output was disassembled with a naïve algorithm.)


Saturday, December 06, 2008

Compiling Java for the Propeller

My spare time is pretty limited these days -- work takes up a lot of it. On the up-side, I got promoted.

Over the past few months, some of my spare time has been spent designing a Java compiler that targets the Parallax Propeller. A few hand-coded mockups have convinced me that it's feasible -- the only thing standing between that and reality is the past 50 years of compiler research, which I never learned. :-)

So, I've spent time surfing Citeseer and gathering data. Between what I've learned and my Propeller experience, I've set some parameters for the first version: it will compile to Large Memory Model code rather than a custom interpreter, and will consume Java class files.

As of this evening I've got a prototype working, and have compiled/assembled/uploaded my first binary. The bullet points are as follows:

  • Converts Java 6 JVML to n-address linear IR using abstract evaluation. Assumes input is verifiable, which lets me skip some checking. (Why Java 6? Because it generates nice frame attributes and doesn't use jsr.)

  • Operates almost exclusively in SSA (actually CSSA).

  • Performs register allocation in SSA form, using a crippled-but-developing version of Pereira's algorithm. I've modified it to deal more gracefully with the Propeller's CISC-like two-address architecture.

  • Provides limited scalar optimizations -- the ones I've gotten around to writing. Copy folding, constant propagation, jump-to-flow-control elimination, etc.

  • Fantastically crappy code generator for now -- literally printf-based with no intelligent layout or peephole optimizations, as you'll see in the samples below.

  • None of the fun stuff, like garbage collection, works yet.
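
The abstract-evaluation step in the first bullet can be illustrated with a toy translator that walks stack bytecode while maintaining a compile-time operand stack of value names, emitting n-address IR as it goes. The opcode subset and IR strings here are made up for the sketch; the real converter handles full JVML:

```java
// Toy illustration of abstract evaluation: simulate the JVM operand stack
// at compile time, turning stack code into n-address (here, three-address) IR.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AbstractEvalSketch {
    static List<String> translate(String[] bytecode) {
        Deque<String> stack = new ArrayDeque<>(); // names, not runtime values
        List<String> ir = new ArrayList<>();
        int temp = 0;
        for (String op : bytecode) {
            if (op.startsWith("iconst_")) {
                stack.push(op.substring(7));       // push the constant's name
            } else if (op.equals("iadd")) {
                String b = stack.pop(), a = stack.pop();
                String t = "t" + temp++;
                ir.add(t + " = " + a + " + " + b); // n-address form
                stack.push(t);
            } else if (op.equals("ireturn")) {
                ir.add("return " + stack.pop());
            }
        }
        return ir;
    }

    public static void main(String[] unused) {
        String[] code = {"iconst_1", "iconst_2", "iadd", "ireturn"};
        for (String line : translate(code)) System.out.println(line);
    }
}
```

Verifiable input is what makes this safe: the simulated stack depth and types are guaranteed consistent at join points, so the translator can skip that checking.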



Here's sample input and output from one of my test suites:

public class Input0 {
  public int function(int x) {
    int y = 0;
    while (y < x) {
      y++;
    }
    return y;
  }
}


And the output, in propasm syntax (a superset of Parallax's):

method_function
L12893404
mov r2, #0
rdlong PC, PC ' inlined jmp
long @@L23063136
L26210109
add r2, #1
rdlong PC, PC ' inlined jmp
long @@L23063136
L23063136
' [int r2:1] <- phi ssa.BasicBlock@c4bcdc:[int c0:0] ssa.BasicBlock@18fef3d:[int r2:2]
cmps r2, r1 wc wz
call #prim_lt
long @@L26210109
long @@L25763215
L25763215
mov r0, r2
call #prim_return


The "call #prim_foo" instructions are calls to LMM kernel traps; plenty of other people have explained Propeller LMM in detail elsewhere, so I won't repeat it. My code generator inlines LMM primitives in some cases, as it's done with jump here.

Some details of the calling conventions are apparent in the output: incoming arguments start at r0, and the returned value comes back in r0. These registers are real, live Propeller registers inside the LMM kernel, and I suspect I'll have room for a lot of them -- 32 or more -- so I'm using a RISC-like register calling convention with a link register.

Assuming work doesn't interfere, I'll post more as it develops.


Sunday, February 10, 2008

Java 6 try/finally compilation without jsr/ret

My day job requires me to be a bit of a JVM geek, so I was poring over the Java Virtual Machine spec recently when I remembered something: In Java 6 and later, the old jsr and ret instructions are effectively deprecated. These instructions were used to build mini-subroutines inside methods. While Java doesn't support nested functions or anything fun like that, it does have the try/finally construct, and these instructions were quite handy for implementing it.

I saw no "official" instructions for compiling finally without jsr, so I investigated it, and thought I'd post the results -- mostly in case I forget them later.

For non-Java folks, some background: a chunk of code that might fail at runtime can be wrapped in a try block. You can then attach handlers for specific types of exceptions/errors to the try block using catch clauses, if you want to respond to specific failure cases. You can also attach a finally block, which will be run at the end, failure or no. You can think of finally as a sort of cleanup block -- which, in practice, is how it's used.

As a result, you wind up with multiple control flow paths that can execute the finally code:

  • try block, successful completion, finally block executes before moving on

  • try block, failure, one or more catch blocks, finally block executes before moving on

  • try block, unhandled failure, finally block executes before unwinding the stack and throwing the error out to a higher level


Java originally compiled this the way most folks would write it by hand: code that gets used multiple places goes in a subroutine. In this case, it's a nested subroutine accessed using jsr.

This unfortunately makes dataflow analysis and type inference of the Java code considerably more complex, for reasons I won't go into here.

So while Java 6 and later JVMs can still understand jsr, tools no longer generate it. Instead, they duplicate the code of the finally block along each path (a transform I've always called tail duplication, but there may be other names). Let's look at a quick example.

This totally contrived Java class plays with an array:

class TryFinally {
  public static void main(String[] args) {
    int[] a = new int[2];
    try {
      a[16] = 2;
    } catch (ArrayIndexOutOfBoundsException e) {
      a[0] = 2;
    } finally {
      a[1] = 2;
    }
  }
}


It will always follow the longest code path, because it's written to fail: the try block will execute, followed by the handler, followed by the finally clause.

The disassembled JVM instructions for this method are as follows:

public static void main(java.lang.String[]);
Code:
0: iconst_2 // Create the array
1: newarray int
3: astore_1 // Store it in local 1
4: aload_1 // Set element 16 to 2 (throws)
5: bipush 16
7: iconst_2
8: iastore
9: aload_1 // Begin 'success' finally code
10: iconst_1
11: iconst_2
12: iastore
13: goto 35 // End 'success' finally code
16: astore_2 // Catch block, save the exception...
17: aload_1 // and set a[0] = 2
18: iconst_0
19: iconst_2
20: iastore
21: aload_1 // Catch copy of finally code
22: iconst_1
23: iconst_2
24: iastore
25: goto 35
28: astore_3 // A third copy of finally code!
29: aload_1
30: iconst_1
31: iconst_2
32: iastore
33: aload_3
34: athrow
35: return
Exception table:
from  to  target  type
   4   9      16  Class java/lang/ArrayIndexOutOfBoundsException
   4   9      28  any
  16  21      28  any
  28  29      28  any


As you can see in my annotations, we have three copies of the finally code! Why three? The answer is in the three code paths I discussed above, and in the exception table.

In the table we see four exception handlers defined -- but, of course, we only defined one! Why so many?

The first is the one we defined as a catch. The second is an invisible additional catch on the try block for type 'any' -- so any unexpected exceptions are sent to the third copy of the finally code. The third guards the catch block itself; the fourth guards the generated exception handler.

In other words, the compiler has rewritten the Java code into something resembling:

public static void main(String[] args) {
  int[] a = new int[2];
  try {
    a[16] = 2;
    a[1] = 2;
  } catch (ArrayIndexOutOfBoundsException e) {
    a[0] = 2;
    a[1] = 2;
  } catch (* e) {
    a[1] = 2;
    throw e;
  }
}

As you can see, the finally block has disappeared -- instead, its contents have been duplicated along each code path.
