Cliff Hacks Things.

Tuesday, December 09, 2008

Linear-scan register allocator adapted

I've adapted the SSA linear-scan allocator, originally designed by Mössenböck and Pfeiffer for HotSpot, for my Propeller Java compiler.

It has some advantages:

  1. It's simple.

  2. It's fast.

  3. It does a pretty good job.

  4. It can deal with destructive arithmetic ops (as in the Propeller) and pre-colored registers (as in calling conventions) easily.



Handling pre-colored registers well is particularly important to me, since I'm trying to do all lowering and allocation without leaving SSA. This includes the calling conventions. With this algorithm, it's a matter of:

  • Before each call, insert extra movs to clone each argument value. Pre-color these clone values with the appropriate registers in the calling convention.

  • After each call, rewrite the return value to go into a temporary, which is pre-colored with the returned-result register, and which gets immediately mov-d to the right value.

  • Rely on the phi-equivalence coalescing algorithm (still adapted from Pereira) to merge the non-interfering values and eliminate the copies.
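
The pre-call rewrite described above might be sketched roughly as follows. Everything here -- Value, cloneArgs, CC_ARG_REGS -- is an invented name for illustration, not the compiler's actual API:

```java
// Hypothetical sketch of pre-call argument cloning with pre-colored values.
// Class and field names are illustrative, not the compiler's real IR.
import java.util.ArrayList;
import java.util.List;

class PrecolorSketch {
    static final int[] CC_ARG_REGS = {0, 1, 2, 3}; // calling-convention argument registers

    static class Value {
        final String name;
        int color = -1;  // -1 means "allocator's choice"
        Value(String name) { this.name = name; }
    }

    // Before a call, clone each argument into a fresh value pre-colored with
    // the convention's register (a mov per argument). The coalescer later
    // merges the clones that don't interfere, eliminating the copies.
    static List<Value> cloneArgs(List<Value> args) {
        List<Value> clones = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            Value c = new Value(args.get(i).name + "'");
            c.color = CC_ARG_REGS[i];  // pre-color: mov clone, arg
            clones.add(c);
        }
        return clones;
    }

    public static void main(String[] unused) {
        List<Value> args = List.of(new Value("x"), new Value("y"));
        List<Value> clones = cloneArgs(args);
        for (Value c : clones) System.out.println(c.name + " -> r" + c.color);
    }
}
```

The point of the clones is that the original values keep their freedom; only the short-lived copies are pinned to specific registers.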



I've altered Pereira's algorithm to deal with the "join families" that Mössenböck/Pfeiffer generate. (Arguably in doing so I've recreated their coalescing algorithm, but hey.)

Here's a new test function with some interfering values:

public int function(int x) {
  int y = 1;
  int z = 0;

  while (y < x) {
    z = y;
    y++;
  }

  return z;
}


And here's the output from the new register allocator, with dramatically better interference analysis. Note that I don't actually build an interference graph at any point; not sure if this is going to come back to bite me.

Remember that the code runs backwards.

method$:L27312399
IF_ALWAYS JMPRET prim_return_ret, #prim_return

long @method$:L27312399
long @method$:L11631043
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L10840700
IF_ALWAYS CMPS lmmr10, lmmr1

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS ADD lmmr10, #1
IF_ALWAYS MOV lmmr10, lmmr10
method$:L11631043
IF_ALWAYS MOV lmmr0, lmmr10

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS MOV lmmr0, #0
method
method$:L29315749
IF_ALWAYS MOV lmmr10, #1


lmmr0 and lmmr1 are the two function arguments (x and this). We smash this almost immediately, reusing lmmr0 to hold z, since it's the register used to return values from functions.

One of these days it would be fun to take a class in this stuff, I think. These are neat problems.


Sunday, December 07, 2008

Adapting propasm for the Pilgrim LMM

Last night, I replaced my Java compiler's hacked code generator with one based around propasm.

Unfortunately, propasm had some baked-in assumptions that gave me some trouble, so I had to rewrite it. The rewrite will go into SVN as soon as it reaches feature-parity. It's a lot simpler now and provides an API for extensible image generation -- which is the feature I needed today.

You see, I'm targeting Phil Pilgrim's adaptation of the LMM, which is the only one with 16-cycle constant execution for register-register instructions. (I developed it independently at Hobee's one night, but it looks like Phil did it first, so as far as I'm concerned it's the Pilgrim LMM.)

The Pilgrim LMM has an unusual property that lets it achieve its speed: it lays code out backwards, from high addresses to low, and uses djnz to update the native and LMM PCs simultaneously. This completely broke poor propasm's brain -- it was designed for single-pass assembly, which of course means that instructions get laid out in the order they're input.

To support this, propasm now generates images in two passes. First, it builds a tree of objects describing the image's contents. Second, it walks the tree and generates output. This has the side effect of allowing optimizations on machine code without requiring a separate object model -- I've implemented a couple of simple scalar optimizations over propasm's image objects in my compiler.
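
The two-pass scheme can be sketched in miniature like this. The class names (Node, Segment, Instruction) are invented for illustration and are not propasm's real API; the prepend-on-add trick stands in for the backwards layout a CodeSegment-style subclass would do:

```java
// Illustrative sketch of two-pass image generation: pass 1 builds a tree
// of objects, pass 2 walks the tree and emits output. Names are hypothetical.
import java.util.ArrayList;
import java.util.List;

class ImageSketch {
    interface Node { void emit(StringBuilder out); }

    static class Instruction implements Node {
        final String text;
        Instruction(String text) { this.text = text; }
        public void emit(StringBuilder out) { out.append(text).append('\n'); }
    }

    static class Segment implements Node {
        final List<Node> children = new ArrayList<>();
        // Pass 1: accumulate nodes in execution order, but prepend each one,
        // so the final layout runs from high addresses to low.
        void add(Node n) { children.add(0, n); }
        // Pass 2: walk the finished tree and generate output.
        public void emit(StringBuilder out) {
            for (Node n : children) n.emit(out);
        }
    }

    public static void main(String[] unused) {
        Segment seg = new Segment();
        seg.add(new Instruction("mov r2, #0"));  // generated first...
        seg.add(new Instruction("add r2, #1"));  // ...but laid out before it
        StringBuilder out = new StringBuilder();
        seg.emit(out);
        System.out.print(out);
    }
}
```

Because layout decisions are deferred to the tree walk, optimizations can rewrite the tree between the two passes without any separate object model.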

My Java compiler plugs a class called rlmm.CodeSegment into propasm's image generation code, which lets it generate the code in execution order (by a CFG walk) but lay it out backwards in memory. Thus, for the sample input from my last post, we now get the following output:

' Start of segment rlmm.CodeSegment@1004901
IF_ALWAYS JMPRET prim_return_ret, #prim_return
method$:L10840700
IF_ALWAYS MOV r0, r2

long @method$:L10840700
long @method$:L15980197
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L29181730
IF_ALWAYS CMPS r2, r1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15980197
IF_ALWAYS ADDS r2, #1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15006066
method
IF_ALWAYS MOV r2, #0


Remember to read bottom-to-top! (Pardon the excessive IF_ALWAYS and the expansion of calls; this output was disassembled with a naïve algorithm.)


Saturday, December 06, 2008

Compiling Java for the Propeller

My spare time is pretty limited these days -- work takes up a lot of it. On the up-side, I got promoted.

Over the past few months, some of my spare time has been spent designing a Java compiler that targets the Parallax Propeller. A few hand-coded mockups have convinced me that it's feasible -- the only thing standing between that and reality is the past 50 years of compiler research, which I never learned. :-)

So, I've spent time surfing Citeseer and gathering data. Between what I've learned and my Propeller experience, I've set some parameters for the first version: it will compile to Large Memory Model code rather than a custom interpreter, and will consume Java class files.

As of this evening I've got a prototype working, and have compiled/assembled/uploaded my first binary. The bullet points are as follows:

  • Converts Java 6 JVML to n-address linear IR using abstract evaluation. Assumes input is verifiable, which lets me skip some checking. (Why Java 6? Because it generates nice frame attributes and doesn't use jsr.)

  • Operates almost exclusively in SSA (actually CSSA).

  • Performs register allocation in SSA form, using a crippled-but-developing version of Pereira's algorithm. I've modified it to deal more gracefully with the Propeller's CISC-like two-address architecture.

  • Provides limited scalar optimizations -- the ones I've gotten around to writing. Copy folding, constant propagation, jump-to-flow-control elimination, etc.

  • Fantastically crappy code generator for now -- literally printf-based with no intelligent layout or peephole optimizations, as you'll see in the samples below.

  • None of the fun stuff, like garbage collection, works yet.
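
The abstract-evaluation step in the first bullet can be illustrated with a toy translator that walks stack bytecode while maintaining a compile-time operand stack of value names, emitting n-address IR as it goes. The opcode subset and IR strings here are made up for the sketch; the real converter handles full JVML:

```java
// Toy illustration of abstract evaluation: simulate the JVM operand stack
// at compile time, turning stack code into n-address (here, three-address) IR.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AbstractEvalSketch {
    static List<String> translate(String[] bytecode) {
        Deque<String> stack = new ArrayDeque<>(); // names, not runtime values
        List<String> ir = new ArrayList<>();
        int temp = 0;
        for (String op : bytecode) {
            if (op.startsWith("iconst_")) {
                stack.push(op.substring(7));       // push the constant's name
            } else if (op.equals("iadd")) {
                String b = stack.pop(), a = stack.pop();
                String t = "t" + temp++;
                ir.add(t + " = " + a + " + " + b); // n-address form
                stack.push(t);
            } else if (op.equals("ireturn")) {
                ir.add("return " + stack.pop());
            }
        }
        return ir;
    }

    public static void main(String[] unused) {
        String[] code = {"iconst_1", "iconst_2", "iadd", "ireturn"};
        for (String line : translate(code)) System.out.println(line);
    }
}
```

Verifiable input is what makes this safe: the simulated stack depth and types are guaranteed consistent at join points, so the translator can skip that checking.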



Here's sample input and output from one of my test suites:

public class Input0 {
  public int function(int x) {
    int y = 0;
    while (y < x) {
      y++;
    }
    return y;
  }
}


And the output, in propasm syntax (a superset of Parallax's):

method_function
L12893404
mov r2, #0
rdlong PC, PC ' inlined jmp
long @@L23063136
L26210109
add r2, #1
rdlong PC, PC ' inlined jmp
long @@L23063136
L23063136
' [int r2:1] <- phi ssa.BasicBlock@c4bcdc:[int c0:0] ssa.BasicBlock@18fef3d:[int r2:2]
cmps r2, r1 wc wz
call #prim_lt
long @@L26210109
long @@L25763215
L25763215
mov r0, r2
call #prim_return


The "call #prim_foo" instructions are calls to LMM kernel traps; plenty of other people have explained Propeller LMM in detail elsewhere, so I won't repeat it. My code generator inlines LMM primitives in some cases, as it's done with jump here.

Some details of the calling conventions are apparent in the output: incoming arguments start at r0, and the returned value comes back in r0. These registers are real, live Propeller registers inside the LMM kernel, and I suspect I'll have room for a lot of them -- 32 or more -- so I'm using a RISC-like register calling convention with a link register.

Assuming work doesn't interfere, I'll post more as it develops.


Sunday, February 10, 2008

Java 6 try/finally compilation without jsr/ret

My day job requires me to be a bit of a JVM geek, so I was poring over the Java Virtual Machine spec recently when I remembered something: In Java 6 and later, the old jsr and ret instructions are effectively deprecated. These instructions were used to build mini-subroutines inside methods. While Java doesn't support nested functions or anything fun like that, it does have the try/finally construct, and these instructions were quite handy for implementing it.

I saw no "official" instructions for compiling finally without jsr, so I investigated it, and thought I'd post the results -- mostly in case I forget them later.

For non-Java folks, some background: a chunk of code that might fail at runtime can be wrapped in a try block. You can then attach handlers for specific types of exceptions/errors to the try block using catch clauses, if you want to respond to specific failure cases. You can also attach a finally block, which will be run at the end, failure or no. You can think of finally as a sort of cleanup block -- which, in practice, is how it's used.

As a result, you wind up with multiple control flow paths that can execute the finally code:

  • try block, successful completion, finally block executes before moving on

  • try block, failure, one or more catch blocks, finally block executes before moving on

  • try block, unhandled failure, finally block executes before unwinding the stack and throwing the error out to a higher level


Java originally compiled this the way most folks would write it by hand: code that gets used multiple places goes in a subroutine. In this case, it's a nested subroutine accessed using jsr.

This unfortunately makes dataflow analysis and type inference of the Java code considerably more complex, for reasons I won't go into here.

So while Java 6 and later JVMs can still understand jsr, tools no longer generate it. Instead, they duplicate the code of the finally block along each path (a transform I've always called tail duplication, but there may be other names). Let's look at a quick example.

This totally contrived Java class plays with an array:

class TryFinally {
  public static void main(String[] args) {
    int[] a = new int[2];
    try {
      a[16] = 2;
    } catch (ArrayIndexOutOfBoundsException e) {
      a[0] = 2;
    } finally {
      a[1] = 2;
    }
  }
}


It will always follow the longest code path, because it's written to fail: the try block will execute, followed by the handler, followed by the finally clause.

The disassembled JVM instructions for this method are as follows:

public static void main(java.lang.String[]);
Code:
0: iconst_2 // Create the array
1: newarray int
3: astore_1 // Store it in local 1
4: aload_1 // Set element 16 to 2 (throws)
5: bipush 16
7: iconst_2
8: iastore
9: aload_1 // Begin 'success' finally code
10: iconst_1
11: iconst_2
12: iastore
13: goto 35 // End 'success' finally code
16: astore_2 // Catch block, save the exception...
17: aload_1 // and set a[0] = 2
18: iconst_0
19: iconst_2
20: iastore
21: aload_1 // Catch copy of finally code
22: iconst_1
23: iconst_2
24: iastore
25: goto 35
28: astore_3 // A third copy of finally code!
29: aload_1
30: iconst_1
31: iconst_2
32: iastore
33: aload_3
34: athrow
35: return
Exception table:
from  to  target  type
   4   9      16  Class java/lang/ArrayIndexOutOfBoundsException
   4   9      28  any
  16  21      28  any
  28  29      28  any


As you can see in my annotations, we have three copies of the finally code! Why three? The answer is in the three code paths I discussed above, and in the exception table.

In the table we see four exception handlers defined -- but, of course, we only defined one! Why so many?

The first is the one we defined as a catch. The second is an invisible additional catch on the try block for type 'any' -- so any unexpected exceptions are sent to the third copy of the finally code. The third guards the catch block itself; the fourth guards the generated exception handler.

In other words, the compiler has rewritten the Java code into something resembling:

public static void main(String[] args) {
  int[] a = new int[2];
  try {
    a[16] = 2;
    a[1] = 2;
  } catch (ArrayIndexOutOfBoundsException e) {
    a[0] = 2;
    a[1] = 2;
  } catch (* e) {
    a[1] = 2;
    throw e;
  }
}

As you can see, the finally block has disappeared -- instead, its contents have been duplicated along each code path.
