Cliff Hacks Things.

Tuesday, December 09, 2008

Linear-scan register allocator adapted

I've adapted the SSA Linear-Scan allocator, originally designed by Mössenböck and Pfeiffer for Hotspot, for my Propeller Java compiler.

It has some advantages:

  1. It's simple.

  2. It's fast.

  3. It does a pretty good job.

  4. It can deal with destructive arithmetic ops (as in the Propeller) and pre-colored registers (as in calling conventions) easily.



Handling pre-colored registers well is particularly important to me, since I'm trying to do all lowering and allocation without leaving SSA. This includes the calling conventions. With this algorithm, it's a matter of

  • Before each call, insert extra movs to clone each argument value. Pre-color these clone values with the appropriate registers in the calling convention.

  • After each call, rewrite the return value to go into a temporary, which is pre-colored with the returned-result register, and which gets immediately mov-d to the right value.

  • Rely on the phi-equivalence coalescing algorithm (still adapted from Pereira) to merge the non-interfering values and eliminate the copies.



I've altered Pereira's algorithm to deal with the "join families" that Mössenb&oml;/Pfeiffer generate. (Arguably in doing so I've recreated their coalesce algorithm, but hey.)

Here's a new test function with some interfering values:

public int function(int x) {
int y = 1;
int z = 0;

while (y < x) {
z = y;
y++;
}

return z;
}


And here's the new output with the new register allocator and dramatically better interference analysis. Note that I don't actually build an interference graph at any point; not sure if this is going to come back to bite me.

Remember that the code runs backwards.

method$:L27312399
IF_ALWAYS JMPRET prim_return_ret, #prim_return

long @method$:L27312399
long @method$:L11631043
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L10840700
IF_ALWAYS CMPS lmmr10, lmmr1

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS ADD lmmr10, #1
IF_ALWAYS MOV lmmr10, lmmr10
method$:L11631043
IF_ALWAYS MOV lmmr0, lmmr10

long @method$:L10840700
IF_ALWAYS RDWORD PC, PC
IF_ALWAYS MOV lmmr0, #0
method
method$:L29315749
IF_ALWAYS MOV lmmr10, #1


lmmr0 and lmmr1 are the two function arguments (x and this). We smash this almost immediately, reusing lmmr0 to hold z, since it's the register used to return values from functions.

One of these days it would be fun to take a class in this stuff, I think. These are neat problems.

Labels: , ,

Sunday, December 07, 2008

Adapting propasm for the Pilgrim LMM

Last night, I replaced my Java compiler's hacked code generator with one based around propasm.

Unfortunately, propasm had some baked-in assumptions that gave me some trouble, so I had to rewrite it. The rewrite will go into SVN as soon as it reaches feature-parity. It's a lot simpler now and provides an API for extensible image generation -- which is the feature I needed today.

You see, I'm targeting Phil Pilgrim's adaptation of the LMM, which is the only one with 16-cycle constant execution for register-register instructions. (I developed it independently at Hobee's one night, but it looks like Phil did it first, so as far as I'm concerned it's the Pilgrim LMM.)

The Pilgrim LMM has an unusual property that lets it achieve its speed: it lays code out backwards, from high addresses to low, and uses djnz to update the native and LMM PCs simultaneously. This completely broke poor propasm's brain -- it was designed for single-pass assembly, which of course means that instructions get laid out in the order they're input.

To support this, propasm now generates images in two passes. First, it builds a tree of object describing the image's contents. Second, it walks the tree and generates output. This has the side-effect of allowing optimizations on machine code without requiring a separate object model -- I've implemented a couple simple scalar optimizations over propasm's image objects in my compiler.

My Java compiler plugs a class called rlmm.CodeSegment into propasm's image generation code, which lets it generate the code in execution order (by a CFG walk) but lay it out backwards in memory. Thus, for the sample input from my last post, we now get the following output:

' Start of segment rlmm.CodeSegment@1004901
IF_ALWAYS JMPRET prim_return_ret, #prim_return
method$:L10840700
IF_ALWAYS MOV r0, r2

long @method$:L10840700
long @method$:L15980197
IF_ALWAYS JMPRET prim_jmplt_ret, #prim_jmplt
method$:L29181730
IF_ALWAYS CMPS r2, r1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15980197
IF_ALWAYS ADDS r2, #1

long @method$:L29181730
IF_ALWAYS RDLONG PC, PC
method$:L15006066
method
IF_ALWAYS MOV r2, #0


Remember to read bottom-to-top! (Pardon the excessive IF_ALWAYS and expansion of calls, this output was disassembled with a naïve algorithm.)

Labels: , ,

Saturday, December 06, 2008

Compiling Java for the Propeller

My spare time is pretty limited these days -- work takes up a lot of it. On the up-side, I got promoted.

Over the past few months, some of my spare time has been spent designing a Java compiler that targets the Parallax Propeller. A few hand-coded mockups have convinced me that it's feasible -- the only thing between that and making it a reality are the past 50 years of compiler research, which I never learned. :-)

So, I've spent time surfing Citeseer and gathering data. Between what I've learned and my Propeller experience, I've set some parameters for the first version: it will compile to Large Memory Model code rather than a custom interpreter, and will consume Java class files.

As of this evening I've got a prototype working, and have compiled/assembled/uploaded my first binary. The bullet points are as follows:

  • Converts Java 6 JVML to n-address linear IR using abstract evaluation. Assumes input is verifiable, which lets me skip some checking. (Why Java 6? Because it generates nice frame attributes and doesn't use jsr.)

  • Operates almost exclusively in SSA (actually CSSA).

  • Performs register allocation in SSA form, using a crippled-but-developing version of Pereira's algorithm. I've modified it to deal more gracefully with the Propeller's CISC-like two-address architecture.

  • Provides limited scalar optimizations -- the ones I've gotten around to writing. Copy folding, constant propagation, jump-to-flow-control elimination, etc.

  • Fantastically crappy code generator for now -- literally printf-based with no intelligent layout or peephole optimizations, as you'll see in the samples below.

  • None of the fun stuff, like garbage collection, works yet.



Here's sample input and output from one of my test suites

public class Input0 {
public int function(int x) {
int y = 0;
while (y < x) {
y ++;
}
return y;
}
}


And the output, in propasm syntax (a superset of Parallax):

method_function
L12893404
mov r2, #0
rdlong PC, PC ' inlined jmp
long @@L23063136
L26210109
add r2, #1
rdlong PC, PC ' inlined jmp
long @@L23063136
L23063136
' [int r2:1] <- phi ssa.BasicBlock@c4bcdc:[int c0:0] ssa.BasicBlock@18fef3d:[int r2:2]
cmps r2, r1 wc wz
call #prim_lt
long @@L26210109
long @@L25763215
L25763215
mov r0, r2
call #prim_return


The "call #prim_foo" instructions are calls to LMM kernel traps; plenty of other people have explained Propeller LMM in detail elsewhere, I won't repeat it. My code generator inlines LMM primitives in some cases, as it's done with jump here.

Some details of the calling conventions are apparent in the output: incoming arguments start at r0, and the returned value comes back in r0. These registers are real-live Propeller registers inside the LMM kernel, and I suspect I'll have room for a lot of them -- 32 or more -- so I'm using a RISC-like register calling convention, with link register.

Assuming work doesn't interfere, I'll post more as it develops.

Labels: , ,

Thursday, November 27, 2008

Ubuntu 8.10 on the Acer Aspire One

I've replaced my eee 701 with an Acer Aspire One. Ubuntu works quite well on it, without a lot of the trickery I needed on the eee, and the battery life is stunning (I get upwards of 7 hours after tweaks).

I've created a site with tutorials on getting Ubuntu working optimally on the Aspire One. It includes a complete step-by-step guide for going from installation to awesome, based on the detailed notebook I kept in the first two days I had the machine.

The machine, incidentally, is smaller than the notebook. :-)

Labels: ,

Monday, November 24, 2008

Amazon wrong; Acer Aspire One has no free mini-PCIe slot.

Amazon's product page for the Acer Aspire One claims that the machine has an available mini-PCIe slot for WWAN.

I'm holding it in my hands, and it does not -- the connector isn't soldered in place, just like earlier ones.

This is the first time Amazon's product descriptions have had a significant error in my experience, and I'm pretty pissed.

Monday, May 26, 2008

PropellerForth issue tracker

I've gotten a few blog comments from folks who are using PropellerForth. Yay!

A few comments have reported bugs.

I don't closely follow comments on my blog (though I respond when I have time) -- but I do get a email when someone reports a bug at the PropellerForth issue tracker over on Google Code!

If you're using PropellerForth and find a problem, please report it there. (Please don't post "bugs" like "You haven't released a binary in a few months" or "You haven't imported your Subversion repo." I assure you I'm working on both as time permits, but this isn't my day job.)

Of course, if you're using PropellerForth and haven't found a problem, I'd love to hear about that too! Blog comments are a fine place for that, or send me an email if you can track down my address. ;-)

Labels:

Monday, May 19, 2008

Trig in PropellerForth

Among many other nice features, the Parallax Propeller microcontroller has a single-quadrant sine table in ROM. This makes implementing sine and cosine fast and simple, for medium-precision (13-bit) work.

Here's a simple port of Parallax's routines to PropellerForth.


hex

\ Address of table in ROM
E000 constant sin-base

\ Computes the sign of an angle.
\ The angle is a 13-bit number (0x1FFF = almost 360 degrees).
\ The result is a 16.16 fixed-point signed integer.
: sin ( angle13 -- n16_16 )
dup 1000 and >R \ Stash a flag for Quadr.3/4 onto the rstack
dup 0800 and if negate then \ Invert angle for Quadr.2/4
2* sin-base or H@ \ Compute word address and fetch
R> if negate then ; \ Negate result for Quadr.3/4

: cos ( angle13 -- n16_16 ) 0800 + sin ;


With PropellerForth v8.05 this gives a runtime of 1088-1152 cycles for sin, and an additional 240 cycles for cos -- 31x slower than a native implementation. (v8.02 will be slower.)

It can be made slightly faster by (1) replacing the numbers with CONSTANTs and (2) inlining 2*, which is not a primitive, as "1 lshift", for a runtime of 993-1056 cycles -- 28x slower than native, at the cost of a 22 more bytes of space.

Update: Huh. Through some optimizations to the kernel, I can shave off another 100 cycles -- for a runtime of 896-926 cycles. However, it costs 248 bytes in the kernel, and eliminates a hook I was hoping to use for single-step and breakpoints. I'll have to weigh this.

Labels: