Cliff Hacks Things.

Tuesday, March 14, 2006

Indirect threading is...slower?

Well, that was unexpected.

M2VM so far has been using the switched-interpreter mode of my VM generator.
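For context, switched dispatch means the interpreter loop is one big switch on the current bytecode, something roughly like this (a minimal sketch with made-up opcodes, not M2VM's actual instruction set):

    /* Switch-based dispatch: one switch statement, compiled to a
       bounds check plus a jump through the switch's jump table. */
    typedef unsigned char bytecode;

    int interpret_switch(const bytecode *ip) {
        int acc = 0;
        for (;;) {
            switch (*ip++) {
            case 0: /* NOP  */                break;
            case 1: /* INC  */ acc++;         break;
            case 2: /* DEC  */ acc--;         break;
            case 3: /* HALT */ return acc;
            default:           return -1;     /* unknown opcode */
            }
        }
    }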

As of yesterday, my profiler no longer says the GC is the bottleneck — now, it's the interpreter loop. So, logically, I've started to optimize it.

As a first step, I switched the VM generator to produce indirect-threaded code — instead of going through a switch, each bytecode is dispatched by looking up its handler's address in a table (one entry per bytecode) and jumping straight there. I expected this to be faster.
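In GCC-flavored C, that kind of dispatch is typically done with the labels-as-values extension. Here's a rough sketch of the technique (again with made-up opcodes — this is illustrative, not my generator's actual output):

    /* Indirect-threaded dispatch using GCC's computed goto:
       the switch is gone; every handler ends by jumping directly
       to the next bytecode's handler via the dispatch table. */
    int interpret_threaded(const unsigned char *ip) {
        /* One entry per bytecode, filled with label addresses. */
        static void *dispatch[] = { &&op_nop, &&op_inc, &&op_dec, &&op_halt };
        int acc = 0;

        goto *dispatch[*ip++];     /* jump to the first opcode's handler */

    op_nop:          goto *dispatch[*ip++];
    op_inc:  acc++;  goto *dispatch[*ip++];
    op_dec:  acc--;  goto *dispatch[*ip++];
    op_halt: return acc;
    }

The supposed win is that each handler gets its own indirect branch, which the branch predictor can track separately, instead of every opcode funneling through the single indirect jump of the switch.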

It's about 10% slower on my benchmarks. (It's about the same speed at -O1, but at higher levels it gets slower and slower.)

I'll drill down with some profiling to see if I can find the cause, but this kinda blows my mind. The generated code for the switch statement was the hottest in the module, so I assumed that, by getting it out of the way, things would speed up.

Hrm.

Update: This is only true on the G4/G5. On an Intel Core Duo system, the indirect threaded interpreter is an instant 10% speed boost. However, I've found an issue with GCC's code generation that's ruining my indirect branch prediction rate — which may explain the issues on the G5.
