Cliff Hacks Things.

Monday, May 19, 2008

Trig in PropellerForth

Among many other nice features, the Parallax Propeller microcontroller has a single-quadrant sine table in ROM. This makes implementing sine and cosine fast and simple, for medium-precision (13-bit) work.

Here's a simple port of Parallax's routines to PropellerForth.


hex

\ Address of table in ROM
E000 constant sin-base

\ Computes the sine of an angle.
\ The angle is a 13-bit number (0x1FFF = almost 360 degrees).
\ The result is a 16.16 fixed-point signed integer.
: sin ( angle13 -- n16_16 )
dup 1000 and >R \ Stash a flag for Quadr.3/4 onto the rstack
dup 0800 and if negate then \ Invert angle for Quadr.2/4
2* sin-base or H@ \ Compute word address and fetch
R> if negate then ; \ Negate result for Quadr.3/4

: cos ( angle13 -- n16_16 ) 0800 + sin ;


With PropellerForth v8.05 this gives a runtime of 1088-1152 cycles for sin, and an additional 240 cycles for cos -- 31x slower than a native implementation. (v8.02 will be slower.)

It can be made slightly faster by (1) replacing the numbers with CONSTANTs and (2) inlining 2*, which is not a primitive, as "1 lshift". That gives a runtime of 993-1056 cycles -- 28x slower than native -- at the cost of 22 more bytes of space.

Update: Huh. Through some optimizations to the kernel, I can shave off another 100 cycles -- for a runtime of 896-926 cycles. However, it costs 248 bytes in the kernel, and eliminates a hook I was hoping to use for single-step and breakpoints. I'll have to weigh this.
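
For anyone who wants to check the quadrant logic off-target, here is a rough Java model of what the sin word above does. It is only a sketch: the table is computed with Math.sin and scaled to 0..0xFFFF as a stand-in for the ROM table (not a copy of it), and the class and method names are made up for illustration.

```java
class QuarterWaveSine {
    // Stand-in for the ROM table: 0x801 entries covering 0..90 degrees inclusive.
    // Assumption: values are computed here and scaled to 0..0xFFFF, not read from the chip.
    static final int[] TABLE = new int[0x801];
    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (int) Math.round(Math.sin(i * Math.PI / 2 / 0x800) * 0xFFFF);
        }
    }

    // angle13: 0..0x1FFF spans one full turn.
    static int sin(int angle13) {
        boolean negate = (angle13 & 0x1000) != 0;  // quadrants 3 and 4
        if ((angle13 & 0x0800) != 0) {
            angle13 = -angle13;                     // quadrants 2 and 4 read the table backwards
        }
        int value = TABLE[angle13 & 0x0FFF];        // low 12 bits select the table entry
        return negate ? -value : value;
    }

    // Cosine is sine advanced by a quarter turn.
    static int cos(int angle13) {
        return sin(angle13 + 0x0800);
    }
}
```

Masking with 0x0FFF plays the same role as OR-ing the shifted angle into the table's base address in the Forth version: after the optional negate, the low 12 bits always land between 0 and 0x800.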


Saturday, May 17, 2008

Another PFcam teaser

With the help of the new scope, I wrote a new OV6620 driver that includes chroma support. With the camera reconfigured to run at 25fps, it records 88x72 YUV images at full rate. It's 210 bytes, and integrated into a custom build of PropellerForth.

A few lines of Forth later, I had code to generate PPM images. (PPM is possibly the world's easiest format to produce.) Here are some samples. Forgive the yellow cast and the chroma noise -- my new workbench doesn't have lights yet, and the camera's low-lum performance is poor. That, and I probably have bugs in my YUV matrix code.




Once I'm confident in the design, I'll post the driver and Gerbers for the PCB (and if anyone actually wants one, we can go in on a BatchPCB order).

I've got 8 spare Propeller I/O pins...gotta figure out what to do with 'em. Servo connectors? Debug LEDs? Audio?

Edit: Yes, I did indeed have bugs in my matrix code! They're fixed now.

Better chroma makes engineer happy.


Tuesday, May 13, 2008

I can see clearly now, the waves are gone

I'm a pretty hardcore guy when it comes to debugging software. I pride myself on my automated tests, I routinely sling debuggers and disassemblers around, and so forth.

When it comes to hardware, though, I've been stuck firmly in the printf era. The printf era is pretty damn powerful, of course, when you've got Forth at your side -- the PropellerForth VGA and NTSC display drivers, for example, were written using only a multimeter and an ancient Heathkit frequency counter.

However, I'm not a big fan of making my own life difficult. After many years of fondling scopes at Fry's, I finally gave in and picked up a Tektronix TDS1012B. It's at the high end of the extreme low end: decent sampling rates for embedded work and so forth.

A couple weekends ago I taught myself printed circuit board design and built the PFcam, the Propeller-based image processing board I mentioned in my last post. I'm not an analog electronics whiz, so the first thing I did with my scope was test the new board.

Sure enough -- here's a capture of some I2C traffic between the Propeller and the camera:


Notice the square waves are wearing hats? The regulator's output was bouncing between 3.3 V (the goal) and 5 V (the input). A decoupling cap later, my SCCB code (Omnivision-speak for I2C) is much more reliable:


Now all the I2C/SCCB chips play nice together, and nobody wigs out and sees a stop condition where there ain't one.

I'm happy with the scope so far. I'll post more details on the PFcam board later on.


Wednesday, May 07, 2008

Got tired of my CMUcam.

What embedded machine vision system...

  • is easier to interface (and more powerful) than a CMUcam2;

  • has eight cores at 96MHz, six of which can perform user-defined image processing;

  • can be programmed and debugged interactively in-system using Forth or assembler;

  • runs open-source code built using open-source tools;

  • costs less than $100 in parts; and

  • can be built in an evening with only basic through-hole soldering?



Yeah, I couldn't find one either.




Mmm, 192 MIPS of pixel-devouring goodness. I love my overclocked Propellers.

Edit: What kind of image processing, you ask? How about realtime ASCII art!



I hope to get middle-mass and edge tracking working shortly.


Sunday, February 10, 2008

PropellerForth tidbit: running a word at boot time

By default, PropellerForth v8.01 starts up by running the built-in word INTERACTIVE. This displays the version screen, enables multitasking, and drops into the Forth interpreter by calling the standard word QUIT.

Once you've built an application and burned it to EEPROM using savemem, you may not want PropellerForth to be interactive anymore! Alternatively, you might define a new version of INTERACTIVE that does some additional setup -- SD card initialization, for example -- and then calls the old INTERACTIVE.

I neglected to provide an easy way to run your own words on boot in v8.01, so here it is:

decimal
\ The address of a variable containing the first word to run.
140 constant 'boot

\ Sets a word to run at startup.
\ Example: BOOT MYSTARTUP
: boot ( "name" -- )
' 'boot ! ;


You can simulate a reboot -- assuming you haven't started any other tasks -- by invoking an arcane Propeller machine instruction:

32 0 coginit


Once you're satisfied, savemem and reboot!


PropellerForth tidbit: changing the console speed

PropellerForth 8.01, by default, sets the console at 19,200 baud.

This is mostly for compatibility, and doesn't tax the serial driver; in practice, the I/O routines spend most of their time sleeping. At 80MHz the kernel I/O routines have no trouble doing 230 kbps. Increasing the speed can really make a difference when moving a bunch of data to the console, such as when dumping a block of RAM or listing a block of source.

Currently the speed is hardcoded, but that doesn't mean we can't change it. :-)

\ Set the console I/O speed in bits per second.
\ Takes effect immediately.
\ Example: 115200 console-speed
decimal
: console-speed ( u -- )
second swap / \ get # of cycles per bit
176 L! ; \ change the kernel's constant bit time


The magic number there, 176, is the location of the bit time constant inside the kernel -- we use L!, local-store, to override it in the running kernel image. 176 is valid for 8.01 and early alphas of 8.02, but may change in the future -- eventually there will be a supported way of doing this.

Have fun!


Java 6 try/finally compilation without jsr/ret

My day job requires me to be a bit of a JVM geek, so I was poring over the Java Virtual Machine spec recently when I remembered something: In Java 6 and later, the old jsr and ret instructions are effectively deprecated. These instructions were used to build mini-subroutines inside methods. While Java doesn't support nested functions or anything fun like that, it does have the try/finally construct, and these instructions were quite handy for implementing it.

I saw no "official" instructions for compiling finally without jsr, so I investigated it, and thought I'd post the results -- mostly in case I forget them later.

For non-Java folks, some background: a chunk of code that might fail at runtime can be wrapped in a try block. You can then attach handlers for specific types of exceptions/errors to the try block using catch clauses, if you want to respond to specific failure cases. You can also attach a finally block, which will be run at the end, failure or no. You can think of finally as a sort of cleanup block -- which, in practice, is how it's used.

As a result, you wind up with multiple control flow paths that can execute the finally code:

  • try block, successful completion, finally block executes before moving on

  • try block, failure, one or more catch blocks, finally block executes before moving on

  • try block, unhandled failure, finally block executes before unwinding the stack and throwing the error out to a higher level
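
The three paths above can be watched directly with a small logging sketch. The class, the method names, and the mode parameter are invented for this illustration; the only point is the order in which the blocks run.

```java
import java.util.ArrayList;
import java.util.List;

class FinallyPaths {
    // mode 0: no failure; mode 1: handled failure; mode 2: unhandled failure.
    static List<String> run(int mode) {
        List<String> log = new ArrayList<>();
        try {
            attempt(mode, log);
        } catch (IllegalStateException e) {
            // Only reached in mode 2, after the inner finally has already run.
            log.add("propagated");
        }
        return log;
    }

    static void attempt(int mode, List<String> log) {
        int[] a = new int[2];
        try {
            log.add("try");
            if (mode == 1) a[16] = 2;                          // throws ArrayIndexOutOfBoundsException
            if (mode == 2) throw new IllegalStateException();   // no matching catch below
        } catch (ArrayIndexOutOfBoundsException e) {
            log.add("catch");
        } finally {
            log.add("finally");
        }
    }
}
```

Calling run(0), run(1) and run(2) gives the logs [try, finally], [try, catch, finally] and [try, finally, propagated] respectively.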


Java originally compiled this the way most folks would write it by hand: code that gets used multiple places goes in a subroutine. In this case, it's a nested subroutine accessed using jsr.

This unfortunately makes dataflow analysis and type inference of the Java code considerably more complex, for reasons I won't go into here.

So while Java 6 and later JVMs can still understand jsr, tools no longer generate it. Instead, they duplicate the code of the finally block along each path (a transform I've always called tail duplication, but there may be other names). Let's look at a quick example.

This totally contrived Java class plays with an array:

class TryFinally {
    public static void main(String[] args) {
        int[] a = new int[2];
        try {
            a[16] = 2;
        } catch (ArrayIndexOutOfBoundsException e) {
            a[0] = 2;
        } finally {
            a[1] = 2;
        }
    }
}


It will always follow the longest code path, because it's written to fail: the try block will execute, followed by the handler, followed by the finally clause.

The disassembled JVM instructions for this method are as follows:

public static void main(java.lang.String[]);
Code:
0: iconst_2 // Create the array
1: newarray int
3: astore_1 // Store it in local 1
4: aload_1 // Set element 16 to 2 (throws)
5: bipush 16
7: iconst_2
8: iastore
9: aload_1 // Begin 'success' finally code
10: iconst_1
11: iconst_2
12: iastore
13: goto 35 // End 'success' finally code
16: astore_2 // Catch block, save the exception...
17: aload_1 // and set a[0] = 2
18: iconst_0
19: iconst_2
20: iastore
21: aload_1 // Catch copy of finally code
22: iconst_1
23: iconst_2
24: iastore
25: goto 35
28: astore_3 // A third copy of finally code!
29: aload_1
30: iconst_1
31: iconst_2
32: iastore
33: aload_3
34: athrow
35: return
Exception table:
  from   to  target  type
     4    9      16  Class java/lang/ArrayIndexOutOfBoundsException
     4    9      28  any
    16   21      28  any
    28   29      28  any


As you can see in my annotations, we have three copies of the finally code! Why three? The answer is in the three code paths I discussed above, and in the exception table.

In the table we see four exception handlers defined -- but, of course, we only defined one! Why so many?

The first is the one we defined as a catch. The second is an invisible additional catch on the try block for type 'any' -- so any unexpected exceptions are sent to the third copy of the finally code. The third guards the catch block itself; the fourth guards the generated exception handler.
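
To make the table easier to read, here is a toy model of the lookup a JVM performs when an exception is thrown: rows are searched in order, "from" is inclusive, "to" is exclusive, and the first row whose range and type both match wins. The class and method names are invented for this sketch, and a null type stands in for "any".

```java
class ExceptionTableDemo {
    // One row of the table: covers pcs in [from, to) and jumps to target.
    static final class Entry {
        final int from, to, target;
        final Class<? extends Throwable> type;  // null models the "any" rows

        Entry(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    // The four rows from the disassembly above, in the same order.
    static final Entry[] TABLE = {
        new Entry(4, 9, 16, ArrayIndexOutOfBoundsException.class),
        new Entry(4, 9, 28, null),
        new Entry(16, 21, 28, null),
        new Entry(28, 29, 28, null),
    };

    // Returns the handler pc for an exception thrown at pc, or -1 if no row matches
    // (in which case the exception leaves the method).
    static int findHandler(int pc, Throwable thrown) {
        for (Entry e : TABLE) {
            boolean inRange = pc >= e.from && pc < e.to;
            boolean typeMatches = e.type == null || e.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.target;
            }
        }
        return -1;
    }
}
```

With this model, an ArrayIndexOutOfBoundsException at pc 8 goes to 16, any other exception at pc 8 goes to 28, anything thrown inside the catch block (pc 16 to 20) goes to 28, and an exception inside the "success" copy of the finally code (pc 9 to 13) matches no row at all.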

In other words, the compiler has rewritten the Java code into something resembling:

public static void main(String[] args) {
    int[] a = new int[2];
    try {
        a[16] = 2;
        a[1] = 2;
    } catch (ArrayIndexOutOfBoundsException e) {
        a[0] = 2;
        a[1] = 2;
    } catch (* e) {
        a[1] = 2;
        throw e;
    }
}

As you can see, the finally block has disappeared -- instead, its contents have been duplicated along each code path.
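
If you want to convince yourself that the two shapes behave the same for this example, here is a compilable approximation. It makes two assumptions that are not in the pseudocode above: Throwable stands in for the "any" handler, and rethrowing it without a throws clause relies on the precise-rethrow rule added in Java 7, so it will not compile as-is on Java 6. The class and method names are made up.

```java
class FinallyRewrite {
    // The original shape: the compiler copies the finally body for us.
    static int[] withFinally() {
        int[] a = new int[2];
        try {
            a[16] = 2;
        } catch (ArrayIndexOutOfBoundsException e) {
            a[0] = 2;
        } finally {
            a[1] = 2;
        }
        return a;
    }

    // The hand-duplicated shape, with the finally body copied onto each path.
    static int[] duplicated() {
        int[] a = new int[2];
        try {
            a[16] = 2;
            a[1] = 2;
        } catch (ArrayIndexOutOfBoundsException e) {
            a[0] = 2;
            a[1] = 2;
        } catch (Throwable e) {   // assumption: stand-in for the "any" handler
            a[1] = 2;
            throw e;              // allowed without "throws" on Java 7+ (precise rethrow)
        }
        return a;
    }
}
```

Both methods return an array holding {2, 2}. The hand-written version is still only an approximation: its catch-all guards the try block but not the first catch block, whereas the real exception table also covers the catch block (the 16-21 row).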


Thursday, February 07, 2008

New features for PropellerForth 8.02

I'm fleshing out the feature set for PropellerForth 8.02, due out in a couple weeks. This is just a teaser post describing what I've been working on; no code yet. :-)

The main new features at this time:
  • Block word set for reading/writing block devices and loading source code from storage
  • EEPROM Block backend, for treating the program EEPROM as a block device
  • SPI-mode SD/MMC card Block backend


The SD support is a traditional Forth disk layer: it doesn't implement FAT or any other filesystem. Instead, it lets you directly address blocks on the disk. By default you can use this to save and edit source code directly on the card, but it could also allow an enterprising individual to implement filesystem support.

Because source is stored in raw disk sectors instead of files, getting at it from a "real" computer will require a tool like dd. Since this is an embedded system, that doesn't bother me too much, but I wouldn't complain if someone implemented FAT16! :-)

The actual interface code is a direct port of Tom Rokicki's FSRW SPIN implementation. It's about 1KiB and currently gets about 5-6KiBps at 80MHz, twice the throughput of the SPIN version, despite being a pretty literal port. I hope to optimize it further before release.

Now, to work on a target compiler -- so that users of 8.02 can recompile their whole system from sources stored on EEPROM or SD.


Sunday, January 20, 2008

You keep using that word.

Poking around on the internet this evening, I ran across an ad for a product intended to help manage "really, really big" server deployments. I was intrigued, since that's what I do for a living, so I kept watching.

Then this screen came up:



*cough* "Big." You keep using that word. I do not think it means what you think it means.