Cliff Hacks Things.

Tuesday, September 12, 2006

Type Inferencing for Mongoose

After a few months of meditation, Mongoose's type system has finally come together. It lives in a several-dozen-page VoodooPad document right now, but I'll summarize it here.

  • Protocol-based, the OO equivalent of structural typing. A "type" describes an object's externally-visible operations, not where it gets its code.

  • Pragmatic. Supports both "subclass-to-specialize" (as in Eiffel) and "subclass-to-extend" (as in every other OO language), without the holes in Eiffel's type system.

  • Simple. All the safety and flexibility of Java's generics, the loose hierarchy of duck-typed languages like Python or Ruby, but with very thorough static checking (if you so desire) — and yet, far simpler to learn than Java's type system.

  • Optional. 'Nuff said.



Some of the problems I've managed to solve recently include:

  • How substitutability (in the Liskov sense) works in a protocol-typed language, with an adaptation of co-contravariance.

  • How to handle union and intersection types.

  • How to run an inferencer over the whole shebang.



This post is really focused on that last point: I got the inferencer working for simple cases.

Take the following code snippet, which defines a simple stream class that produces the Fibonacci numbers. To fill in for the standard library, this snippet also defines a protocol Integer (the protocol of numeric literals), and stubs out a protocol Boolean (the protocol of the literals true and false).

When reading, remember that, like Smalltalk, the caret character ^ means "return."

protocol Integer {
(* A method named "+", which takes and returns an Integer *)
method + other (Integer) ^ (Integer).
}
protocol Boolean {
}
class FibonacciStream {
var a := 1.
var b := 1.

method hasNext [
^true.
]

method next [
var c := self twoValuesAgo + b.
a := b.
b := c.
^c.
]

(* This method is here only to introduce some indirection. *)
method twoValuesAgo [
^a.
]
}


(Long-time readers will note that, yes, the syntax has changed yet again.)

Of note: the FibonacciStream (the actual user code here) has no type annotations at all.
Also of note (at least because it confused the early inferencer): The types of a, b, and c all depend on one another.

Despite the lack of user type declarations, the inferencer actually has a lot to work with: it knows that a and b begin life as Integers, and it knows the signature of Integer's "+" method. This is sufficient to derive all the types in this class, as you can see in the output from the inferencer:

protocol Integer {
method + other (Integer) ^(Integer)

}
protocol Boolean {}
class FibonacciStream {
var a (Integer) := 1.
var b (Integer) := 1.
method hasNext ^(Boolean)
[
^true.
]
method next ^(Integer)
[
var c (Integer) := self twoValuesAgo + b.
a := b.
b := c.
^c.
]
method twoValuesAgo ^(Integer)
[
^a.
]
}

The inferencer has different formatting tastes than I do; it seems to use two spaces instead of tabs, and put its brackets on their own line. I'll fix this egregious error later. :-)

The inferencer has some other neat tricks, not shown here (because they mostly crop up in really weird designs). It can handle variables that change type over their lifetime, either by synthesizing a union-type or by splitting the variable, SSA-style. It also does all sorts of neat things with Mongoose's generics system, but I'll write more on that once I get the damn thing working. :-)

Update: The case I cite above, where a variable changes type over time, is now a warning. You can use a type declaration to express your intention if you really want it to happen, but I find that (in Ruby, at least, which is the most similar language I work in) it's almost always a bug.