Cliff Hacks Things.

Monday, March 20, 2006

Nutshell: my thoughts on static type checking

I've been giving a lot of thought to static type checking in Mongoose — for three years now, in fact.

Static type checking has a bad name these days, with programmers advocating fully dynamically-typed languages for productivity reasons, or to avoid complexity.

I agree with the gist of the argument, but I feel it's based on bad data. Here are my arguments. (I'm not making the "static type systems allow for more optimizations" argument, since Self proved that one wrong 13 years ago.)

1. The popular static type systems are the worst examples.

Ask one of these programmers for an example of a static type system that holds them back. I will virtually guarantee the answer is likely to be C, C#, C++, Java, Pascal, or Delphi.

I will also virtually guarantee the answer is not Ocaml, ML, or Haskell, three strongly-typed languages — stronger even than the C family or Java, as they lack pointer aliasing, casts, and other ways to accidentally goof up your types at runtime. And yet they don't get in the way (though the lack of operator overloading in ML is unfortunate).

There are a number of reasons for this. The ML family provides incredibly good type-inference support: the compiler looks at the operations you're performing on a value and where it came from, and deduces what type you wanted. (In most interactive environments, it will report the types it deduces, so you can see if you screwed up. Yes, the errors are yours, the compiler knows its stuff.) As a result, the ML family (including Ocaml) doesn't require you to pedantically type every variable, like the C family.

Microsoft added a sort of type-inference to C# in the latest version, but it only saves you from declaring your variable types — instead of int x you get to write var x. Woohoo. Compared to ML, this is a toy feature, and it's a really bad fit for the language, in my uneducated country-boy opinion.

2. Type declarations — or type annotations, or whatever you want to call them — are excellent documentation.

Maybe I'm just a bad programmer, but in Ruby, I find myself passing Strings when I really wanted to pass Integers all the damn time. The program usually doesn't die until runtime — but it usually does die, all because I forgot a to_i somewhere in my parsing code.

This is exacerbated by the fact that Strings, in Ruby, overload most arithmetic operators. Rather than dying as soon as I try to add or shift a String, I just get a weird String. (I do like it better than the transparent conversions in Perl, I'll say.)

I have the same issues with Smalltalk. Sometimes, it's just really nice to have documentation on what the expected type of a parameter is, what a function will return, or whether or not something can be null. A lot of Smalltalk code fakes this, by naming the parameters after their expected type — aSmallInteger, for instance. This is really just an overgrown Hungarian notation, and violates one of my basic principles of software design: values should be named for their roles, not their types.

3. Having a tool to check the accuracy of your documentation rocks.

When writing Java in an IDE, I always leave Javadoc checking enabled.

For those of you fortunate enough to have avoided Java, Javadoc-checking in an IDE verifies that your parameters are documented, their declared types are correct, and so forth — in your comments. It verifies that your comments are in sync with the source. This is wonderful, since in Java, comments duplicate the information in the method signatures. It keeps them from getting out of sync.

Type declarations on parameters and results are a form of documentation — and, in a static type system, they're checked by a tool called the compiler.

4. A sufficiently expressive static type system doesn't bind your hands.

On this point, I must return — as I so often do — to Strongtalk.

Strongtalk is a descendant of Smalltalk, a totally dynamic language. One of the real powers of Smalltalk is its genericity: a parameter can be of any type at all, and as long as it responds to the set of messages you send it, your algorithm will work. Pass it an integer, pass it a user-defined currency type, pass it a complex number — it doesn't care, and neither do you.

Strongtalk sports a static type system that preserves this power. First, it's optional: if you honestly want a parameter to accept any type of object, that's fine.

Second, it doesn't confuse "class" with "type." Sure, you can say "this parameter must be a SmallInteger" — but you can also say "this parameter must respond to this set of messages with these types" In other words, you can describe the type in the same way that Smalltalk does (and Ruby does, and Python does, and...) — but in an explicit fashion, and one that the compiler can check for you!

Strongtalk's type system is by far the most powerful and expressive that I've used. It, single-handedly, convinced me that static typing was not a dead end — but rather the path of least resistance.

Something that Strongtalk, near as I can tell, didn't do is infer the types and write the type declarations for you — but there's really nothing preventing that. This is one of the features I'm working into Mongoose, albeit slowly.

5. Explicitness is good.

I like explicit code. (Yeah, I've worked for porn sites, but no jokes, please.) I like explictness of intent, both on the part of the programmer, and on the part of third-party components.

Static type systems are only one step in that direction. Pre/postconditions and loop and class invariants are another. (Really, type systems are a specific case of this, but I feel they're best expressed separately.)

I'm in the very early stages of a contract-inference engine for Mongoose code, which will save the programmer the grunt work of writing pre/postconditions. Stay tuned.

0 Comments:

Post a Comment

<< Home