Manifest Typing Considered Good

Some time after the page was initially created, I realized that the title should really read "some manifest typing considered good". Of course, that would be too long and boring for a title, so I'll let the slight misnomer stay. This page should be taken as a plea that current languages with TypeInference have turned the design knob all the way to the other extreme, leading to some "BadEngineeringPropertiesOfFunctionalLanguages" (to paraphrase Cardelli).


"Wouldn't you be delighted if your Fairy Godmother offered to wave her wand over your program to remove all its errors and only made the condition that you should write out and key in your whole program three times! The way to shorten programs is to use procedures, not to omit vital declarative information." ~ CarHoare, Turing Award Lecture


In a StaticTyping environment, people began to complain that ManifestTyping is a "drag" on developer productivity and a kludge standing in the way of beautiful code and EconomyOfExpression. And therefore we are only one step away from somebody writing ManifestTypingConsideredHarmful? ...

Or is it? This page can be considered a pre-emptive strike.

Against code that doesn't use enough type declarations, or do you mean to attack languages that allow any type declarations to be omitted?

The historical argument

The discussion started a long time ago, with DynamicTyping advocates alleging that ManifestTyping forces them to "say things which are not true" (that would be KentBeck), or, to put it in more general terms, to say things (the manifest type, that is) that are true now but may be false later.

Some StaticTyping advocates then got really shamed into defensive mode. After all, how can one defend something that forces good folks into lying? Thus they took refuge in a "superior" feature of advanced programming languages like ML (StandardMl?, ObjectiveCaml), HaskellLanguage, CleanLanguage, etc. - that being TypeInference. And thanks to this wonderful feature, it looks like you're not forced to say much at all in those languages. They almost look like dynamically typed source code to the reader who does not pay close attention. And from then onward, ManifestTyping (a la mainstream industrial languages like CeePlusPlus and JavaLanguage) has been looked on as the black sheep of the StaticTyping camp, while TypeInference is the HolyGrail.

There are two features in these languages which let you say "less". Type inference is the lesser, letting you write down the same old programs, just without writing down a type for every declaration. (Taking as a sanity-saving given that no reasonable language requires a declared type for every expression, and every sub-expression of those expressions, and so on.) Parametric polymorphism is the greater, letting expressions actually have types that don't require (or at least require many fewer) things that are not true. Consider a trivial apply function, apply f x = f x. This function requires only and just that x be a suitable argument for f, and all the languages you mention can express exactly that requirement, with a type something like "forall a, b : (a -> b) -> a -> b". In ObjectiveCaml this sort of thing extends to the object system (ObjectiveCaml is the only one of these languages that really has an object system), with types meaning things like "a function taking two arguments, where the first is an object having at least a method called foo which can be applied to the second argument". It is of course a great help that with type inference it's not necessary to keep copying bits of these types all over the place, or in some cases to even bother figuring out these minimum requirements of your code yourself.
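
A minimal Haskell sketch of that apply example (the explicit signature is left in for readability; it is exactly what the compiler would infer if it were omitted):

 apply :: (a -> b) -> a -> b
 apply f x = f x

 -- both uses type-check against the same, fully general definition
 example :: (Int, String)
 example = (apply (+ 1) 41, apply reverse "dlrow")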

Close, but not quite. TypeInference has been around since the 70's (ML), and certainly isn't the result of anybody shaming anybody into anything.

Why things are not as simple as they appear to be.

Well, it turns out that if we take a closer look (in the tradition of BadEngineeringPropertiesOfOoLanguages), languages with TypeInference have this glitzy feature that you do not have to declare the type of an identifier - the language itself will infer the most general type for it. On the other hand, they suffer from some not-so-good engineering properties, especially with regard to their EconomyOfExpression and EconomyOfExecution, and although nobody suspected it (or rather, nobody was curious to look), TypeInference plays a very important role in producing these unwanted properties.

To begin with, let us note that no StaticTyping language with TypeInference has really been put to heavy use in "production" (industrial) environments, and some of the engineering properties and potentially not-so-good language design decisions have yet to withstand relevant empirical validation.

First, let's look at a simple case:

  let f1 x y = x + y
x and y are bound to be integers in the ML family of languages, without the programmer having to spend his precious keystrokes declaring it. The compiler in its infinite wisdom saw that the + operator is applied in x + y and decided that x and y have to be integers. But what is the price paid for this?

Well, the price we pay in ML is that for floats we have to turn our time-honored "+" into "+." as in

  let f2 x y = x +. y

[ A LanguageLawyer says: The above example's actually O'Caml. In Standard ML (SmlLanguage) at least, "+" works for both integers and real numbers, so you'd need to give a hint to the TypeInference mechanism. Consider:
  fun f1 x = x + 1;
  fun f2 x y = x + y : int;
vs:
  fun f1 x = x + 1.0;
  fun f2 x y = x + y : real;
]

Here x and y are floats. Now the ML die-hard advocates would suggest that the notation "+." has some kind of subtle advantage, since it alerts the code writer and the code maintainer that + as in integer addition and +. as in floating point addition are very different operators with very different algebraic properties, so they should be honoured with different names. Well, that might be so, but having to write just the simple "+" had a much sweeter EconomyOfExpression feel to it (Except, surely, that having +. give you the type of the expression is a lot more economical than declaring the types of the variables and then putting a standard + between them).

Whatever, we'll grant them that argument (although I am really curious what they will do when they have to implement a more serious numeric tower - at least like Java's or C++'s, if not Scheme/Lisp-like; where will they find more + operators?). However, the problem can only be waved away with regard to +/+. What happens with a function name like "length"? Length can be length of strings, length of lists, length of arrays, length of (your arbitrary data structure goes here). And then the code becomes:

  let f1 s =
     let l = String.length s
     ...
or
  let f2 a =
    let l = Array.length a
    ...
A clear breach of EconomyOfExpression, because whereas in Java (yeah, I know, bad design ... ) the developer would have to say
  void method(String s) {
    s.length();
    s.substring(...);
    s.endsWith(...);
  }
This declares that s is String OnceAndOnlyOnce. In the superior language ObjectiveCaml, the developer will be spared the declaration of String attached to the identifier, but because name collisions screw TypeInference, he will have to prefix all string operations with the name of the String module:
  let f s =
    let l = String.length s
      and s2 = String.substring ...
      and condition = String.endsWith ...
And so on, so forth. EconomyOfExpression got badly screwed in the ML family, essentially because names of operators on data are used to disambiguate the type of the data for the purpose of TypeInference, as opposed to using the data type to disambiguate the name of the operator. The type of a datum (identifier) has to be declared only once, however, while the names of functions operating on that data are likely to occur many more times. This trade-off presents a clear advantage for the Java-style languages versus the ML family (at least in this regard).


But name collisions do not exist for Haskell.

And they also do not exist for objects in ObjectiveCaml.

It should be noted that this problem only exists in the ML family (it gets mocked in FunctionalWeenie), and is not a property of statically-typed languages in general. Haskell has no problem overloading the + operator for both integer and non-integer forms. Of course, many languages use other syntactic cues to help the type inference engine, such as the convention that "5" is an integer and "5.0" is a float. (The integer-literal suffixes of C/C++ - L or U - are another example of this.)
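
A minimal Haskell sketch of that overloaded +, with the (inferable) signature shown for clarity; the helper names are made up:

 f :: Num a => a -> a -> a
 f x y = x + y

 usedAtInt :: Int
 usedAtInt = f 1 2

 usedAtDouble :: Double
 usedAtDouble = f 1.5 2.5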

Name collisions only screw type inference in ML and CAML. Haskell solved this problem with TypeClasses long ago. And it's been discovered that the much vaunted (and seldom documented) Module Calculus of Ocaml is doable in Haskell as well. Now if they could only make Haskell actually run fast...

Well, part of why it doesn't run that fast is that TypeClasses, as a feature, help somewhat with EconomyOfExpression but impact EconomyOfExecution. Of course there's the ever so dear hope for the SufficientlySmartCompiler.

But do we get TypeInference for free with TypeClasses, at least if we disregard the EconomyOfExecution? Well, let's discuss this further. Of course, I need to dust off my Haskell first :)

Yes.

TypeClasses carry their "methods" around as HigherOrderFunctions. The speed hit is the same as a VeeTable indirection, i.e. the difference between CeeLanguage and CeePlusPlus. Most people consider that reasonable for all but the most performance-critical code.

(And those that don't are really arguing against AdHocPolymorphism, as there's no way to avoid the extra indirection unless you require that every name be bound at compile-time to a function. Otherwise, the runtime needs to decide what method to call, and that requires an indirection.)

HaskellLanguage lacks EconomyOfExecution because of pervasive ImplicitLazyEvaluation, not because of its type system. Basically, the expense is the cost of packaging up thunks for CallByNeed on every single expression. Haskell compilers don't actually work like that - they do GraphReduction? of LambdaCalculus expressions - but that's effectively what's going on. -- JonathanTang
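
A minimal sketch of what CallByNeed looks like from the programmer's side (this is just an illustration, not a claim about any particular compiler's internals):

 ones :: [Integer]
 ones = 1 : ones              -- an infinite list; safe only because of CallByNeed

 main :: IO ()
 main = print (take 3 ones)   -- prints [1,1,1]; the rest stays an unevaluated thunk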

I suggest you take a look at Wadler's original TypeClasses papers. http://homepages.inf.ed.ac.uk/wadler/topics/type-classes.html.

From "How to make AdHocPolymorphism less Ad-Hoc":

"However, type classes should be judged independently of Haskell; they could just as well be incorporated into another language, such as standard ML."

The papers present a strategy to transform a program with typeclasses into an equivalent program without typeclasses that can be typed by plain HindleyMilnerTypeInference. Nowhere does the algorithm make use of Haskell's LazyEvaluation rules. It's just as possible to write a preprocessor to add typeclasses to ObjectiveCaml.
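
Roughly, the translation turns each class into a record of functions and each instance into a value of that record type. A hand-written Haskell sketch (not actual compiler output; all names are made up):

 -- the class       class MyShow a where myShow :: a -> String
 -- becomes an ordinary record of functions:
 data MyShowDict a = MyShowDict { myShowMethod :: a -> String }

 -- each instance becomes a dictionary value:
 intShowDict :: MyShowDict Int
 intShowDict = MyShowDict { myShowMethod = show }

 -- an overloaded function     twice x = myShow x ++ myShow x
 -- becomes a function that takes the dictionary as an extra argument:
 twice :: MyShowDict a -> a -> String
 twice dict x = myShowMethod dict x ++ myShowMethod dict x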

Andreas's objection in the thread above concerns the appropriate RunTime behavior of the Haskell fragment listed, and shows he's been thinking in Haskell for an awfully long time. ;-) The difference between LazyEvaluation and StrictEvaluation is that the former works top-down, where functions are definitions that are invoked as needed, while the latter works bottom-up, where functions are commands that are invoked when encountered. In a Haskell program, execution begins with the reduction of "main", which will likely trigger other reductions as necessary. Type variables are instantiated as needed, and as Andreas points out, "main" is monomorphic and so every expression will eventually have a well-defined type.

In a strict language, all evaluation begins with either program literals or values read from IO. These provide the types of variables - and values - used in the rest of the program. This is what Neel was pointing out, and it's also mentioned in Wadler's paper: "However, unrestricted overloading of constants leads to situations where the overloading cannot be resolved without providing extra type information. For instance, the expression one * one is meaningless unless it is used in a context that specifies whether its result is an Int or a Float." In a strict language, that context is usually not available, so you're forced to use monomorphic IO functions (readInt/readFloat/readString) and disambiguate literals (3/3L/3.0). But these restrictions are also present in C/Java. Also, you can't say 3 + 3.0 without specifying some sort of type coercion, which is mentioned in Wadler's paper.
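
To illustrate the quoted point in Haskell syntax (a sketch; 'one' follows the paper's example, the surrounding names are made up):

 one :: Num a => a
 one = 1

 asInt :: Int
 asInt = one * one            -- fine: the context fixes the result type to Int

 asFloat :: Float
 asFloat = one * one          -- fine: the same expression used at Float

 -- show (one * one)          -- without such a context the type is ambiguous,
 --                              and only defaulting rules can rescue it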

The Google thread does present a legitimate speed argument in that all these dictionary lookups prevent inlining of functions. This isn't an issue in Haskell, where CallByNeed prevents inlining anyway. But in a strict language where the average function length is one line (as many Haskell programs are), this could be a significant hit. Fortunately, there's already been significant research in this area, because this is the exact same problem that SmallTalk, C++, and Java face! If you substitute PolymorphicInlineCaches for Wadler's dictionary lookups, you basically get SelfLanguage and HotSpot, both of which get generally acceptable performance. And interestingly enough, TypeInference has been implemented in Self, including dynamic inheritance. http://research.sun.com/self/papers/type-inference.html -- JonathanTang


Even if the technical details about the SufficientlySmartCompiler and implementing type classes in a strict language (and even a few more) are all ironed out, one would have to wonder if the resulting type system would not pose too much of an unnecessary conceptual burden, to the extent that a mere software engineer would rather throw his hands up in the air and shout: "I'd rather spend a few keystrokes annotating identifiers with type names rather than having to figure out what this compiler tries to do with my types.". -- CostinCozianu

Isn't that the same argument that DynamicTyping enthusiasts levy against ManifestTyping?

Besides, the real headache with ManifestTyping is not typing out the type annotations, it's maintaining them. Every time the types change, you have to track down every variable that might contain the changed value and modify its type declaration. OnceAndOnlyOnce? -- JonathanTang

 template <class T> typedef Foo<T,int> FooOfInt?<T>;

Regarding the template typedef above, it has been noted as "mistakenly rejected early on" by BjarneStroustrup, and will likely be included in C++0x. See http://www.research.att.com/~bs/C++0x_keynote.pdf for more information. -- MichaelSparks


ManifestTyping vs TypeClasses.

TypeClasses seem to get rid of all the problems (albeit with implementation problems of their own) and get rid of ManifestTyping.

To begin with, TypeClasses solve some part of the name-clash problem, but they do not resolve it completely. For example, the name "length" is reserved for lists (aka the type [a]). Now one may want to introduce one's own kinds of lists, say a doubly linked list, or a list backed by an array. Of course the newly introduced types will also need a "length" operator, but since nobody at the DesignPhase of Haskell foresaw the need for an interface (aka type class) that defines length, the new guy who writes the doubly linked list will not have an interface to adhere to, and his newly defined name will clash with the already defined name. Of course, modules to the rescue: one will be addressed as Prelude.length, and the other will be addressed as DoubleList?.length. So we're back to square one (ML). Another instance is where different type classes define the same names for operators; to resolve this, a potential client of both will disambiguate by using the module name.

Sure, Java programmers use package names to disambiguate type names, but as we observed before, type names in ManifestTyping appear OnceAndOnlyOnce (in the manifest, that is), while operator names appear many more times.

No, we're still a step ahead. Witness:

 import Prelude hiding (length)
 import qualified Prelude

 class Sequence a where
   length :: a -> Int

 instance Sequence [a] where
   length = Prelude.length

 -- your instances here

Of course this will be hidden in a module, should you really decide there's a need for an overloaded 'length'.
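
And a newly introduced list-like type can then reuse the same overloaded name; continuing the sketch above (the DoubleList type here is just an illustrative stand-in, not a real doubly linked list):

 data DoubleList a = Nil | Cons a (DoubleList a)

 instance Sequence (DoubleList a) where
   length Nil         = 0
   length (Cons _ xs) = 1 + length xs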


More about type classes.

The assertion was made above that type classes and the Haskell type system solve all the problems with type inference without suffering the drawbacks of ML. Actually, Haskell's type system is quite a beast, and its inherent complexity in addressing simple things is a sufficient drawback in itself. However, after a brief excursion into Haskell, here's the problem with TypeClasses: they are not first class in the type system. They are very different beasts.

For example, under Haskell there's no way to create a list with the following content: [1,"string"], although both elements implement the "Show" interface (speaking in the language of Java mortals), i.e. their types are instances of "Show" and provide the function "show :: a -> [Char]" (roughly the equivalent of Java's toString()).

As always, Haskell continues to evolve. GHC 6.6 will support this (or the 6.5 development snapshots, if you are feeling adventurous), as long as you help it along a bit by declaring the type of the list as [forall a . Show a => a]. A bigger problem is that given a list with a type like this, *all* you can do is call methods of class Show (and its superclasses). A list of things you can do nothing with but convert to strings is not much more useful than just having a list of strings in the first place. There is a class Typeable you could add to partially get around the restriction, but it's still pretty messy.
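
For comparison, the older existential-wrapper workaround looks roughly like this (a sketch for GHC with the ExistentialQuantification extension; the Showable/MkShowable names are made up):

 {-# LANGUAGE ExistentialQuantification #-}

 data Showable = forall a . Show a => MkShowable a

 heteroList :: [Showable]
 heteroList = [MkShowable 1, MkShowable "string"]

 rendered :: [String]
 rendered = map (\(MkShowable x) -> show x) heteroList   -- show is all we can do with the contents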

Furthermore, polymorphism with type classes does not quite work as one might expect. The identity function behaves as expected:

 let id= \x -> x in (id 1, id "string")
This is well typed and gives the result (1,"string"). But when we try to substitute id with a function that operates on Show:
 let f = \x -> show x in (f 1, f "string")
the unsuspecting Java programmer in me would expect the expression to be well typed and to return a tuple of two strings. However, Haskell balks at me and says ERROR: it cannot infer the type.

I check again the type of the f function:

  :t \x -> show x
It is well typed and gives the type "Show a => a -> [Char]", which to me reads: takes a value of a type deriving from Show and returns a string. Now, the way Haskell operates here, it doesn't say f takes a value of any type deriving from Show; rather, f takes a value of exactly one type deriving from Show, which type we do not know at the point where the function is declared but which will be determined later through TypeInference.

Your reading is quite correct. A type like Show a => a -> String does mean *any* type supporting Show - any lower case identifier in a Haskell type signature is a type variable, and a type is polymorphic in all the type variables. It only breaks down here because you hit a rather ugly corner case in the language, called the (dreaded) monomorphism restriction, where the compiler *does* try to reduce the variable a down to one particular type. Fixed example follows yours. All the gory details here: http://www.haskell.org/hawiki/MonomorphismRestriction

So for example

 let f= \x -> show x ++ show x
    in let a= f 1  -- Point A: here the type inferencer assigned Num a to the unknown type
    in let b = f 2 -- Point B: here the type of f is already fixed and set in stone
    in a ++ b
is well typed and returns "1122", but if I replace point B with
   let b = f "string"
then the type inference breaks, because the string (aka [Char]) does not match the numeric type already inferred at point A.

The monomorphism restriction only applies to declarations involving type classes (which is why id worked above), without an explicit type signature, and without parameters on the left hand side (and it is disabled by the -fno-monomorphism-restriction flag, if you're using GHC). With any of these fixes, type inference works quite a lot better:

 let f x = show x ++ show x
     -- or f :: Show a => a -> String
     --    f = \x -> show x ++ show x
     a = f "hi" -- Type checker makes sure there is a Show instance for String 
     b = f 1 -- Some magic (defaulting) decides 1 is an Integer; the type checker checks that there is a Show instance for Integer.
  in a ++ b
Now the solutions to this dilemma are quite numerous in Haskell, but all sacrifice EconomyOfExpression one way or the other. For example, one can use existential types to create a "wrapper" type that provides a box with the actual value encapsulated inside. This is unnatural (the naive Java programmer in me asks why the hell I should create extra objects and extra types just to please the type inference engine - it's clearly an exercise in HelpingTheCompiler, and it doesn't even have positive effects with regard to EconomyOfExecution). Other workarounds involve the use of closures, or arrays or records containing closures; again, we are talking about unnecessary entities that are nowhere near as elegant as your stock interfaces in languages with ManifestTyping (Java, C#). The explanation seems pretty logical: TypeClasses were designed to solve the overloading problem (aka ad-hoc polymorphism), while the barbarians at the gates (OO programmers) seek to apply subtype polymorphism in their designs.

See http://www.haskell.org/hawiki?ExistentialTypes

Since Haskell for me is a rarely exercised tongue, I eagerly wait corrections and criticisms. -- CostinCozianu

You're correct, this is unnatural. These heterogeneous lists are also largely unneeded. Why would you create a "list of all things showable"? Just show them already and create a list of strings. The standard example is actually the second most useless example, with the most useless being the "list of everything".

I'm still waiting for a compelling use case for existential types. All uses I've come across are better implemented as a record of functions. This again feels natural.
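
A sketch of what the record-of-functions style looks like in practice (the Shape example and all its names are made up for illustration):

 data Shape = Shape
   { area     :: Double
   , describe :: String
   , scaled   :: Double -> Shape   -- a genuine function field, closing over the shape's data
   }

 circle :: Double -> Shape
 circle r = Shape (pi * r * r) ("circle of radius " ++ show r) (\k -> circle (k * r))

 square :: Double -> Shape
 square s = Shape (s * s) ("square of side " ++ show s) (\k -> square (k * s))

 shapes :: [Shape]
 shapes = [circle 1.0, scaled (square 2.0) 3.0]   -- a heterogeneous-feeling list, no existential types needed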


An anonymous donor claims that there is no problem with OCAML:

But ObjectiveCaml will let you notate the type of the argument, so this argument doesn't really hold water.

  let f1 (s: string) = length s;;
  let f2 (a: 'a array) = length a;;
On the contrary, here's the output with the latest OCAML interpreter:

 let f (x:string) = length x;;
  Characters 19-25:
    let f (x:string) = length x;;
                     ^^^^^^
  Unbound value length
Whereas

   #let f (x:string) = String.length x;;
    val f : string -> int = <fun>
Now you probably refer to the open Module directive, but that is not a resolution to the overloading problem:

  #open Array;;
  #let f x = length x;;
  val f : 'a array -> int = <fun>
  # let f2 (x: 'a array) = length x;;
  val f2 : 'a array -> int = <fun>
  # open String;;
  # let f3 y = length y;;
  val f3 : string -> int = <fun>
  # (* all looks nice but *)
  # let f4 (x: 'a array) = length x;;
  Characters 30-31:
     let f4 (x: 'a array) = length x;;
                                ^
  This expression has type 'a array but is here used with type string 
As you can see, in the name resolution algorithm the type declaration simply does not matter; the only thing that matters is the last open directive - in the case above, open String. The overloading problem is very manifest in the names of the standard functions for printing values:
  print_string : string -> unit
  print_int : int -> unit
  print_float : float -> unit
Whereas in Java or other OO languages there's just one name, "print", and several overloaded functions. Repeating the type name in the name of the function is redundant and sore to the eye, and is there just to make TypeInference happy. It's an example of what needs to be sacrificed in order to make TypeInference work.

Agreed. I'm quite happy with deduced, rather than inferred, types. If I use manifest typing on my data structures, then the types in the usage of instances of those data structures are typically enough to figure out the rest without all the mess brought in by unification. -- ScottMcMurray

Dude, deduction is a form of inference. Most TypeInference uses deduction. And the above problems with OCaml are problems of OCaml, not of TypeInference in general. Haskell, for example, readily allows inference involving typeclasses, and uses it (in abundance) for such things as monads and 'show' (its own form of 'print X' for almost any X you might care to name).
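
For instance, Haskell's single overloaded print covers all of the print_* cases above (a minimal sketch):

 main :: IO ()
 main = do
   print "a string"          -- print :: Show a => a -> IO ()
   print (42 :: Int)
   print (3.14159 :: Float)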

I use deduction when it's context independent, like auto in C++0x. Sure, it's a subset, but I'd rather reserve the name inference for things like Hindley-Milner unification-based approaches where it's not trivial to implement. -- ScottMcMurray


It seems that the message of this page is that manifest typing is better than type inference, because, at the end of the day, it allows for greater EconomyOfExpression, mostly because of ad-hoc polymorphic functions. So where does this leave dynamic typing, or the combination of dynamic typing and manifest typing (as seen in CommonLispObjectSystem)?

Au contraire, mon frere. If anything, the message is that overloading is better than no overloading. Duh!

I think you're in ViolentAgreement. If that's the message, then the big question is "Why don't people miss overloading in dynamic languages?"

They don't miss it because they have overloading, too. It's just dynamic.

Ah. Guess I'm too used to perl.


Something I found to add to the "IDE Tricks" category: Resharper (for C# in VS) offers type-sensitive code completion in variable initialization/assignment, return statements, function call argument lists, etc.


OctoberZeroFive


CategoryLanguageTyping

