Type Inference

TypeInference is the analysis of a program to infer the types of some or all expressions, usually at CompileTime. It can be used for languages with either StaticTyping or SoftTyping. See also ManifestTyping and ImplicitTyping.

TypeInference is often used in FunctionalProgrammingLanguages. The compiler or interpreter needs only the minimum information to be able to figure out the type of an expression or variable.

Consider a function

  function Foo(a,b) = a + b

The compiler knows that + has type

  Integer x Integer -> Integer

Which means + takes two Integers and returns another Integer. It can now infer that Foo also has type

  Integer x Integer -> Integer

A more complicated example: consider that you have a function Plus which in its definition states that both parameters must be of the same type. If you later write an expression,

  Plus x1 x2

and the compiler already (through previous inference) assumes the type of x1 is "type a", it will assume x2 is also of "type a".

Note that "type a" can still be an unknown type. It is a temporary name that the compiler may use for this unique, but unknown type. If further inference shows that x1 has the "real" type of X, then it is also known that "type a" is actually X, and that x2 is therefore also of type X, and so on.

If you try to use x2 in another expression where type Y is expected you will get a type fault.

In the end what you have is a system where expressions need not have their types declared explicitly, but typing-errors will still be found. This is still strong type checking!

The most common type of type inference is called HindleyMilnerTypeInference and is found in various forms in MlLanguage and HaskellLanguage. Other forms of type inference exist. For example, MrSpidey uses set-based analysis to infer types in the "dynamically" typed SchemeLanguage.

RE: The compiler or interpreter needs only the minimum information to be able to figure out the type of an expression or variable.

This might even be wonderful for computers who do not have brains and who only require minimum information - just as assembly language allows us to specify minimum information about what we are twiddling. But have we considered that humans often need more information about what is what, so that a human does not have to sift through the code or the comments to try and figure out why this was declared here and for what reasons? An argument could be made that we don't need web pages with blue underlined hyperlinks because black and white is good enough and implicitly we can figure out that a text is clickable by simply clicking text randomly that isn't underlined. If after clicking text we do not arrive at any web page - it is obvious that the text is not a hyperlink! Much more obvious, than say clearly marking it?

[With type inference, you could have your editor highlight based on expression types, or give you hovertext for the expression if you selected it. You can't do the same for assembly language. In any case, TypeInference languages rarely prevent type annotations where the humans wish to write them; they mostly save a great deal of effort where the type is obvious, and where one doesn't need to know the exact type. It also is better for modularity (ManifestTyping inhibits refactoring; see LanguageInhibitsRefactoring).]

RE: Which means + takes two Integers and returns another Integer. It can now infer that Foo also has type

Not necessarily - if my plus sign also concatenates strings... Especially if I am used to reading a language which the plus sign concatenates strings, or especially if I'm reading someone else's code where I'm not so sure whether plus sign concatenates strings or adds integers, or.. especially if I am learning a language and wondering.. what does this plus sign really do in this function? Couldn't it be more obvious just to say that A and B are integers? Does it really take that much effort to make them an int to set the record straight?

[You complain about OperatorOverloading. Anyhow, it may be that setting it to int isn't difficult... but setting something to auto sure is easier than setting something to std::pair< std::map<std::string, mytype<Arg1, Arg2, typename Arg1::value_type >, myAllocator >::iterator, bool>. If your language supports TypefulProgramming of any sort, TypeInference is pretty much required to save a great deal of typing (pardon the double entendre).]

As a note, TypeInference isn't the only static TypeSafety option for ImplictTypes?. TypeInference attempts to find type-descriptors (e.g. 'Integer' or 'Integer->Integer') for values, functions, expressions, terms, etc., but often those type-descriptors simply don't need to be found. Proving TypeSafety only requires proving TypeCompatibility? ("t 'is compatible with' T" instead of "t 'is a' T") for application of functions and messages. That's it. This can often be done without ever learning the types the values and functions. Of course, some means of describing intended types must exist for module interfaces and services, where declaration is separate from description.

Some quick examples (in HaskellLanguage)

A very simple example is apply.

apply f v = f v

The inferred type for apply is

(a -> b) -> a -> b

where a and b are type variables: apply is a function from a function f with the general type a -> b, and a value of f's argument type a to f's return type b.

Now consider the function

apply length

defined by the application of apply to the function length. length has type [c] -> Int (where c is another type variable) and returns the number of elements of a given homogeneous list (a list where all members inhabit the same type). Fixing the value of the first argument of apply to be length and knowing the type of length allows Haskell to restrict the values of the type variables a and b above to [c] and Int, respectively, and the inferred type for apply length is [c] -> Int.

A slightly more complicated example is a factorial function.

factorial x = if x == 0 then 1 else x * factorial (x-1)

The inferred type is function from a number to a number (factorial::Number->Number).

It goes something like this: The compiler sees that factorial is a function, and gives it a provisional type a->b. The compiler sees math being done on the argument, so it decides it is a number. The compiler sees that the function can return a number, so it decides that the return type is a number. It pokes around a bit more and decides this type is consistent with everything. (Just don't try "factorial 0.5")

Type inference also handles PatternMatching (and TypeClasses too...) A quick example is map.

map f [] = []
map f (x:xs) = f x : map f

The inferred type is (a->b)->[a]->[b], or a function that takes a function f and a list of things of f's argument type, and returns a list of things of f's return type.

Suppose that I now have (in addition to lists) arrays and sets also. Do I have to write different versions of "map", one for each of the different types, or will the above version of "map" (or a variation of it) work for arrays and sets (and any other "enumerable" thing I might want to define in the future) too?

Yes, check out the Functor typeclass: http://www.haskell.org/onlinereport/basic.html#sect6.3.5 That Means you can map over anything once you've defined an implementation of fmap for it.

For a slightly deeper example, you could create a recursive Tree type that could contain more 'instances' as the branches, and write implementations of map that apply themselves to all the Tree types, or to just the top one, whatever you think is correct the Tree type you've created. -- ShaeErisson

(Something to consider: slides about ML at http://perl.plover.com/yak/typing/typing.html)

Once you get used to it, all this is hardly worth much fuss, and the only thing which remains difficult to understand is that some programmers accept languages in which you have to write thing like:

 FooBar getFooBar()
 {
FooBar fooBar = new FooBar();
return fooBar;
 }

instead of:

 let getFooBar () = new fooBar

-- StephanHouben

I quite agree, but your example isn't quite fair.

 FooBar getFooBar(){return new FooBar();}

Only one gratuitous type declaration, and no gratuitous variables.

-- GarethMcCaughan

The type inference becomes a much bigger deal when you use a lot of nested functions. The types can then get very complicated as you have functions of functions from functions to functions of ints to functions of floats and functions of .......

With templates you can get type inference or deduction in C++, as well:

template<class T1,class T2> T add(T1 n1,T2 n2) { return n1+n2; }

add(1,1.0); // Instantiates add<int,double>()

It also scales to arbitrary complex types.

-- TerjeSlettebo?

That is not type inference, that is still just type specialization; when the compiler looks at the instantiation of add(T1, T2), it doesn't have to do any analysis whatsoever to discover that T1=int, T2=double, because that is simply the types of what was passed to add(). It is allowing "1.0" to be a synonym for "double". That's not type inference.

The above example is broken. In fact, it is broken in a way that clearly demonstrates the lack of type inferencing in C++: the compiler won't resolve type T, even though enough information is present to make the deduction. For instance, the expression n1+n2 returns a double in add<int,double>(), so it obvious what T should be. The correct solution is template <class T, class T1, class T2> T add(...) ... add<double>(1, 1.0);. I.e., you have to tell the compiler what T is.

But don't all languages with TypeInference do the same thing? After all, the place a TypeInference algorithm starts is with the types of the literals (and any type annotations the user does include), and goes from there. The following function:

 fun square X := x * x

cannot be uniquely typed - it is, after all, a polymorphic function which will work with any type that supports the "binary *" operator (which may or may not mean multiplication). A unique typing can be found for the following, however (using the above definition):

 var y = 5; # integer literal
 var z = 6.0; # float literal

 y = square y; # type of square is int->int
 z = square z; # type of square is float->float

The extra information necessary to select the correct versions of "square" are the inherent types of the literals. It is probably possible to construct a program which has a unique typing but no literals; assuming a universe of closed types. That probably isn't very useful however.

Yes...however, C++ cannot proceed beyond a single step of the "type inference" in question. True type inference proceeds over multiple steps until it has leeched out all information that it can. The C++ example would fail to parse something like:

   main() {
     (*func)(&x, &y);    // call func ptr that does unknown things to x and y
     add(x, y);
     x = 1.0;
     y = 2.0;
   }

...firstly because the type of x and y aren't defined at their first use (first sign that type inference isn't supported), but even beyond that, because when add() is instantiated, it doesn't know what types x and y are, and cannot cope with that.

This is example is very impressive, only I wonder why on earth anybody would want a function like this. Plus, it's awful, obscure and unreadable. Would you mind giving a really useful example of type inference which demonstrates its supposed advantages?

A similar language with actual type inference might see that x and y are double, since they are used that way after the template instantiation, and given that knowledge, work backwards to see that add() should be specialized on (double, double).

(This isn't the best example of what I'm talking about, but gives the flavor of it. Onlookers, feel free to refactor the above.)

This is a pretty big difference. OTOH I suppose one could say that C++ has trivial type inference in this context, but does not have non-trivial type inference.

The above wouldn't compile, as x and y aren't declared anywhere. Do you mean something like this?

 template <class UnknownType1, class UnknownType2>
 void foo()
 {
     UnknownType1 x;
     UnknownType2 y;
     (*func)(&x, &y);    // call func ptr that does unknown things to x and y
     add(x, y);
     x = 1.0;
     y = 2.0;
 }

C++ doesn't allow variables to be declared with unknown type (type annotations are required), the closest you can get is the above--where variables are declared with a type which is a template parameter.

Of course, when you try to write foo(); the compiler won't let you do it without specifying which ones. You can do foo<double,double>() and the compiler will except it; the compiler may also accept foo<int,int>() (and the assignments then become type coercions). foo<string,string>() will be disallowed, as you can't coerce a float to a string.

The call to the mysterious unknown function pointer above isn't necessary for the example.

Quite. I forgot to say that the example was in a non-existent language similar to C++, so your correction is right on target, and we can take your correction in place of my faulty example. The key issue here is, as you said: "C++ doesn't allow variables to be declared with unknown type (type annotations are required)".

That requirement clearly distinguishes a language with full-fledged first class type inference, versus one that does not. Type annotations are required if and only if type inference (well... or DynamicTyping) is not supported. Not quite; in principle, you could have SoftTyping without TypeInference (although it wouldn't be very useful). See ImplicitTyping.

Conversely, languages with full-fledged (an adjective that shouldn't be necessary) type inference do not require type annotations.

''If I may interject,

The reason that the example is incorrect is because x and y are not initialized to a value before using them. This means that the compiler must:''

Type-infer x and y as floats or doubles.
Reprocess the file from the beginning, adding "double x; double y;" at the top,
...because if it sticks them in at the initialization, they won't exist until the initialization, so you obviously won't be able to use them...
Finally, because no initialization is specified, the add functions will both result in ZERO, because the default for any uninitialized numeric type is, logically, zero. (If you could make this infer that x and y start at their initialized forms, you would end up with code that ether a.) was inconsistent, or, b.) made assignments one-time-only, like some functional languages.

I would argue that type inference gives you all the power of dynamic and static typing with none of the drawbacks. I'm looking forward to using the NeedleLanguage, which has type inference for its ObjectOriented type system. This could be the language that pleases everyone.

OhHaskell is another ObjectOriented language that supports type inference. And so is ScalaLanguage.

In the above example, it is remarked that:

The compiler knows that + has type "Integer -> Integer -> Integer"

What if + can also have type Real->Real->Real? Or Complex->Complex->Complex? Or Vector->Vector->Vector? I'm talking about addition in general, as opposed to the + operator of any real language (in MlLanguage and it's derivatives, IIRC, + only operates on ints and a separate operator is used for floating point addition - or am I thinking of something else?)

Actually, it can. In reality, Haskell uses TypeClasses, and + just wants any type from the Num typeclass (anything that's a number). So any of those shown above is just fine.

A more interesting question is "How do I write a single function that works the same way for Integer -> Real -> Real, or Vector -> Complex -> Vector?"

Happily, that question has already been answered by functional dependencies: http://www.haskell.org/hawiki/FunDeps -- ShaeErisson''

Has any ProgrammingLanguage used TypeInference for other purposes? Consider unit verification and conversion in a desk calculator. If you could attach a unit to each value in an equation, a unit-aware expression language could verify that you are using the units correctly and automatically convert between different forms of the same measure.

There have been languages with built-in support for units, although they are unfortunately rare. My belief is that this support should be in every language.
- See discussion halfway down page at http://lambda-the-ultimate.org/node/view/33
- See http://lambda-the-ultimate.org/classic/message11877.html

I agree! I have been trying to get some discussion started on this area - I added the SmartData page 6 months ago, and so far absolutely zero feedback! Suggestions for a better name would also be welcome! -- PaulMorrison

When I was learning ObjectiveCaml, I didn't understand how TypeInference worked (but I was very impressed). Then I learned PrologLanguage (and unification), and it made quite a bit more sense. "Oh, it does unification on the type signatures...and resulting in an unbound variable means the function is parametric! Cool!" - ScottVokes

Languages that implement type inference: MlLanguage, HaskellLanguage, ObjectiveCaml, OhHaskell, EffSharp, MercuryLanguage, ScalaLanguage

CategoryLanguageTyping CategoryLanguageFeature CategoryTypeTheory