Type erasure is the removal of an object's typing information from the runtime system/image of a computer program (more accurately, it is the decision not to put it there in the first place).
When a compiler (etc) types a program (in other words, it solves the CompileTimeTypingProblem for a program), it has a collection of terms with associated types. How to encode this stuff in the runtime image? There are several different choices to be made, depending on the language and its semantics.
- In languages without type subsumption (procedural and functional languages with StaticTyping), all terms are ideally typed (see CompileTimeTypingProblem) and each term has a unique type - in other words, all type information is known by the compiler, and the execution of the program won't change that. In these cases, there is no reason to keep the type information around for the runtime system; so it generally isn't generated. This is type erasure.
- In languages which have either DynamicDispatch (essentially, any ObjectOriented language) or DynamicTyping (including several non-OO languages like pre-CLOS LispLanguage), not all information is known by the compiler, so the runtime system will need to perform type queries to ensure program correctness. Thus, there must be some type information available to query - hence type information is made part of the runtime image. In general, objects have hidden members to data structures which have type information (some necessary optimizations for small objects like ints can cloud the issue; we discuss down below). In some languages, these structures are first-class objects (or contain first-class objects) - SmalltalkLanguage is an example (an object of Class type). In others, like CeePlusPlus, these structures are not first class (there is no portable way to access an object's VeeTable in C++). See also RunTimeTypeInformation.
- In some languages, it depends on the type. C++ performs type erasure for intrinsic types (ints, doubles, bools, chars, etc.), pointers/references (different from the object in C++), enums, and PlainOldData objects; type erasure is not performed for objects with virtual functions. (C++ uses StaticTyping, but does use DynamicDispatch). Likewise, JavaLanguage (pre 1.5) performs type erasure for byte / short/ int / long / char / boolean / double / float, but not for Object and any of its subtypes. (One could say Java performs type erasure for interfaces, but since those don't really exist, that's a moot point).
- And some languages use advanced techniques like BoxingConversions to get the best of both worlds; CsharpLanguage is one example and JavaLanguage will get this capability in 1.5.
Is
TypeErasure important? Generally, for large objects, it matters little - the runtime memory cost of the type information is one structure (of some small size) per type that the system knows about, and one pointer per object. For an object consisting of dozens or hundreds of bytes (or more), one additional pointer isn't noticeable.
For small objects, such as numeric types, booleans, ConsCells, characters, enums/units, etc. - the overhead is tremendous. A boolean needs only a single bit to encode; compared to 32 or 64 bits for the size of a pointer (on a modern machine). In many cases, it is desirable to have objects of this sort occupy a single machine register (an important optimization), if they have to lug around a type pointer it is a huge problem. Many runtimes for languages that never perform type erasure use various "hacks" like reserving one bit for an "object/intrinsic" flag, redefining ints to be 31 bits rather than 32; and insisting that pointers be word-aligned (a limitation present on most CPUs regardless). One of the distinguishing features of the LispMachine (the hardware, that is) is extra "tag" bits which allow the type information to be economically stored. Many like to flame JavaLanguage for excluding intrinsic types from the family of objects; it is precisely this sort of hassle that lead the designers of Java to make that decision (the semantics of Java are such that the intrinsic types may always be subject to type erasure). That said, the BoxingConversions of C# might be a more elegant solution.
TypeTheory