Manifest Typing

A ProgrammingLanguage uses ManifestTyping if and only if it requires a type annotation to be given explicitly for each declaration. That is, types are "manifest" in the source code. Contrast with ImplicitTyping.

ManifestTyping is a special case of StaticTyping. StaticTyping means that the language design requires each declaration to have a nontrivial type, but the type does not necessarily have to be specified in the source code; it could instead be found by TypeInference.

StaticTyping is divided among languages with ManifestTyping (CeeLanguage, PascalLanguage, ActiveOberon?, CeePlusPlus, AdaLanguage, JavaLanguage, CeeSharp, etc.) and languages with TypeInference (HaskellLanguage, Standard MlLanguage, ObjectiveCaml, CleanLanguage, etc.). For many people, having to annotate all identifiers with their type represents a burdensome or at least annoying task; thus, they prefer either DynamicTyping or languages based on type inference that preserve the advantages of static typing. In the context of static typing, it may turn out that type inference is not a perfect solution either; see ManifestTypingConsideredGood.
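
As a rough illustration of the two styles, here is a sketch in CeePlusPlus, which is listed above as manifestly typed but has offered limited local inference through auto since C++11. The function names are made up for this example. In both functions the type of total is fixed at compile time; only the first spells it out in the source.

```cpp
#include <type_traits>

// Manifest typing: the declaration of 'total' spells out its type.
int manifest_total(int a, int b) {
    int total = a + b;
    return total;
}

// Inference: 'auto' omits the type, but the compiler still fixes it
// statically. Here 'total' is deduced as int from the expression 'a + b'.
int inferred_total(int a, int b) {
    auto total = a + b;
    static_assert(std::is_same<decltype(total), int>::value,
                  "auto deduced int at compile time");
    return total;
}
```

Calling manifest_total(2, 3) and inferred_total(2, 3) gives the same result; the difference is only in what the programmer had to write.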

In addition to the benefits (and drawbacks) associated with static typing in general, ManifestTyping allows for easy and straightforward overloading of names, while modern IDEs (Idea, EclipseIde) use the context provided by ManifestTyping to implement automatic refactorings and very effective CodeAssist?.
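
A minimal sketch of the overloading point, again in CeePlusPlus. The names describe, describe_count and describe_ratio are hypothetical. The declared types of the arguments are what let the compiler pick between two functions with the same name.

```cpp
#include <string>

// Two functions share one name; the declared parameter types tell them apart.
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }

// The declared type of the local variable selects the overload at compile time.
std::string describe_count() {
    int count = 3;
    return describe(count);
}

std::string describe_ratio() {
    double ratio = 0.5;
    return describe(ratio);
}
```

The choice is made entirely from the declarations; no run-time inspection of the values takes place.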

Note the word nontrivial above. The simplest way to view various kinds of type systems in a uniform framework is to treat all declarations as having types, but in the case of DynamicTyping, the type is always "any". That is, a dynamically typed language has a trivial static type system with only one type. Dynamically typed systems also differ from static type systems in that they do not treat the possibility of a type failure as a compile-time error. (This property is shared by SoftTyping.)
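
One way to picture the "only one type" view is to model it inside a statically typed language. The sketch below is an assumption-laden toy, not how any particular dynamic language is implemented: Any, make_int, make_str and add_one are invented names. Every value has the single static type Any, and a mismatch surfaces only when the code runs.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical "any" type: the only static type in a toy dynamic language.
// The tag records, at run time, which kind of value is stored.
struct Any {
    enum Tag { Int, Str } tag;
    int         int_value;
    std::string str_value;
};

Any make_int(int v)                { return Any{Any::Int, v, ""}; }
Any make_str(const std::string& s) { return Any{Any::Str, 0, s}; }

// Statically this accepts every Any, so the compiler never rejects a call.
// A type failure is detected only when the function actually runs.
int add_one(const Any& a) {
    if (a.tag != Any::Int)
        throw std::runtime_error("type failure: expected Int");
    return a.int_value + 1;
}
```

Both add_one(make_int(41)) and add_one(make_str("x")) compile; the second one fails only at run time, by throwing.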

All of this is independent of WeakTyping vs StrongTyping vs StronglyTypedWithoutLoopholes. Strong typing means that there aren't any back doors in the type system: no unsafe casts, for example. Weak typing means that the type is only a suggestion; the programmer may write code that ignores this suggestion. CeeLanguage is an example of a manifest and weakly typed language. (This is the most common usage of the terms "strongly typed" and "weakly typed", but they are often used inconsistently; see the StronglyTyped entry.)
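
A small sketch of what "the type is only a suggestion" can look like in practice. It assumes a 32-bit float in IEEE 754 format, which is common but not guaranteed; float_bits is a made-up name. The classic form is a pointer cast such as *(unsigned*)&f, but that is undefined behaviour under the aliasing rules, so memcpy is used here to keep the sketch well defined.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets the bytes of a float as a 32-bit unsigned integer.
// Assumption: float is 32 bits wide (checked below) and IEEE 754 encoded.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // the declared type is simply ignored
    return bits;
}
```

The declared type float did nothing to stop the same storage from being read as an integer.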

The types that appear in declarations should not be confused with tags that are used to distinguish run-time values that would otherwise have the same representation. Sometimes these tags are called "run-time types", but I find that usage unhelpful.

In summary, type systems vary along the following dimensions:

is there a nontrivial type associated with each declaration? [StaticTyping -> yes, DynamicTyping -> no, SoftTyping -> optional]
if there is, are these types declared explicitly in the source code? [ManifestTyping -> yes, TypeInference -> optional]
does the possibility of a type failure cause a compile-time error? [StaticTyping -> yes, SoftTyping or DynamicTyping -> no]
is the type system strictly enforced, with no loopholes or unsafe casts? [StronglyTypedWithoutLoopholes -> yes, WeakTyping -> no]

The last of these questions is a matter of degree, since some languages that are usually considered to have strong type systems may have "unsafe" features (for example native methods in JavaLanguage, or unsafePerformIO in HaskellLanguage).

Notice that there are some corners of this design space that have been poorly explored, for example designs that allow a programmer to control whether potential type failures are considered to be errors on a fine-grained basis. In principle, this would give a type system that generalizes both static and dynamic typing. (SoftTyping is not a generalization of StaticTyping, because it never rejects programs at compile time.)

Interfacing between languages using dissimilar type systems is also a neglected area (IDL-based approaches such as CORBA or COM impose a rather unexpressive "least common denominator" static type system, which may have a large ImpedanceMismatch with the type systems of some or all of the languages being interfaced).

The preceding paragraphs have it backwards. In a manifestly typed programming language, values have types but variables do not. For example, Lisp is a manifestly typed language, but variable declarations in Lisp do not use types. (They may have types as a suggestion from the programmer, but strictly speaking, the variable does not have a type.) See AnsiCommonLisp, by PaulGraham, page 218 (ISBN 0-13-370875-6). I couldn't find a good online definition (other than the one here, which is backwards).

At least the SchemeLanguage standard agrees with the usage on the rest of this page. Quoting R5RS [RevisedReportOnAlgorithmicLanguageScheme]: "Scheme has latent as opposed to manifest types. Types are associated with values (also called objects) rather than with variables. (Some authors refer to languages with latent types as weakly typed or dynamically typed languages.) Other languages with latent types are AplLanguage, SnobolLanguage, and other dialects of LispLanguage. Languages with manifest types (sometimes referred to as strongly typed or statically typed languages) include AlgolSixty, PascalLanguage, and CeeLanguage."

CommonLisp is strongly typed and dynamically typed. Dynamically typed is not a synonym for weakly typed. Statically typed is not a synonym for strongly typed. C is, in fact, a weakly typed language because the programmer can force any datum to be interpreted as any other datum (and not necessarily with good results). Also see "The Truth about Type Safety in Programming Languages" (http://lisp-p.org/tat/).

The standard usage in type theory (see <http://research.microsoft.com/Users/luca/Papers/TypeSystems.A4.ps> for a good overview article by LucaCardelli) is to have a two-dimensional grid to represent type systems. On one axis is strong/weak typing, which is about type safety. (AssemblyLanguage and ForthLanguage are weakly typed, CeeLanguage is intermediate, and AdaLanguage and LispLanguage are strongly typed.) On the other axis you get dynamic versus static or manifest typing. C is statically typed while both Lisp and SmalltalkLanguage are strongly dynamically typed. I'm not convinced of the value of static typing myself. I also find it weird that most people who seem to be passionate defenders of static typing aren't using the programming languages with very advanced static type systems like MlLanguage, but rather languages with relatively weak type systems like C.

Indeed. It should be pointed out that there's a huge difference between PrecambrianTypeSystems? as found in C and Java, and ModernTypeSystems? as found in HaskellLanguage and ML. -- StephanHouben.

WhenIsManifestTypingConsideredaGoodThing?

The following discussion describes something that is basically orthogonal to typing (the italicized text describes why), but is interesting nonetheless.

Also note that what constitutes a variable can be quite different under manifestly typed languages than under automatically typed ones. Firstly, not all values are stored in variables in either kind of language. An expression can yield an anonymous value, which is passed to a function, or serves as an operand to an enclosing expression, or is discarded. Under ManifestTyping, this creates the uncomfortable problem that some values are not addressable memory with a type. For example, consider that in CeeLanguage, the return value of a function is not an lvalue. This creates the following problem:

 struct contains_array { int y; int x[10]; };

 struct contains_array returns_it(void);

 void demo(void)
 {
   returns_it().y;      /* okay */
   returns_it().y = 4;  /* error, not lvalue */
   returns_it().x[3];   /* oops! */
 }

We cannot access the array, because the returned object is not an lvalue, and consequently we cannot take its address, which is necessary for doing the pointer arithmetic. The C standard says that an object is some region of storage, but examples like this show that the model is broken; what returns_it() returns is consequently not an object, but some pseudo-object thing that does not occupy addressable storage.

GCC compiles the code without complaining (bar the assignment). GnuCpp even compiles the assignment (arguably, a big fat warning would be warranted). I'm not sure what the C99/C++98 standards say about these things, but IMO there is no reason why returns_it() shouldn't be an lvalue, though it's arguable whether it should be const (in C++). There are applications for non-const lvalues, though they are rather obscure:

 struct Foo {
   void doInvert();
   Foo& invert() {
     doInvert();
     return *this;
   }
 };
 Foo returnAFoo();
 void acceptAFoo(Foo);

 void doSomething() {
   acceptAFoo(returnAFoo().invert());
 }

-- ArneVogel

Under sane languages implementing ImplicitTyping, you don't have this kind of silliness.

This has very little to do with manifest typing. It has more to do with CeeLanguage not having GarbageCollection - although even without GC, it could have been made well-defined in C by defining the scope of the returned object to be the scope of the calling function. In other words, it's just a language design flaw. The above example is valid code in CycloneLanguage, for instance (I think; I haven't tried to compile it).

This has nothing at all to do with garbage collection. Take a look at the concept of temporaries in CeePlusPlus. -- ArneVogel
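
For readers who have not met temporaries in CeePlusPlus, here is a rough sketch of the idea. The event log and the names ContainsArray, events and read_element are invented for this example. The object returned by value is a temporary that lives until the end of the full expression containing the call, so indexing into its array member is well defined, and no garbage collector is involved.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only to observe object lifetimes.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

struct ContainsArray {
    int y;
    int x[10];
    ContainsArray() : y(0), x() { x[3] = 7; events().push_back("ctor"); }
    ~ContainsArray() { events().push_back("dtor"); }
};

ContainsArray returns_it() { return ContainsArray(); }

int read_element() {
    // The temporary is destroyed at the end of this full expression,
    // after the element has been copied into 'value'.
    int value = returns_it().x[3];
    events().push_back("after");
    return value;
}
```

After a call to read_element(), the log shows the destructor entry before "after": the temporary is gone by the time the next statement runs, but it was alive for the whole expression that used it.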

Further discussion moved to GeneralizedReference.

Some potential issues with this definition are given in ImplicitTyping with regard to non-alphanumeric "clues" that languages may give to the compiler/interpreter.

CategoryLanguageTyping CategoryCodingConventions