Type System

A TypeSystem is a particular theory (in the mathematical sense: consisting of axioms and logical statements) of types and their construction. TypeTheory is the study of TypeSystems, and generally abstracts TypeSystems by both their emergent properties and the mechanisms by which they construct and pass types (e.g. function-types, covariance, contravariance, dependent types, predicate types, etc.). Useful TypeSystems are generally characterized by a certain degree of flexibility in describing types and their interactions, a relationship between types and programmatic elements, plus logical consistency. (One could come up with off-the-wall TypeSystems that have no relation at all to programming, but they wouldn't be of much value to anyone - not unless they revealed some emergent properties that could, with some design work, be exploited by programming systems.)

A TypeSystem can be attached to a programming language by associating elements of the programming language with certain types. When done properly, one can then perform proofs or certain forms of MetaProgramming over programs. The most common use is to obtain TypeSafety (a guarantee that the program performs no illegal or undefined operations), which requires that the TypeSystem be applied to the operators and operands of the language. This is what most people think of when they hear about the use of 'types'. Proving TypeSafety requires a proof of progress and preservation: that well-typed program segments always have a good next-evaluation-step, and that predicted types are, in actuality, preserved across evaluations. One can generally extend 'TypeSafety' in the 'no crash' sense to other purposes by generously re-defining what counts as an 'illegal operation' in your language.
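
As a small illustration (a TypeScript sketch, not tied to any particular language discussed here), a checker enforcing TypeSafety rejects an ill-typed operand before the program ever runs:

  function double(n: number): number {
    return n * 2;
  }

  const result = double(21);   // well-typed: evaluates to 42
  // double("hi");             // rejected at compile time:
  //   Argument of type 'string' is not assignable to parameter of type 'number'.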

Examples are multifarious. One can use types to prove that calculations have the right units (e.g. distance in meters), that certain calculations terminate (by attaching a 'terminates' clause to a function), that only certain exceptions are thrown (Java guys are familiar with this one), that protocols aren't violated, that security policies are maintained, that memory is handled in certain ways (e.g. structural typing, which allows a variety of optimizations to be guaranteed, such as in-place updates and avoidance of the garbage collector), and that secrecy/privacy/encapsulation concerns are respected (no peeking!).
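
The units example, for instance, can be approximated even in a mainstream language. Here is a rough TypeScript sketch using so-called 'branded' types; the names Meters, Seconds and speed are invented for illustration:

  type Meters  = number & { readonly unit: "m" };
  type Seconds = number & { readonly unit: "s" };

  const meters  = (n: number) => n as Meters;
  const seconds = (n: number) => n as Seconds;

  // Only the dimensionally sensible combination is given a signature.
  function speed(distance: Meters, time: Seconds): number {
    return distance / time;
  }

  speed(meters(100), seconds(9.58));    // accepted
  // speed(seconds(9.58), meters(100)); // rejected: 'Seconds' is not assignable to 'Meters'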

But types have other purposes than just proof-of-nonviolation. One can utilize types to form type-sensitive syntaxes (a simplistic example: knowing that 'red' refers to something of type 'color' when 'type color = red | blue | green' is defined earlier in the file). With a little reflection - the ability to ask for the 'type' of an expression, or some properties of it - one can use types for MetaProgramming. CeePlusPlus TemplateMetaprogramming would be a classic example.
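
The color example translates almost directly into TypeScript's literal union types, where the checker knows which names are valid in a given position:

  type Color = "red" | "blue" | "green";

  function paint(c: Color): void {
    switch (c) {                 // the checker knows exactly which cases 'c' can be
      case "red":   break;
      case "blue":  break;
      case "green": break;
    }
  }

  paint("red");      // accepted: "red" is known here to denote a Color
  // paint("mauve"); // rejected: '"mauve"' is not assignable to parameter of type 'Color'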

When types are used in manners beyond proving TypeSafety - and even for the more complex interpretations of 'illegal operation' - one might call it TypefulProgramming. In practice, most of these advanced uses of types must be performed statically (at least as far as possible - StaticTyping or SoftTyping) to be of any practical value; the necessary computations are too expensive otherwise.


I think of types as a distinct formatting of fixed length byte strings. Some of those formats are defined by the CPU (long integers, double floating point numbers) and other types are language defined or programmer defined. In an OOP language, the type determines the set of functions available to read/write the formatted data that makes up your object/type. Objects (typed variables in OOP) are "known" or implemented by the language, and all programmer defined types/objects are defined using composition or inheritance of "known" Objects using a Class. A program can only execute "known" types.

I allow annotated variables (variable declaration), implied types from constant assignment, and types that are returned from the declared result of a function. If you explicitly define a variable then its type can't be changed (overwritten) unless you explicitly declare it as a different type. If you assign a value to a variable that hasn't been defined then it gets the type of the expression and the resulting value. If the assignment is to a variable that has been defined, then like types are copied and unlike types are converted, or an error is generated if the conversion is deemed illegal. I allow a reference to a variable where that variable hasn't been declared and I can't infer its type at compile time. This would only be valid if that variable got a type and a value before this code was executed. It would always be a run time error to refer to a variable's value that hadn't been given a type and value.
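
As a rough analogue of those rules in an existing checked language (TypeScript here, not the language being described), a declared or inferred variable keeps its type, and an unlike assignment is simply flagged rather than converted:

  let count: number = 0;    // explicitly declared: the type is fixed as number
  let greeting = "hello";   // no annotation: the type string is inferred from the assignment

  count = 42;               // like types: a plain copy
  // count = "forty-two";   // unlike types: rejected here; the language described above
                            // would instead attempt a conversion, or raise an error if
                            // the conversion were deemed illegal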

If you initialize variables at the top of your function, the compiler can correctly infer their types without you having to declare them. If a function is defined with typed parameters and a return type, it isn't hard to infer the type of any variable assigned the value of that function. In an OOP language, all defined Objects have a known type for each member variable, so these types are automatically known using their Class (which is available at execution time). Few variables actually need to be explicitly declared.
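
For example, in TypeScript the same kind of inference falls out of initializers and declared return types:

  function area(width: number, height: number): number {
    return width * height;
  }

  const a = area(3, 4);   // no declaration needed: 'a' is inferred as number
                          // from the declared return type of area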

If "type theory" is about how to implement polymorphic functions, then my solution to not check parameter type or number at compile time, provides a flexible solution. With my solution, you can call a function with optional parameters or by using any type of variable as a parameter that can be converted and still make sense. My language does automatic conversions as a default but this can be restricted at multiple levels if desired. Checking function parameters is left to the programmer by default but compile time parameter checking can be done automatically to satisfy the "type safe" fetish.

I see nothing Mathematical in my use of types and I wouldn't mind knowing what more there is to types, other than uniquely identifying combinations of bytes that have a specific interpretation. If you use a 4 byte sequence that is actually a "long integer" as if it were a 4 byte "float", you will obviously have a problem. I think the problem for most compiled languages is that they just point to memory locations at execution time and expect the correct format of variable to just be there! In most interpreted languages, all the attributes of a variable (including name, type, size, value etc) are available at execution time and can be accessed and used whenever necessary.
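
The long-versus-float point can be made concrete with a small TypeScript sketch that reads one 4-byte pattern under two different formats, using the standard DataView object:

  const bytes = new DataView(new ArrayBuffer(4));

  bytes.setInt32(0, 1078530011);      // store the bit pattern of a 32-bit integer
  console.log(bytes.getInt32(0));     // 1078530011   (read back as an integer)
  console.log(bytes.getFloat32(0));   // ~3.1415927   (the same bytes read as a float)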

I define 2 quite different types of Objects. Server Objects define the equivalent of a table, its indexes, and the functions that use that data. Regular Objects are defined using a Class and can be used for smaller Objects that are contained in the Server Objects. Server Objects communicate exclusively using messages that are a kind of flexible Map rather than a function call, so all function calls are local to a Server. This means that matching function parameters and return values isn't a big problem because they are always local. With the appropriate switches down to the function level, compile time errors can be raised for undefined type, position or number of parameters on function calls. Automatic type conversion can be turned off, and assignment can be with or without auto type conversions.
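
A minimal sketch of the "flexible Map" message style, in TypeScript rather than the language being described; the names ServerMessage and dispatch are invented for illustration:

  type ServerMessage = Map<string, unknown>;

  function dispatch(msg: ServerMessage): void {
    const op = msg.get("op");       // fields are looked up by name, not by position,
    if (op === "insertRow") {       // so callers need not match a fixed parameter list
      const row = msg.get("row");
      // ... handle the insert locally to this server ...
    }
  }

  dispatch(new Map<string, unknown>([["op", "insertRow"], ["row", { id: 1 }]]));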

As no statically compiled code can be guaranteed never to crash, my interpreted code does internal type checking at execution time regardless of how strictly it was compiled. This small performance hit is the price I am willing to pay for flexible, dynamic code and the ability to catch all errors in programmer defined code.

To check whether a variable has a type, you only need to refer to the variable as "variable?", which returns _true if it has a type and value. To check if it has some kind of numeric type the syntax is "variable?n", or "variable?c" for some kind of character type. The clause "If variable?long and variable > 50" wouldn't cause an error if "variable" wasn't defined as a "long", but it would fail the "If" clause. The "variable > 50" wouldn't be an error because the first clause would fail and the second clause would never be evaluated.
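
A rough TypeScript analogue of that guarded test, using a typeof check in place of the "?n" syntax:

  let variable: unknown = undefined;   // not yet given a useful type or value

  if (typeof variable === "number" && variable > 50) {
    // reached only when the first test succeeds, so the comparison on the
    // right is never evaluated against something that isn't a number
  }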

--- DavidClarkd

A TypeSystem is theoretical; as described above, it is "a particular theory (in the mathematical sense: consisting of axioms and logical statements) of types and their construction."

What you describe is an implementation of a TypeSystem.

[Why make such a fuss over something so simple and obvious? How does this Mathematical theory relate to the real world of computer languages? I am currently finishing the design of a computer language and I really want to know.] --- DavidClarkd

Simple and obvious TypeSystems are sufficiently intuitive to not require mathematical consideration. There are more sophisticated type systems for which mathematical consideration is valuable. I have a colleague who developed a type system based on classification theory, which allows sophisticated constructs to be defined entirely within the type system. For example, he illustrated his idea to me by demonstrating type definitions that declaratively described the allowable movements of chess pieces, and he showed how the movements could be constrained by TypeChecking. However, using such a TypeSystem can result in undecidable questions -- you might be able to define a scenario that could take an infinite amount of time to determine whether type 'x' is compatible with type 'y'. In such cases, the underlying mathematical theory can be used for a pragmatic purpose: to predict where such undecidability might occur, and warn the user. It can also tell the user when a decision might take a long time, and he designed an environment around this where certain forms of type checking would run in the background and return helpful warnings -- e.g., "warning, under <x, y, z> obscure circumstances, <a> might fail" -- days or weeks after you wrote the code.
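
To give a flavour of the idea (only a toy TypeScript sketch, not the colleague's classification-based system), simple movement rules can be pushed into type definitions so that an impossible pairing is rejected by the checker rather than by runtime code:

  type Piece = "rook" | "bishop";

  // Which directions each piece may move, expressed as a type rather than as runtime code.
  type Allowed = {
    rook:   "horizontal" | "vertical";
    bishop: "diagonal";
  };

  function move<P extends Piece>(piece: P, direction: Allowed[P]): void {
    // ... carry out the move ...
  }

  move("rook", "vertical");      // accepted
  // move("bishop", "vertical"); // rejected by the checker: a bishop may only move diagonally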

[Does it do something that ordinary code couldn't? It seems to me that the functionality you describe should be in functions or other code that programmers would have access to, rather than constrained to compile time only (assuming type safety) and only by the terms specified by the compiler writer. Implementing functionality in this manner obviously can be done but does it do something new or something old significantly better? Many times when deciding if some feature should be included in a computer language, the general use of a feature comes into play. If a feature has only limited use and can still be implemented by the language using a more general set of features, then that feature would normally be excluded from the feature set of the language. Would this kind of complex type system be used often enough and in enough contexts or just in some special cases?

One of my design goals was to automatically do anything that can be inferred correctly (memory management, disk persistence, concurrency etc) so that the programmer is freed up to think about their problem rather than implementation details. Does a complex type system help or hurt this goal?

-- DavidClarkd ]

It does nothing that "ordinary" code couldn't. It just does it with less code. We use TypeSystems because they provide automatic verification of aspects of a program before and during program execution. The more sophisticated the TypeSystem, the more that can be verified automatically.

[Is it a real problem that isn't currently being addressed or just an interesting solution? I don't mean this statement in a bad way at all. Is it a nice solution to a problem that normally doesn't exist? It could be quite useful for writing papers but does it have meaning to most programming? Given the design goal I stated earlier, is it something that I should be concerned with? --- dfc ]

It's an approach that allows you to express complex relationships amongst types -- e.g., "an 'x' can be a 'y' under circumstance 'p' unless 'q' is true and then it's a 'z'" -- perhaps more easily than with current, popular approaches to TypeSystems, along with a degree of verification and validation that I haven't seen before. It's a nice solution to problems that often exist, but that normally are handled with imperative code rather than declaratively. Is it something you should be concerned with? Currently, probably not. What I saw was still under development. When there is at least one reference implementation, and (if) it gains some attention, then maybe.


See also AbstractInterpretation (which is not only about safety, but about properties of programs more generally).

