Applying The Side Tag Typing Model

These are various scenarios, notes, and conceptual illustrations regarding the controversial "side tag" model of programming language/engine typing. (Sometimes they are called "side flags" or "type tags" in various spots on this wiki.)

Compiled "Static" Languages

The "side tags" are used by the compiler, and can be discarded upon run-time because they are no longer needed because the "execution path" has been pre-limited by the compiler such that type checking is no longer needed.

Interpreted and dynamically-typed languages

Under these languages, the side tags are kept "with" variables and objects so that they can be checked and changed during run-time.

As a physical analogy, static compiled systems (above) lay down walls and physical filters (such as the anti-car obelisk barriers found in front of some government buildings), but once the cement dries, there is no active checking (no guard on duty). In contrast, dynamic types have a live guard that checks pass-ports and ID (side tags) at entrances and exits.

Many newer languages that seem "compiled" are actually hybrid compiled and interpreted. How they handle the mix of static and dynamic types depends on the specific language.

Tag-Free Language

Tag-free languages are usually dynamic languages that treat values more or less as a string (list of bytes) unless non-string actions are requested, at which point operators or operations will attempt to parse the string as a given data type, producing a conversion error if unsuccessful. One does not ask the system if a variable or object "is" a given type, but rather ask if it can be ''interpreted as" a given type.

One can also view most computers' machine language as tag free. It doesn't enforce the usage of the referenced bytes such that bytes originally laid down as strings can be re-interpreted as (used as) floating point, for example.

Keep in mind that many dynamically-typed languages (the 2nd type) also use re-interpretation (parsing) to convert types as needed. (A sub-distinction between "strict" and "loose" could perhaps be made.) The difference is that they may make use of the type-tag to guide interpretation, such as using the type-tag to resolve ambiguity over overloaded operators. Different languages can handle this differently, and the distinction may be subtle. See EmpiricalTypeBehaviorAnalysis for examples on how to test for tag-like behavior. (I'm not much of a fan of overloaded operators such as "+" for both addition and string concatenation.)

A fancier tag-free language may also permit complex variables, not just scalars. For example, every variable or object may be treated as a map or a tree. If you make a scalar assignment, such as "x='foo';", then the value "foo" is merely a "data leaf" at the very root of the tree [footnote 1]. If you then perform "x[4]='bar';", a branch called "array" may be created with one leaf [foonote 2]: "bar". Executing "x.y='zoop';" may create a branch called "y" with a leaf of "zoop":

Tree of "x" (root)
- [leaf] foo
- [branch] array
  - [branch] 4
    - [leaf] bar
- [branch] y
  - [leaf] zoop

Generally there is a one-size-fits-all structure that can "be" or represent complex or compound objects[footnote 3]. There are different design approaches to such languages and syntaxes. Generally one does not ask IF a variable/object "is an" array, but rather one would ask if it has multiple elements.

Function/methods (as addresses or actual code) can also potentially be stored in dynamic trees and maps to make them object-oriented.

Comparative Advantages

Each tends to have different strengths and tradeoffs, although there is wide disagreement over the practical value of each, and which applies best to which niche. Thus, the opinions below do not have universal agreement.

Compiled static languages tend to be the more resource friendly in terms of memory, disk, and speed because the type tags don't need to be kept in the executable and don't need to be machine-analyzed analyzed for each usage of a variable/object. They also can protect the programmer from making TypeSafety mistakes, and the type info can serve as a form of documentation to describe intended usage of a variable/object. The down-side is that the benefits of dynamic programming are lost, such as lack of MetaProgramming; and the requirement of a potentially long compile step.

Dynamic flag-based languages tend to be a compromise between the static and the tag-free languages. However, they are also conceptually the most complex because one must consider both the "static world" and "dynamic world". But they also allow one to use stricter typing when needed, but "back off" a bit when dynamic techniques are a better fit.

Tag-Free languages tend to be conceptually simpler because one does not have to think about "type tags" and what might be in them. The values are more WYSIWYG. In tag-centric languages an Integer and a String may both print out "1234", but behave differently in other circumstances because they have different side tags. This will never be the case in tag-free languages (barring unprintable characters).

Oh boy, the NextBigThing. A bible prophecy of what will happen once this superior tag free non existant programming language eventually surfaces (VaporWare). You will still have a string "tag" so that you can print out the string, will you not?

Perl and ColdFusion already do it at the scalar level.

The code may also be more compact because type declarations are not needed. (Validation code may still be used in some places.) Many programmers find that such compact code is easier to read and change, at least for small-team projects.

The down-side is that the IDE's and language tools will not be able to "protect" the programmer from certain TypeSafety mistakes, relying more on other techniques to check and document programmer intentions. They do in general require more self-discipline on the part of the programmer.

Real or Virtual

Note that a tag may still be used "under the covers" for efficiency purposes, but tag-free languages don't "expose" or behave as if they have tags, and would in theory behave identically under a non-tag version of an interpreter for the same given language (other than perhaps performance and resource usage issues.). Thus, the notion of "tags" is a conceptual model, for under the hood extra or adjusted information may be included for various internal reasons.

Object or Variant Types, and Compound Tags

In hybrid languages there is often a "catch all" type called something like "object" or "variant" or "any". These allow the variable to hold different types over the course of program execution. Generally there are two different ways to model these under the side-tag model.

The first is to use processing similar to the tag-free model where if say a number requested, the value is parsed as a number. The second is to essentially have two side-tags, or a compound side-tag that indicates that a variable/object is allowed to be dynamic, and a second that tells which type is currently is. The "variant" declaration then is essentially a permissions mechanism that allows variables to change their type.

Generally one should consider tags to be lists rather than a single type-name, because it could potentially have information such as "vector-array", "integer", and "variant" at the same time. This would mean that the variable/object is a vector-array of integers, and the type(s) can change during run-time. (Whether this is per array element or per entire array is language-specific matter.) Other language info such "private" and "public" can also perhaps go into the tag. Thus, tags are a very powerful conceptual concept. This brings up the question: "is 'private' a "type"?"

--top

Footnotes

[1] The approach I favor is to have a branch called something like "~value", and regular "scalar" assignments put a value leaf there. The "~" avoids conflict with arrays or array elements that may have "value" as a value.

[2] Some approaches don't make a formal distinction between a branch and a leaf, such that an "array" branch may not be needed. There are various trade-offs, and trees of maps can be used instead of just trees, although trees can serve as maps in themselves (and allowing nested maps).

[3] Sounds like EssExpressions? See GreenspunsTenthRuleOfProgramming.

See http://dl.acm.org/citation.cfm?id=1869635 for a concrete application of type analysis to a dynamic language. --AnonymousDonor

Note that the above requires paying a fee. Based on the synopsis, it seems to emphasize machine performance issues rather than human developer productivity or conceptual models. --top

CategoryTypingDebate