Type Tag Discussion

A discussion moved from WhyNumberingShouldStartAtOne:

So-called "dynamic" languages are frequently rife with abominations. Few popular languages are as muddled as PhpLanguage.

Agreed. I consider flag-free to be a LiberatingConstraint to simplify dynamic typing model. I'll live with the few drawbacks.

What's a "flag-free"?

Sorry, I meant "tag-free". I used to use "flag", but others here preferred "tag" and I concurred. Related: DefinitionOfTypeTag, ColdFusionLanguageTypeSystem.

The only "tag-free" popular languages are assembly language and ForthLanguage. The majority of languages associate a type with every value, even if that type (in some languages) is usually "character string".

That is true, but that doesn't necessarily means it's "good". Perl and ColdFusion are also languages that use tag-free typing. A compromise may be to include tags as an option, but create built-in and library operations that don't rely on the tag. In other words, have the tag there if a given developer wants to use it, but don't otherwise encourage it in the "base" language. One-based indexing is one way to help. -t

I used to rant against excess OOP usage/hype, and now I rant against excess tag usage/hype. Dynamic languages have grown too "tag heavy". (By the way, we should probably move this discussion to some other topic or section.) -t

How do you measure "tag heavy"? How do you distinguish it from "tag light"? By the way, Perl internally represents numbers (for example) in three different formats. Perl's numeric operators typically work on only one of the three formats, but which of the three depends on the operator. So, when a numeric operator is invoked, Perl may have to reformat a number so it's appropriate to the given operator. Is that still a "tag-free" language? If it is, how does the language know which representation is currently used for any given number, in order to (possibly) reformat it for a given operator?

There are some operations/tricks that simply can't be done without a tag. DefinitionOfTypeTag has some examples. Reformatting a number doesn't require a "tag". What you describe appears to be parsing and/or validation.

Neither. Perl simply associates a type with every value, and uses that type to determine whether or not a reformat is needed.

Where is this "associating" happening? How can one test for its existence? (Let's assume we are testing the language behavior, not opening up the hood and studying a particular implementation. Ideally a language is defined by its behavior (I/O), not by a given compiler/interpreter implementation.)

Unless you're "opening up the hood and studying a particular implementation", why should it matter where the associating is happening? All that matters is that it does happen. It's what the language does: it uses a value's type to determine whether the value needs to be cast or not.

It matters because we (developers) mostly define and judge languages by their behavior, not by their implementation. Implementations should be swappable in theory anyhow. You seem to be using an interpretation of what the interpreter is doing. In the "tag" model, the variable has (at least) two parts: the value and the type tag. In a non-tag language, it has no (detectable) tag. An operation may do validation and/or parsing to process an operation, but it does not need a tag associated with the variable itself to do such. -t

Some argue that the "tag" is also an implementation detail, but really it's a model. Being a model, it may or may not reflect actual implementation. One may call a model a virtual implementation. And, models and implementations are perhaps interchangeable concepts, but that's another philosophical cave to explore another time. -t

From the above, your model appears to boil down to this: "A variable may (or may not) have a type." Some languages associate types with variables -- and constrain the variable's values to that type -- but some don't and allow assignment of values of any type. I'm not sure value (pun intended) is added by bringing tags into it, and it alludes to certain implementation mechanisms whether you intend it or not.

The "tag" means that there is something "separate" from value portion of a variable (or variable-like object). To me it's a pretty strait-forward concept, but hard to rigorously define in English. DefinitionOfTypeTag gives experiments that may tease out whether there is something extra besides just the value. It's not (just) an implementation mechanism because it affects externally-observable behavior.

Generally most language have operations to extract the value, and extract or examine the type (type-name or testing). If it's possible for the second to vary while the first stays the same, that strongly suggests that "type" is or can be independent of "value". Languages that don't have type tags will usually parse the value to determine or approve a "type". -t

I still don't see how this adds anything to the simple and intuitive notion that some languages associate a type with a variable, and others do not. The difficulties you claim exist appear to be entirely in your own mind. The vast majority of my students certainly don't have any problem with it, and the few who do are inevitably having such conceptual difficulties with programming in general that I don't see how tags would help. At best, invoking "type tags" would add complexity to something that, for them, is already way too complex.

Perhaps they are not ready for such models or discussion yet. When programmers start out, they often use a lot of trial and error to figure out such things. But after a while one starts wanting consistent mental models to be able to predict the behavior of languages without having to use trial and error or wait until gotcha's happen. Your students maybe haven't reached that stage yet. That's fine, but that doesn't mean the issues go away, only that they are swamped by others at the time.

If that's the case, then I'm not clear where you intend your "tag model" to be of value. If it's not intended for beginners, then who should use it? More advanced programmers easily appreciate the behaviour of various type systems without the artifice of tags. A mental model consisting of, for example, "every variable has a type, and values assigned to it must be of matching type" is trivially understood even by novice programmers.

I disagree. When I ask them about it, they often say something like, "I don't know why, it's just the way that Foo++ works. I've been using Foo++ for several years and learned through experience what it does under different circumstances." Most programmers don't think, they just "do", relying on the instance-reinforcement-based instinctive pattern recognition ability of their neural network rather than modeling to predict. And how does one illustrate "has a type" on a pen-board? Maybe drawing a, say, ...tag?
I've no problem with illustrating that a given language may define a variable as a tuple {v, t}, where v is the value and t is a type. That's simply a re-phrasing of "every variable has a type". I have a problem with your claim that this elementary illustration is a model. A model supports making predictions about the system for which it is an abstraction. What predictions does nothing more than {v, t} permit? If you're inclined to point to your ColdFusion examples, stop. They rely on a vast number of assumptions knowledge about the language that cannot be derived from a mere {v, t}.
First, is how a given language uses the "t" portion. For example, if two or more variables are being operated on and are mixed types, does the "t" override syntactical context and which "t" gets priority as far as alternative interpretations. Second, it can be used to model and/or explain the behavior of a language that does not have/use the "t", such as Perl and ColdFusion. You have not shown an alternative. And there is nothing wrong with making assumptions as long as we test them. First, one needs a model to test assumptions against. We can devise tests to see if a language has/uses "t". It's called "science". And why did you include the phrase "nothing more"? I'm not locking the model behind an iron curtain; for it may be integrated into other models.
Components of your description above -- e.g., "if two or more variables are being operated on and are mixed types ..." is far closer to being a model than only {v, t}. What I mean by the phrase "nothing more" is that you essentially claim your "model" is {v, t}, without having any additional components to your "model". You claim it is the whole model, but then you use a set of rules in your mind -- some of which you've actually written (in a non-rigorous and rather ambiguous) fashion above -- to apparently derive conclusions, when in fact the "conclusions" are effectively elements of your "model". In short, it's all muddled: Your "model", some predictions you apparently make using it that probably should be parts of the "model", plus a big bag of hidden (i.e., in your mind, but not written down) rules and assumptions, and (apparently) your own definitions are all rolled together in a generally undifferentiated ball of yarn. You might wish to study FormalMethods, as knowledge thereof might give you (at least) the tools to unravel some of this. As it stands, there isn't enough in {v, t} to test anything, at least not without an acre of unwritten assumptions. It's those assumptions and the framework around them that need to be stated before your notion can be considered a model.
What's a good example of the alleged vagueness of it? And I'm not necessarily looking for "pure" rigor. A UsefulLie is sometimes far more useful to a student or newbie than a convoluted and verbose truth. -t (I doubt you have pure rigor because the classification of linguistic concepts is still an art.)
By the way, lest you take my suggestion that some of your post might be components of your "model" as some acceptance of it, I should point out that I find phrases like "if two or more variables are being operated on and are mixed types, does the "t" override syntactical context and which "t" gets priority as far as alternative interpretations" almost entirely meaningless -- and I write interpreters and compilers!
The model is not necessarily meant to reflect the actual toolbox of a language implementator. Concepts such as "type tag" may merely be working definitions for the model and only for the model. The context is teaching a student/newbie how to reason about code, not necessarily write interpreters. One can say, "here is our working notation and rules for 'variables', 'values', and 'types'." Note that I tend to model constants as dummy, temp, or anonymous variables to keep the model simple. It works good-enough for most of the "common" languages.

And they are "not in my mind", I gave explicit examples. It can be shown that PHP differs from ColdFusion in externally observable behavior, for example. Thus, you are wrong on that one. I started using the tag model when trying to "explain" to myself why different dynamic languages handle "types" differently. A pattern emerged: tag languages and non-tag languages. And the non-tag languages were easier to handle, which got me wondering why people want to keep the miserable tag model.

Obviously, there are differences in TypeSystem behaviour and they are easily illustrated. Your apparent belief that people (in general, presumably) have difficulty understanding these, or that "type tags" help people understand them, is what appears to be in your own mind and your mind alone.

See above. Illustrate away. I gave code and tag examples, your turn...

Huh? I'm referring to your illustrations. They are easily created. What is in your own mind is the belief that "type tags" help people understand trivial notions like "a value is of a given type" or "a variable is of a given type, and can only be assigned values that match the type" because they find these difficult to grasp. They don't, except when they're already irredeemably lost. In that case, "type tags" don't help, because nothing will.

If you can make a visual or notational representation of them and write down steps a hypothetical interpreter goes through to evaluate expressions, it can give students or newbies a concrete sense of what is happening. It's a "machine" they can see and x-ray and follow the steps of with their own eyes in a step-by-step "mechanical" way.

You're describing implementation of languages, not their conceptual basis.

It can answer questions such as "why does appending a "+0" at the end of a JavaScript expression force "ambiguous" expressions to be "numbers"? And what is meant by "ambiguous"? One can give examples and compare them to non-ambiguous variable examples. What does the {v,t} of an ambiguous variable look like compared to the {v,t} of a non-ambiguous variable? It can also explain why some static languages are static: you can't change a variable's tag during runtime. -t

How does that any better than, "you can't change a variable's type during runtime"?

Because "seeing" it happen step-by-step, and compared to the alternatives (such as flex t's), often makes it clearer in the student's mind. Statements of rules like you give sometimes don't sink in because one cannot see it "in action" and in contrast to other statements (or representatives there of). That's why math theorems are often given with a set of problems to work out where students apply them, and are often required to show the steps. One "executes" the rules with samples to make them more concrete. If it's a common teaching technique for math, why is it (allegedly) not useful for code processing understanding? Are they that different, or are math teachers "doing it wrong"?

There is absolutely nothing wrong with working out problems, seeing processes happen step-by-step, and examining the internal workings of systems to understand them better. I encourage all of these things. Unfortunately, your model is none of these. Mainly, it's incomplete (it's {v, t}, essentially), but when you appear to add to it, or attempt to explain it, we get unrecognisable phrases like "flex t's" or "if two or more variables are being operated on and are mixed types, does the "t" override syntactical context and which "t" gets priority as far as alternative interpretations". I have no idea what those mean. For a model that's supposed to add clarity, it sure seems obscure.

I thought it was fairly clear. I apologize if it's not, and fuzz wasn't intentional. Perhaps I should take examples from existing languages and give concrete step-by-step examples and show which portions of the code it corresponds to. That way I can point to "this thing here in the sample code line X" and then vocabulary is less likely to get in the way because we all know what "that thing" is regardless of whether we personally call it a variable, object, constant, or a whizzle. I'll see if I can work something like that up. -t

You underestimate the ability of neophyte programmers to understand trivial concepts. Perhaps you're projecting your unusual difficulties with abstraction onto programmers in general?

I'm going based on my experience with other programmers. If your experience with such differs, so be it. I can't download your life experience, and vice verse. My statement above about "instinctive pattern recognition" is an accurate statement based on my actual experience with other coders, and I swear to God, Buddha, and all the Indian gods that it is true to the best of knowledge. And yes, I do have difficulty with your view of "abstraction". Thank God/Buddha/etc. I didn't get you as my instructor. Further, different students learn different ways. Some are verbal, some are visual, some like symbols, some like words, etc. The more angles you present the same info from, the more likely something is to "click" with a given student.

Example scenarios to explore:

Static languages don't allow you to change the type tag at run-time
Dynamic tagged languages often use a combination of information to determine final "type", including a variable's current type. The precedence rules vary (tag versus context) per language. Context can include neighboring operands and operators.
Tag-free languages have no detectable tag. There are no known experiments that suggest the value is separate from a "type".
Do ambiguous types (such as "object" or "variant") behave differently than strings in the same circumstances?
What type (tag) is the result of expressions with mixed and/or ambiguous types?

TagFreeTypingRoadMap

OctoberTwelve