Type Tag Difference Discussion

Related to DefinitionOfTypeTag, hypothetical illustrations ("prediction engine") of the "tag model" for scalar variables in both "kinds" of dynamic languages.

 TL = Tag-based dynamic language
 NTL = Non-tag-based dynamic language

Givens (assignments):

 a = 123
 b = "123";

Observation 1:

TL:

  write(a.typeName());  // number (<--result)
  write(b.typeName());  // string
  // The type extractor function/method name may differ per language

NTL:

  (Has no direct equivalent of "typeName")

Observation 2:

No known way is discovered in a language (such as a NTL) to determine if a variable like a or b is declared with quotes or without. Thus, if we replace a=123 with a="123" in a program (and vice verse), they behave identically for all known programs.

Observation 3:

TL: (May very per language)

  write(a + a); // 456
  write(b + b); // 123123

NTL: (May also vary per lang, but a and b always same result[1])

  write(a + a); // 456
  write(b + b); // 456

Comments:

To avoid this ambiguity, a dynamic language would use a different operator for concatenation. Personally I prefer this - addition and concatenation are different operations. -- AnonymousDonor

I agree, but in practice common existing languages screwed this part up and "+" is overloaded, even if a dedicated concatenation operation is available. - t

Observation 4:

 // both TL and NTL
 write(isNumber(a)); // true
 write(isNumber(b)); // true
 a = 0;
 write(isBoolean(a));  // true  (CF at least)
 write(isNumber(a));  // true
 a = "0";
 write(isBoolean(a));  // true  (CF at least, may vary for TL's)
 write(isBoolean(a)); // false in PHP equiv. (and considered a design flaw by some)
 write(isNumber(a));  // true

Observation 5:

The triple equal in Php ("==="). Description pending...

Observation 6:

 // JavaScript
 a = 123;
 b = "123";
 alert(a + a); // result: 456
 alert(b + b); // result: 123123
 alert(b + a); // result: 123123
 alert(a + b); // result: 123123

(Explanation pending)

--top

Pseudo-Machine-Language Representation of variables using XML-like structures.

 foo = 12.3;

TL:

 <var name="foo" typetag="number" value="12.3"/>
 .
 // further examples will just use "tag=" for brevity:
 .
 <var name="foo" tag="number" value="12.3"/>

NTL:

 <var name="foo" value="12.3"/>

...

(Dots to prevent C2 "PRE" spacing bug)

Observation 1 Modelling:

TL's have an explicit "type tag" ("tag" attribute), and the "typeName" operator examines this tag and returns its name. (It may be a type ID internally, but turned into a fuller description via a look-up table in the interpreter of some sort.)

 <var name="a" tag="number" value="123"/>
 .............R^...R^................... // The interpreter examining the tag attribute (R = read)

The tag was set in the first place by examining the assignment of the constant, such as "a=123;". If it doesn't have quotes then it's usually assumed to be a number of some kind in TL's, and "a" gets a tag of "number" (or "int" or "double" etc. depending on language. We'll keep it simple for the example and use "number".)

 b = "123";
 ...R^..R^. /* The interpreter detecting (reading) the quotes, and consequently 
 setting the tag attribute to "string", as follows: */
 <var name="b" tag="string" value="[etc.]"/>
 ..................W^..........  // The interpreter setting the tag attribute's value (W=write)

NTL's don't have such functions because they have no "tag=" attribute. There's no tag to set because there is no "tag=" attribute to set. Only the value is read and kept for the assignments (without quotes). Something like Observation 4 tests could potentially guess for a typeName()-like function, but as you can see it could have a non-unique or confusing answer, and thus would provide very little practical use. (Ob. 4 discussed in more detail below.)

Observation 2 Modelling:

TL languages use the quotes in constants used in assignment statements to set the variable's "tag=" attribute as shown in Observation 1 Modelling. The "string" or "number" settings are kept in the tag attribute until otherwise changed by the program. (The tag's value is NOT temporal.) This tag will affect processing and results (such as Observation 1). TFL's don't have a "tag=" attribute in variables and therefore have no "slot" to track whether quotes were used or not. They only keep the value (the stuff in-between quotes) in the "value=" attribute. (In theory, the value could store the quotes, if given, but essentially that's a round-about equivalent of a "tag", and complicates the model.)

However, as explained in Observation 1 Modelling, TFL's have no "tag=" attribute to track whether such assignment statements used quotes or not. There is no "slot" to keep such info, so it's discarded, keeping only the value (without outer quotes).

Observation 3 Modelling:

I will finish long explanation later, but the short explanation is that in TL's the "+" operator can examine the tag of one or more operands to determine whether to apply concatenation or arithmetic addition. In NTL's, there is no tag to examine, and thus only the value is available for examination, which wouldn't be different based on the quotes, as described in Observation 1 Modelling.

Observation 4 Modelling:

The biggest question is how can "isBoolean" and "isNumber" be true at the same time.

If we look at the conjecture that these "isX" functions look at the tag, then we have at least two problems.

First, is that the result differs from what one would expect from Observation 1: they don't necessarily match.

The second problem is that we get two "types" at the same time. Our XML-like model of variables only has one "tag=" attribute in TL's and zero in NTL's. Thus, the variable's "structure" has no place to store a second type.

The answer is that common interpreters parse the value portion ("value=" attribute set) to determine "isX" (or at least behave as if they do). In our model, we'll represent all values as strings for now[2]. And if a string has a letter in it, then if fails "isNumeric", for example. ("e" may be an exception, if scientific notation is permitted.)

In ColdFusion, "0" and "1" are considered valid Boolean "values" by convention (along with others such as "yes", "no", "true", and "false"). Thus, "1" parsed as a valid number and as a valid Boolean.

And even TL's behave as if they parse the value instead of examine the flag. Even values assigned with quotes (such as 'x="123";') pass "isNumber()" or its equivalent in most TL's. "isNumber()" essentially asks, "Can this thing be interpreted as a number?" The same value (or string) can fit the qualifications for "Boolean" and "Numeric" at the same time regardless of what the tag says.

(If the tag says the value is "number", then parsing to check "is numeric" may not be necessary in some languages if consistency is enforced. But either way the result is the same as "always parse" if performance is ignored. We are modeling for output here, not speed, and choose simplicity of the model above performance.)

Php's is_bool() function appears to only look at the tag (IINM), and thus both is_bool("True") and is_bool("0") return False, even though is_numeric("42") returns True (parse-testing a string of digits). There's some grumbling about this inconsistency based on some of the comments I've seen in Php forums. One wouldn't ever have this problem in NTL's.

Foot Notes

[1] TL's may vary widely because sometimes tags are used, sometimes parsing, and sometimes both. This makes it a bit difficult to classify them and creates subtle differences between TL languages. However, they almost always "fail" at least one test that takes them out of the NTL category. So far, though, I haven't found a TL that fails only one test.

I would like to know what these tests are.

[2] Storing as strings is not necessarily the same as processing as strings. Typically such are converted into binary during math operations and then converted back to decimal strings, if needed. However, this process is generally hidden and there's no direct way to really tell how things are internally stored. It may be possible that loss of precision converting back and forth between binary and decimal can provide hints of what's actually going on under the hood. The fastidious among you will probably balk at this, saying it makes the model imperfect, at least when floating point is involved. That may be so, but it may not be the best model if you are using floating point at the edge anyhow. It's roughly comparable to using Newtonian physics even though Einsteinian physics is potentially more precise, but at the cost of model simplicity. (You probably don't want to use a dynamic language if your app is floating-point intensive anyhow: it's the wrong tool for the job because of both efficiency and potentially extra conversions happening under the hood.)

This place is a scary place for dynamic language users, as I was. Anyway, FWIW, the difference between a dynamic language and a typed language is - a typed language decides on its behaviour depending on the type of data being processed, and a dynamic language determines its behaviour depending on the process being processed. The overloaded "+" is a great example of that. -- PeterLynch

Okay, but that depends on how "type" is defined, which has been a sticking point between both parties for years. As far as "scary", in my opinion, tag-free languages are usually easier to reason about because one can use a simpler model: IT HAS LESS PARTS. Overloading "+" is a faddish mistake that could potentially be fixed, reducing the confusion even more. Thus, dynamic languages can be quite nice; it's just that fads and tags screwed up the current batch.

Why Care

Regarding Observation 2 - I feel like writing "Who cares?", but that sounds a bit rude. But that is what I feel. What does it matter that the type of the data is not held explicitly separately? Excluding the overloaded "+" of Observation 3, in a NTL, the function determines the type. If you have really made an error and you are trying to add the string 123 to the string AB345, the function will fail. Unless you are using Intersystems Cache, which will happily strip the alpha characters from the second operand.

I agree that if one practices "defensive programming", they usually won't know the difference. But the model is explicitly made to address those odd situations where it does, such as Ob. 3. If one is not interested in the subtle differences, then they are not going to care about either type model anyhow. The topic is for those who care about dynamic typing subtleties. There's also screwy crap like the triple-equals used with strpos() in Php where the tag makes a big difference. True, Php has some bad design ideas, but future language makers can learn from the tag lessons. Hopefully future language makers will skip tags and avoid risky non-WYSIWYG behavior. That's also of the lessons I wish to convey: we don't need stinkin' tags in dynamics languages.

Dynamic subtleties. This is the only little subtlety that has hit me ever in a dynamic language. A part number of form NNANN. One of the parts was 98E01. When the dynamic language was sorting this, it interpreted it as the floating point representation of the integer 980 and shoved it into an inappropriate spot. Of course there was a workaround.

If for whatever reason dynamic typing systems don't concern you, feel free to skip such topics. Geeks tend to obsess on pet minutia for good or bad.

LOL? I guess I should explain the laughter. I have been a fan of dynamic languages since the 70's. I have written more business and software development systems than most. I was just pointing out that the only gotcha I have ever had from a dynamic language and which is a result of the very nature of a dynamic language is the one above. And there was, as always, a workaround.

I've encountered a handful; it's what got me curious about "tags". Maybe my coding style takes me into the differences more often than yours.

Well truth is I have no idea why anybody is talking about tags. I think there was a semantic misunderstanding a while ago and they are still arguing about their own perspectives. Unfortunately one side is just pontificating, while the other is futilely responding. But this page is not about tags only - it is touching on interesting dynamic language attributes. What got me interested was the implication that in a dynamic language the lack of a hard-coded data type is a problem (Observation 3). I guess if you really missed defining types then you could call a C program.

Incidentally, if you use ColdFusion's built-in map and array sorting gizmos, there's an attribute to indicate whether to sort as text, as numbers, or date/time, with the default being text IIRC. I don't know what happens if you have text cells in a number-requested sort. One speculative design option is to trigger an error, or just to stick all text at the bottom.

Another "reason to care" is honing one's technical writing skills, especially if that's your specialty. Sometimes we believe our internal notions and feelings are accurately captured in our writing, but may not otherwise be. When we use words like "associated with", we have to make sure it's sufficiently clear how this association takes place, how to measure/examine it, how to model it clearly, all relevant side-effects of the association, when the "association" ends, etc. Otherwise, it may confuse the reader. -t

AugustThirteen