Definition Of Type Tag

This is an attempt to define Top's "side tag" typing model, which has been the subject of many heated debates between Top and others on this wiki.

Informally, Top's "side tag" model describes a language system or engine that "acts like" it has a side tag indicating the type, separate from the value itself. "Acts like" is used to avoid prematurely tying it to an actual implementation. Further, it should be considered a "working definition" scoped to this wiki rather than a universal one, although universal attempts are welcome here. Just please state that assumption.

So far there are 3 candidate definitions, and there is discussion about whether "tags" should be considered a model for prediction such that having a definition is of secondary importance.


Attempt #1:

  A language uses the "type tag model" if it's possible to get different
  results for two different variables even if their string representation
  is identical. Scope differences excluded (local versus public, etc.)

Well, now we have a definition of "a language uses the 'type tag model'". What about "type tag"?

{Furthermore, what does "it's possible to get different results for two different variables ..." mean? When and where do these "different results" manifest themselves, and in what manner? What do you mean by "results"? What do you intend by the sentence fragment, "Scope differences excluded ..."?}

Would you suggest a topic name change to DefinitionOfTypeTagModel? I'm not sure what your point is.

PageAnchor: test_53

As far as "different results":

  print(toString(a) == toString(b));  // result: True
  print(myPureFunction(a) == myPureFunction(b)); // result: False

Alternative:

  if ((toString(a)==toString(b)) And Not (myPureFunction(a)==myPureFunction(b))) {
    print("Possible type tag detected.")
  }

Consequence of this definition: Whether a language uses the "type tag" model depends on its toString function. If a language has a toString method which completely represents its internal structure, then that language doesn't use the tag model.
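A minimal JavaScript sketch of this consequence, assuming `String` stands in for a lossy toString and `JSON.stringify` for one that exposes more of the internal structure. The names `detectsTag` and `double` are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: returns true when a and b "print" the same under the
// chosen toString function, yet some pure function tells them apart.
function detectsTag(a, b, toStringFn, pureFn) {
  return toStringFn(a) === toStringFn(b) && pureFn(a) !== pureFn(b);
}

// A pure function whose result depends on whether "+" appends or adds.
function double(thing) {
  return thing + thing;
}

const a = "123";
const b = 123;

// With a lossy toString, the two variables look identical, so the test fires.
const lossy = detectsTag(a, b, String, double);

// With a toString that shows the quotes, they already differ in print,
// so under Attempt #1 nothing is left to call a tag.
const revealing = detectsTag(a, b, JSON.stringify, double);

console.log(lossy, revealing); // true false
```

The same pair of variables is "tagged" or "untagged" depending only on which toString is plugged in, which is the dependency the paragraph above points out.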

You didn't do what you set out to do. You said you wanted to define "type tag". That's all well and good, but instead you define "a language uses the 'type tag model'".


Attempt #2:

  Given two variables, a and b. If it's possible to get different
  results using identical operations or transformations upon these two variables even if 
  their textual representation is identical, then the set of 
  properties for a and b that cause a and b to give different 
  results is called a "type tag". If it's not possible to get different results, 
  then a given variable has no type tag.
  Scope differences excluded (local versus public, etc.)

Questions:

Do you mean the textual representation of the

Textual representation of the "value" of the variable. I realize that some languages have a "dump" value that may be different than a "print" value. The "dump" value is generally used for debugging or serialization across servers etc., while the "print" value is designed more for end-users. People may have to first agree on which best represents the "value" of a variable that "prints" out in a production "output" sense. How to work that into the definition is still an open issue.

By identical operation do you mean

Same source text, such as same expression.

Just to exclude the obvious: Consider the identity function id:

        a = 1
        b = 2
        id(a) -> 1
        id(b) -> 2 
This returns different results and according to the definition I'm bound to call the difference a type-tag, even though it is just a different value.

Thus something about the variables must be equal, otherwise values would be type tags.

Please state the difference.

I don't understand what the above example is intending to show. What is "id", and why are the variables given different assignments? That would give them different "printable" values, and thus disqualify them. Is "id" similar to the RAM address of the variable? I suppose that's one of those weird exceptions, such as scope differences that may have to be excluded. It may be difficult to make a definition without a messy list of exceptions. But first let's work out the "gist".

Hm. If everything is the same (value, function, expression) there will never be a difference and hence no type tag. Something has to differ. I have to assume that you mean the 'printed' value at some intermediate place be the same like this:

  a = "2"
  b = 2
  toString(a) -> "2"
  toString(b) -> "2"
  toString(dup(a)) -> "22"
  toString(dup(b)) -> "4"
Thus it looks as if you define type tag relative to some toString method. So it appears you have to specify the toString method in more detail.

Please note that there are some languages where the default toString method renders a string "2" as ""2"" and a numeric 2 as "2", so these differ in those languages and thus there'd be no type tag differing between "2" and 2.
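The `dup` example can be reproduced in JavaScript as a rough sketch. `dup` is the hypothetical function from the example; `plain` and `quoted` are hypothetical stand-ins for two different default renderings, which is an assumption and not a claim about any particular language's toString.

```javascript
// dup: doubles its argument with "+", which appends strings and adds numbers.
function dup(x) {
  return x + x;
}

// Two candidate renderings of a value.
function plain(v) {
  return String(v);          // "2" and 2 both render as 2
}
function quoted(v) {
  return JSON.stringify(v);  // "2" renders with quotes, 2 without
}

const a = "2";
const b = 2;

console.log(plain(a), plain(b));           // 2 2
console.log(plain(dup(a)), plain(dup(b))); // 22 4
console.log(quoted(a), quoted(b));         // "2" 2
```

Under `plain`, a and b print identically but `dup` separates them; under `quoted`, they already differ in print, so the definition finds no tag to report.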


Open Points:


{The gist appears to be "polymorphic, and has the same canonical string representation".}

That's an interesting way to put it. The "tag" would then be that property of the variable that allows it to polymorph. However, the definition of "polymorphic" can get messy also.


Re: "Don't you find it awkward to employ a definition that requires adherence to non-existent official standards..."

Hell, I doubt "variable" has an "official definition" (unambiguous). - t

There's no such thing as an "official" definition. However, the concept of "variable" is well-understood.

We've had a similar discussion in TopsTypeDeterminatorChallenge over "types" and "cats" (feline). The state of the art is still not beyond the "I know it when I see it" stage. Historical-pattern recognition is not sufficient for a definition. - t

Of course it is. What do you think definitions are?

Definitions are often influenced by historical patterns, but don't actually use historical patterns as PART of the definition itself. If I look up "bonnet" in the dictionary, it usually describes the characteristics of a bonnet; it doesn't define it as "looking very similar to the following pictures....". It may have pictures as side info, but does not use the pictures as the definition itself.

Typically, a definition for a term <x> is simply an agreed set of terms that accurately reflects a common understanding of <x>. Rarely, formal or working definitions used in (say) academic papers may deliberately and explicitly deviate from common usage in order to add rigour within the narrowly-defined scope of a single paper, specialised field of study, or particular discussion.

"Common understanding" is not sufficient for settling controversy and differences in viewing/modeling things. "Good enough" for general or "most popular" notions, perhaps, but that obviously should not be considered the pinnacle of communication.

Indeed. That is why we create and use formal definitions.

Which so far seem poorly suited for software issues.

Why do you say that? Formal definitions are used extensively in SoftwareEngineering and ComputerScience. They are fundamental to programming language theory. They're also used extensively in mathematics, physics, chemistry, logic, and increasingly in philosophy and the humanities.

If they were truly formal and thorough, then a machine could be constructed that would be able to take a programming language grammar and compiler/interpreter as input, and unambiguously label the parts of source code. The problem is that you are too ready to accept "I know an X when I see an X".

Why do you think that's possible?

If it's unambiguous, it should be possible to produce an algorithm to perform the task instead of a human.

Why do you think that's possible? There's a vast gulf between making a definition unambiguous and making artificial intelligence. Perhaps you're confusing formal definitions with automated theorem proving, and then confusing that with high-order pattern recognition and identification &/or formation of concepts?

What do you mean by "label the parts of source code"?

Similar to TopsTypeDeterminatorChallenge.

I'm afraid stating "similar to" doesn't reveal what parts you think are alike, or to what degree, and what parts you think are not.

The point is to be able to "apply" allegedly formal definitions such that given sufficient info about the language grammar and behavior, a machine (algorithm) could determine if a definition applies to parts of a given (and potentially new) programming language without human intervention. This would remove the "I just know it when I see it" technique that humans over-rely on to classify things and/or apply definitions. "I know it when I see it" is too close to ThenaMiracleOccurs.

Why do you think that's possible? There's a vast gulf between making a definition unambiguous and making artificial intelligence. Perhaps you're confusing formal definitions with automated theorem proving, and then confusing that with high-order pattern recognition and identification &/or formation of concepts?

{It's not going to be possible, any language with types and an eval equivalent will run into the HaltingProblem.}

I'm not talking about artificial intelligence. I'm talking about describing your mental steps clear enough to "automate" them. That's what us coders do: take "notions" and iteratively or fractally refine them into concrete steps. If you can't do that with something you are personally sure is "clear cut", then something is wrong. Sometimes things people think are clear cut are not really clear cut. And what's something that a computer would have a HaltingProblem with but not a human? For example, a human can read source and make an educated guess if a program will complete or not, but it's only a guess. A program could be made to guess also. - t

First you wrote, "if [formal definitions] were truly formal and thorough, then a machine could be constructed that would be able to take a programming language grammar and compiler/interpreter as input, and unambiguously label the parts of source code."

We pointed out that such a thing is, in general, not computable.

Now you write, "I'm talking about describing your mental steps clear enough to 'automate' them."

You're essentially asking for the same thing: an algorithm for mathematical intuition and understanding. That algorithm is unnecessary (assuming it's computable, which it isn't), because the problem here is one of communication, not automation. We can do mathematics, and we can achieve understanding of mathematics, and (most importantly) we can agree on our understanding of mathematics, without automating it. Furthermore, focusing on automation pointlessly deviates from a reasonable request on our part: that you provide a formal definition of "type tag". I doubt a formal definition of "type tag" would lead to automatic identification of "type tags", but it would certainly result in a more informed and fruitful discussion because we should finally get a clear idea of what you have in mind.

In general, a formal definition is a tool to facilitate human-to-human communication, not enable machine automation.

If it can do both, then it may be even better than one or the other. If it's "runnable" we can subject it to experiments and science.

That may be feasible, given current technology, for a limited set of definitions in pure mathematics. It's intractable for everything else.


To the individual who attempted a refactoring and summarisation of the debate: I appreciate your efforts, but please, please, please strive for accuracy. Creating a third thread that differs from the first two will only further fragment what should be a trivially-resolvable debate. I have, however, removed all invective from the original threads.

{I accept that you want to preserve the ThreadMode. Possibly it is too early for DocumentMode. I tried to resolve this. -- .gz}

I think it is far too early for DocumentMode, and I'm still not convinced your summaries would do anything at this stage but fragment the debate. Sorry, but I've removed them. When this page has been quiet for a while, then it will be time for DocumentMode.

I'd suggest creating a parallel document of the improved version rather than rework the original until it's finished or settles down with regard to new material.


Top -- since you are the one who invented the term type tag, do you have any examples of languages that use the "type tag model" or don't use the "type tag model"? Do you have any example programs that illustrate the model, i.e. a program that would behave differently if the language did not use the "type tag model"?

A similar term has been used in the compiler building world, thus the source of the term may be more complex.

ColdFusionLanguageTypeSystem and TypelessVsDynamic discuss the issue. The tricky part is defining and/or characterizing it in terms of externally-observable behavior versus as an implementation versus a model. The boundary between all of these can be blurry because a sufficiently-explicit model usually *is* an implementation. The models of the type tag I prefer resemble this:

Diagram var_01:

   Variable: [ [name] [tag] [value] ]  // tag-based model
   Variable: [ [name] [value] ]        // tag-free model
But turning it into explicit language/rules/definitions that people agree on, both practitioners and the academic type, can get sticky. The above resembles an implementation model, but we can't assume that one can open up the hood and see the implementation because the same language can be implemented many different ways. A language is not defined by the compiler/interpreter implementation techniques it uses. But we can present a model that resembles an implementation that predicts behavior (input-to-output), and this can be a "tag model".
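As a sketch only, the two diagrams can be turned into a small runnable model in JavaScript. Everything here (`makeTagged`, `makeUntagged`, `plus`, and the tag names) is hypothetical and meant to predict input-to-output behaviour, not to describe how any real interpreter is built.

```javascript
// Tag-based model: a variable is [name, tag, value]; the value is kept as text.
function makeTagged(name, tag, value) {
  return { name, tag, value: String(value) };
}

// Tag-free model: a variable is [name, value]; nothing else is recorded.
function makeUntagged(name, value) {
  return { name, value: String(value) };
}

// Hypothetical "+" operator for the model.
// Tag-based: consults the tag to choose between appending and adding.
// Tag-free: has only the value, so it parses it and adds when both parse.
function plus(x, y) {
  if ("tag" in x && "tag" in y) {
    return x.tag === "number" && y.tag === "number"
      ? String(Number(x.value) + Number(y.value))
      : x.value + y.value;
  }
  const bothNumeric = x.value.trim() !== "" && y.value.trim() !== ""
    && isFinite(x.value) && isFinite(y.value);
  return bothNumeric
    ? String(Number(x.value) + Number(y.value))
    : x.value + y.value;
}

const a = makeTagged("a", "string", "123");
const b = makeTagged("b", "number", 123);
const c = makeUntagged("c", "123");
const d = makeUntagged("d", 123);

console.log(plus(a, a), plus(b, b)); // 123123 246  (same printed value, different result)
console.log(plus(c, c), plus(d, d)); // 246 246     (no way to tell c and d apart)
```

In this toy model, the quotes at initialization only matter when there is a tag slot to record them in, which is the behavioural difference the diagrams are meant to predict.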

I want to move past "I know X when I see X" with regard to both types and type-tags, but it's an elusive goal for everybody. We can make models that mirror our personal notions (or Cardelli's), but saying it's the "right" model is another level. --top

If the goal is a model rather than a description of implementation, wouldn't the following be sufficient?

   Variable: [ [name] [type] [value] ]  // variables are typed
   Variable: [ [name] [value] ]         // variables are not typed
This permits exactly the same explanations of behaviour as "tag-based model" vs "tag-free model", but without the added complication of having to define "tag".

From a model point of view (as opposed to implementation), it appears that "tag" is an unnecessary synonym for "type".

I can only answer that if given a reasonably-clear definition of "type". Sure, "tag" may not be well-defined either, but it doesn't have the baggage of an overloaded history. - t

Perhaps a "tag" can be seen as merely a second value. The built-in operations of the language tend to favor the transparency of one over the other, but that is a "soft" rule. I've shown in TopsTypeDeterminatorChallenge that they are potentially interchangeable: we can use the "type tag" to "hold" information we normally associate with a "value", for example. We could call it the "dual value model" and it would have pretty much the same properties as the other candidates. - t

If "tag" doesn't mean "type", then I assume a variable can have a tag of, say, "3" or "Dave"? Obviously, that is ludicrous. "Tag" only makes sense as a synonym for "type", thus it is redundant.

What is "done with" the tag is up to the language. I observe patterns in their typical actual usage, but that's not rigorous. One can use a wrench to drive in nails, and it works. It's just not as "effective".

Sorry, not following you. Do you know of any programming language where a "type tag" is anything but a type?

How does one know that? How is "is a type" being measured?

How is "is a tag" being measured? "Type" is a well-understood concept in ComputerScience and SoftwareEngineering. "Type tag" (outside of some references to compiler/interpreter implementation) is not.

If it was "well understood", then it wouldn't fail TopsTypeDeterminatorChallenge. You can't automate it because you mistake your feelings for external universal truth.

Once again, you are conflating "understanding" with "computability" (they are unrelated!), and mistakenly believing your personal lack of understanding about types represents some general (and non-existent!) lack of understanding about types among the ComputerScience and SoftwareEngineering communities.

I'm not convinced there's a hard distinction. See ChineseRoomArgument. Further, "understanding" may be inadvertently defined in your head as "uses my favorite model also". Further, TopsTypeDeterminatorChallenge doesn't ask the machine to "understand" something, only label "types" properly and expose the rules/algorithm for all to see. "Understanding" is generally the ability to apply a model to some external activity or object. But that doesn't necessarily mean the model is the "proper" model, unless prediction against some external event is taking place, but you haven't defined what that external thing is. Testing that my head model matches your head model is of little practical use unless it can be codified and externally examined. Arguing about what can't be measured is often futile.

When you said that the tag "resembles a type", your head did some kind of computation to come to that conclusion. Why can't you codify the rules/formulas/algorithms your head used to make that determination when you made that statement? "It feels like a type" is not very useful here. If you don't know why your head did what it did, then we are going to have a very difficult time communicating. - t

I didn't write 'tag "resembles a type"'. Please read what I wrote. I wrote that '"tag" only makes sense as a synonym for "type", thus it is redundant.' In other words, your model makes "tag" into an alternative term for "type", but "type" is well-known whereas your use of "tag" is not. Therefore, your use of the term "tag" in place of "type" is unnecessary. If you can rigorously show that "tag" is different from "type" but explains all the same behaviours, then you may be onto something.

Yes, "type" is well-known, but also overloaded. Commonality is hardly the only concern. Again, I don't wish to bring something with historical baggage and other distracting complexities into the model. I can define "tag" as a "secondary value" and it's simple and fits into the model. When you tie a simple definition to a complex definition, it's no longer simple because it inherits the complexity of the added complex part. I don't want to foul up the model with drama-laden parts.

What do you mean by "type is overloaded"? What's a "secondary value"? What do you mean by "drama-laden parts"? What drama?

Unless you intend "tag" to mean something other than "type" -- and I'd like to see an example of a language where that's the case -- surely you'll inevitably be talking about types anyway, because that's what tags refer to, no?

As far as I can tell, you are using the term "tag" in precisely the places where a programmer, software engineer, or computer scientist would normally use the term "type" in a familiar, recognisable, and generally unambiguous way. Even beginners to programming quickly understand types via simple examples, like "integer", "string" and "float". What would these be in your model? Integer tags and string tags? How does that make them easier to understand?

I don't see how using the term "tag" makes understanding types, or their relationship to language behaviour, any clearer. I do see how using the term "tag" makes it less clear: it (at least) complicates a simple notion by adding an unrecognised term and an apparent additional level of indirection in which 'value --> type' becomes 'value --> tag --> type'.

Again, familiarity is not the same as thoroughness. I am not sure what you mean by additional level of indirection. Where are you getting your two example mappings from?

By additional level of indirection, I mean this: You say a value has a tag, but apparently a tag always refers to a type. So: "Value has a tag" can also be written as "value --> tag". Therefore, "value has a tag, and tag always refers to a type" can also be written as "value --> tag --> type". But what does "value --> tag --> type" mean? Apparently, it means "value --> type". Hence, there is an additional level of indirection between "value" and "type" in stating "value --> tag --> type", because "value --> type" means exactly the same thing.

In other words, "type tag" simply seems to be your personal terminology for "type reference".

Without a rigorous definition of "type", one cannot tell. I'll choose simpler atoms over complex and defective ones to build a definition/model around if given a choice. And I never said a value "has a" tag. I consider them distinct in the model. See Diagram var_01. Although, there are different ways to model constants. I choose to model them as anonymous variables for now.

But there seems to be an obvious relationship between "tag" and "type", and it's trivial to identify the type references and type definitions in any programming language, so surely you can define the relationship between "tag" and "type" without having to define "type", no?

If no value "has a" tag, how do you explain why the following fails in C#?

  int x;
  x = "123";


You say ColdFusion discusses the issue. I'm asking whether you can show two variables in ColdFusion that fit one of your above definitions.

ColdFusion behaves as if it has no type-tag (at least for scalars). (An exception may be the new "null" support.) It will never trigger a result in PageAnchor test_53.

Is that written in a specific language or is it pseudocode? Are there any examples of values for a and b that trigger the "type tag detected" message?

If ColdFusion doesn't behave like it has type tags, is there any other language that does? I.e., are there two variables and a series of operations that fit one of the above definitions? I've tried for about 90 minutes to do this with C, assembly, Common Lisp and Python and have been unsuccessful.

I'm pretty sure I've encountered such in JavaScript [see below] and Php, I just don't remember the code itself. There was a case where I had to use Php's triple equal sign (===) because a double equal didn't work. Below is an example from ASP.Net -t


Example "hungry":

    ' ASP.Net
    '--------------------------------
    Sub testType()
        Dim a As String
        Dim b As Integer
        '
        a = "123"
        b = 123
        '
        If a.toString = b.toString And Not myFunction(a) = myFunction(b) Then
            Response.Write("Possible type tag detected")
        Else
            Response.Write("Get me a sandwich, I'm hungry.")
        End If
        '
    End Sub
    '----------------------------------
    Function myFunction(byVal thing As Object) As Object
        Return (thing + thing)
    End Function

The result I get is "Possible type tag detected". For type "String", the plus sign appends, but adds arithmetically if it's type Integer. (I've put comment markers on blank lines because of a spacing bug on this wiki.)

Doing this also gives a result:

        Dim a As Object
        Dim b As Object
Thus, the quotes seem to be used to "set" the internal tag.

There are no overloaded operators in ColdFusion. There is also no equivalent of Php's "getType()" function. I believe one can trigger a result in Php by replacing the equivalent "myFunction" in Php with:

  $a = "123";
  $b = 123;  // no quotes
  ...
  function myFunction($thing) {    // Php
     return(gettype($thing));
  }
ColdFusion has functions such as "isNumeric", but it appears to parse the value, not a type tag. It can't be used to write a "myFunction" that gives a different answer for a and b. Thus, it has no detectable "hidden tag". Placing quotes around the initialization doesn't change any known result.
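A rough JavaScript sketch of this distinction, under the assumption that a tag-free numeric test inspects only the text of the value. `parsesAsNumber` and `tagOf` are hypothetical names; `tagOf` plays the role that `gettype` plays in the PHP snippet.

```javascript
// Parse-based test: looks only at the characters of the value.
function parsesAsNumber(value) {
  const text = String(value).trim();
  return text !== "" && Number.isFinite(Number(text));
}

// Tag-based test: reports what the runtime recorded alongside the value.
function tagOf(value) {
  return typeof value;
}

const a = "123";
const b = 123; // no quotes

console.log(parsesAsNumber(a) === parsesAsNumber(b)); // true: no difference to detect
console.log(tagOf(a) === tagOf(b));                   // false: "string" vs "number"
```

A `myFunction` built only from parse-based tests can never separate a from b, whereas one built on `tagOf` does so immediately.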

You've shown that ColdFusionLanguage internally represents all values as strings. Its numeric operators must, upon invocation, internally coerce string values to numeric values. PHP associates types with values but not with variables. ASP.Net/VB associates types with values and with variables. I don't see anything here -- in terms of a model -- that is explained by "type tags" in general or a "hidden tag" in particular.

What do you mean by "associates types with variables"? My model can explain (mirror) these behaviors. You have not presented a straightforward alternative that can also.

How does your model provide more information than that conveyed by "associates types with variables"? Are you sure you're not making statements based on your understanding of your model, rather than what you've explicitly stated about your model?

The problem with "types" is described further above. If I've left out some detail, please point it out and I will attempt to correct it. - t

You appear to believe there is some general problem using the term "type", that it is controversial or unclear. I submit that the only controversy or lack of clarity over "type" -- at least in the contexts where you wish to use the term "tag" -- is sustained by you, and you alone.

I'm not convinced that one individual's difficulty with the term "type", in certain contexts, is justification to promote "tag" in its place -- especially as it seems to be, upon examination, nothing more than an alternative term for "type reference".

I haven't seen a decent survey from you. Until you address why you fail TopsTypeDeterminatorChallenge (per above), I do not consider "type" a rigorous and useful term.

You've offered nothing to suggest why your personal difficulty with the definition of "type" should have any bearing on the world outside of you.

The concept is nebulous, or at least has fuzzy boundaries. That's not my fault; I'm just the messenger. "It's a set(s)" is far too open-ended to form a clear-cut model around because almost anything can be modeled as sets. I want a "mechanical" visual model that can be drawn and explained in a discrete way on a whiteboard with boxes and arrows and step numbers etc., not hand-wavy stuff about category theory or human "intent". Your approach is too abstract for regular-Joe programmers (even if it was rigorous, which I doubt because you fail TopsTypeDeterminatorChallenge and try to use non-determinable as an excuse even though your brain allegedly does it in non-infinite time, per above.)

(Or a virtual clerk in a room with pieces of paper representing bytes with labelled wooden bins in which the clerk follows explicit step-by-step instructions when "running" samples involving "type"-related stuff. Those 60's classroom films that attempted to explain "how computers work" with cartoons of munchkins moving around boxes inside a CPU sort of had the right idea.)


Example: JS_05 - JavaScript version of The Test

  testType();
  function testType() {
    var a = "123";
    var b = 123;
    if (String(a) == String(b) && !(myFunction(a) == myFunction(b))) {
      document.write("Possible type tag detected.");
    } else {
      document.write("Get me a sandwich, I'm hungry.");
    }
  }
  function myFunction(thing) {
    //alert("test 1:" + typeof(thing));
    return typeof thing;
  }
Result: Possible type tag detected.


Types are defined in a type system. Words are defined in a language. Top's insistence that a definition of type be "universal" to be rigorous is so irrational and unreasonable that it is NotEvenWrong.

This isn't saying anything useable or concrete.

Nothing about Top's theory of type tags is usable or concrete. Noting that types are defined by their type system IS usable, and concrete. It says: "go find yourself a type system before you try to define types, you wanker."

So if I define your digestive system as a "type system", then whatever comes out is "types"?

Sure, if you defined "types" to be "the output of the type system". Definitions are the meaning we give words. That is a simple concept. But you would need to be careful to avoid equivocation in arguments using the words "type system" with another meaning. And your choice to call the digestive system a "type system" would certainly be evidence of sophistry - use of words to confuse and mislead others - if not sheer idiocy.

So a "type system" is whatever the hell a language maker wants it to be?

Yes. But TypeSafety is defined independently of any specific type system. So it's wise to design a "type system" that will be compatible with generic terms like "type safety" (and otherwise support useful comparison of their type system with those of similar languages).

See the "function" analogy above (PageAnchor function_01).

The idea that a definition of types should be "universal" is flawed. TopsTypeDeterminatorChallenge is invalid.

If that's the case, then I have good reason not to directly tie the tag model to it because I want to make it as universal as possible (at least for "common" languages). QED. Thank you for solidifying my argument against tying the model to "type".


It was a mistake for me to focus on "definition". I should have focused on defining a model in order to explain/predict certain behavior associated with "types" in common programming languages. -- top

That sounds reasonable. Life is short -- why waste it worrying about definitions for novices, when there are so many interesting, rewarding, and potentially important discoveries, inventions and innovations to be made in programming, SoftwareEngineering and ComputerScience?

Having and using visual models or visual metaphors has served me relatively well over the years in school and IT. It was always possible to "explain" stacks and B-trees using visual metaphors/cartoons/machines etc., for example. These are useful for communication, "explaining", and predicting the behavior of software. I often form such models in my head and then test them against reality. If they hold up, I keep them; if not, I tune or toss them. They may not even reflect the underlying actual mechanism, but if they predict it well anyhow, it may not matter. I've yet to see something useful for explaining how "types" work in typical programming languages, and the kind of oddities and differences demonstrated on this topic and other related topics, other than the tag/flag model. If you can show something better, be my guest... - t


Let me make an EverythingIsRelative statement here. Maybe there is a "proper master model" (PMM) of types that every developer "should" know and use. However, most developers probably don't use this PMM in practice (since its existence is sketchy and cryptic and/or vague). So what are they using instead? Probably some kind of UsefulLie model of their own making (likely a conglomeration of experience-based pattern recognition and "computed" models).

These individual self-rolled models may not be perfect, and this lack of perfection may cause bugs or confusion at times, but the problems are fixed when encountered and the programmer continues on their merry way. Thus, even if the type-tag model isn't "perfect", it can still be a UsefulLie just as good as or better than most other typical UsefulLie typing models floating around out there in existing programmer heads. If you are the Purity-or-Nothing type, this may bother the hell out of you; but so be it. Accept botherdom like a man. -- top

Other than obfuscating the simplicity and obviousness of the notion by adding a level of indirection -- you add "variable -> tag -> type" to what can simply be "variable -> type" -- your "model" is functionally indistinguishable from "a variable has / doesn't have a type".

No, because "type" can be a "conceptual type" without a computer-kept representation. Example:

  foo = 347;
The programmer (reader) may conceptually model this as "having" type "number". But in a tag-free system, there is NOTHING inside the computer that tracks number-ness. It's purely in the reader's head and has no machine-represented counterpart (other than being parse-able as a number). But in a tag-based system, there is a computer-side representation: the tag. The distinction may become important when foo is used along with other variables or operations such that a conversion, implicit or explicit, may be applied. It affects machine behavior (output).

It is usually a UsefulLie for programmers to "track" intended type-ness in their head when reading and writing programs, but it does not necessarily have to have a computer-model-side counterpart to get results that are reasonably close to what the programmer intended based on "running" an explicit type model in their head.

You appear to be saying, in your example, that "a variable doesn't have a type." The alternative -- which you describe as having a "tag" -- is that "a variable has a type."

In most programmers' minds, "can be usable as type X" is nearly synonymous with "has type X". Thus, one could say that foo "has the properties of a number type" or "satisfies the requirements for being a number". This is largely because the practical, observable behavior of tag and non-tag languages is identical in most cases. The observable difference is usually subtle and the situational "fixes" cover up the differences. Thus, your suggestion would be confusing to a good many programmers.

In other words, "has the properties of a number" and "is a number" is pretty much the same thing in most programmers' heads, and your approach doesn't help clean up the difference between the two. "Has the properties of X" can be seen as or mistaken for the same thing as "is X". In fact, tag-free languages have functions similar to "isNumber()" to test for number-ness. But there is no computer-side representation of "number". (It may pass other isX() tests also at same time, unlike most tag-based systems.)
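A rough sketch of the parenthetical point, again in JavaScript as modelling notation. The two test functions and the spellings they accept are assumptions chosen for illustration, not the rules of any particular language:

```javascript
// Hypothetical parse-based tests; the accepted spellings are assumptions.
function isNumeric(text) {
  return /^[-+]?([0-9]*\.[0-9]+|[0-9]+)$/.test(text);
}

function isBoolean(text) {
  return ["true", "false", "yes", "no", "0", "1"].includes(text.toLowerCase());
}

// With no stored tag, one piece of text can satisfy several tests at once.
const content = "1";
console.log(isNumeric(content)); // true
console.log(isBoolean(content)); // true

// In a tagged model the stored tag names exactly one category.
const tagged = { tag: "number", text: "1" };
console.log(tagged.tag === "number");  // true
console.log(tagged.tag === "boolean"); // false
```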

(Earlier in my career I didn't really care or ponder much about the difference either; I just coded, got the program working, fixed any discovered bugs, and continued on. Over time I grew more curious and did more experiments to isolate the differences between the two. It's kind of like when you get a new car: you are happy just to figure out where the basics are so that you can drive to work on time. Over time you start to explore the radio short-cuts and "funny knobs" some more. I wanted a concrete head model that always predicted the output properly; and got the tag model as the product of this.)

Why do you think my "suggestion would be confusing to a good many programmers", given it's what the majority of programmers learn (very early!) in university?

No they don't. They usually encounter the concept of a type-tag in say a compiler class, but other than possible existence don't give it any further thought. Maybe it's covered in an "interpreter" chapter or class, but most seem to forget about such if it is. A lot of students think, "I want to be an application programmer, not a SystemsSoftware programmer", and so will just get a passing grade in such a class. If you look at the grades, most students get a "C" if the school has balanced grading. My degree did NOT formally cover interpreters, although I didn't choose the SystemsSoftware sub-specialty. They had about 5 sub-specialties IIRC such that only about 20% of CS grads would have covered interpreters thoroughly.

And even if they did, it's still useful to make a distinction between "having a type" and "having a tag" because one may not know if one is talking about their head model or the interpreter's guts. Often when talking to other programmers or working out designs, one tends to think at a higher abstraction level than perhaps what the interpreter can provide such that it's best to be clear.

You appear to be conflating understanding of language behaviour -- which is what the majority of programmers learn (very early!) in university -- with language implementation. When learning about the former, budding developers typically learn that "a variable has a type" or "a variable does not have a type" when they encounter their first dynamically-typed language after having learned a statically-typed language, or vice versa. Language implementation is covered in compiler and interpreter construction courses, where implementation details -- like type tags -- are explored in technical detail without the need for trivial models, and typically when students already have a good grounding in types and type systems in various languages and paradigms.

Re: "You appear to be conflating understanding of language behaviour -- which is what the majority of programmers learn (very early!) in university -- with language implementation."

No, you are, and that's the problem. And sometimes it takes experience before one realizes the impact of such distinctions. Most students have a lot going on in their heads at the same time, trying to absorb so much in such short periods. Subtlety often escapes them.

What makes you think I'm conflating them? And what is the intended audience of your "tag model"?

Those who care about the subtly of dynamic types.

To the extent that there is "subtly [sic -- I assume you mean "subtlety"] of dynamic types", it is entirely (and quite trivially) explained by "a variable has a type" or "a variable does not have a type", along with a few simple examples. There's no need to complicate it by inserting a "tag" between "variable" and "type".

And again, what makes you think that most students had an interpreter course, let alone one that covers AND contrasts tag and non-tag implementations? I have no reason to think my university experience is unique.

Where did I say I thought most students had an interpreter course? I wrote that budding developers typically learn that "a variable has a type" or "a variable does not have a type" when they encounter their first dynamically-typed language after having learned a statically-typed language, or vice versa. They learn about type tags if they do a compiler/interpreter course; such students are well beyond needing simplistic models of, or analogies for, types.

I already explained why "has a type" is ambiguous. And I estimated above that the majority of students don't have enough exposure to interpreter implementation to readily consider the difference in the real world. My explanations look perfectly good to me, I don't know why you are rejecting them. If a sub-assumption is wrong, then please point it out specifically at the very spot/word/phrase of fault itself rather than a summarized rejection. I cannot process an overly-summarized rejection.

In the eleven uses of "ambiguous" on this page, none appear to explain why "has a type" is ambiguous. Students require no exposure to interpreter implementation; they need only see the difference in behaviour between one statically typed language like C# and one dynamically typed language like PHP or Python, and for that a notion of "has a type" in relation to variables is sufficient. Introducing "has a tag" as a substitute for "has a type" adds nothing but complexity and irrelevancy.

Nope, because tag-free dynamic languages like Perl and CF behave different than tag-based dynamic languages like Php and JavaScript. There are different kinds of dynamic typing. Haven't we been over this already? (I'm not sure I'd call C# "static". It's kind of a hybrid because it allows "object" types that can morph into or be treated as the other base types during run-time.)

You have no decent model that can explain the difference.

The different kinds of dynamic typing are trivially explained by values and variables having or not having types. Again, introducing "has a tag" as a substitute for "has a type" adds nothing but complexity and irrelevancy -- at least until the student is prepared to understand compiler/interpreter implementation, at which point analogies are unnecessary because students are ready to appreciate technical reality. (And, yes, C# supports 'object' as an alias for Object, which is the base type for all other types. It's still statically typed, and there is no "morph" (whatever that might be) involved.)

"Having" is confusing, per above. We are going around in circles again. It seems we are using different assumptions about the minds of programmers, and these assumptions are not readily empirically testable here, so we are stuck at an impasse. (I disagree with your characterization of C#, but will save that debate for another day.)

Beware of conflating your personal "having" difficulties with a general confusion. (C# supports class inheritance, and in languages with class inheritance it is typical to be able to assign values of type T' to variables of type T when T' is inherited from T, because T' is-a T. In this case, T is type Object and T' is every other type. The same is true of Java for all but the primitive built-in types. Both are statically-typed languages.)

Perhaps you are doing the opposite conflating. All we both have are anecdotes and counter-anecdotes. You or somebody similar have suggested elsewhere that you are not very interested in the psychology of "ordinary" programmers because it hinders your goal of "learning from the best". Well, I am interested in the psychology of "ordinary" programmers and do pay attention to it. Perhaps my observations have some flaws, but that's probably true of anybody who doesn't have a secret shortcut to observation.

What does "the opposite conflating" mean here?

Note that somebody may be able to go a good long time in CF and Perl without ever realizing they have no detectable type tag (depending on programming habits). (Note that in C# one does not have to use the static types.)

If you can go a long time in CF and Perl without realising they have "no detectable type tag" -- by which I presume you mean they're dynamically typed, variables do not have types and values are always represented as character strings -- then this is a bit of a non-issue, isn't it?

C# is considered statically typed because variable types are known at compile-time. If you define a variable of type Object it is known to be of type Object at compile-time. At run-time, it can only be assigned values of type Object or values inherited from Object -- which are, by definition, all of type Object -- but this is checked at compile-time.

If you only use "Object" types and every variable is the dynamic type then the fact it statically checks that everything is dynamic is mostly a UselessTruth. It can behave as a dynamic language.

A variable of type Object is not a dynamic type. It's a static type Object.

A UselessTruth at best. It's the same fricken behavior as a dynamic language. CF and Perl could be said to have a static type: "flex-string".

No, a variable of type Object -- in a statically typed language -- is always of type Object. It does not change, and is known at compile-time. I don't know about CF, but Perl variables can have one of three possible types -- scalar, array and hash -- and these are only known at run-time and a variable can, as I recall, change type.

Type Object in C# could change/be all those also during run-time.

No, the value the variable contains could be any subclass of Object, but the type of the variable is strictly Object.

Sure, but it's as open-ended as a dynamic language at that point. Outside of word games, it's the same behavior.

That's hardly "open-ended", and it doesn't change the fact that C# is a statically typed language.

No, it's a language that has static types and has dynamic types: I.E. a hybrid.

That's not in accordance with the usual definitions. All subtypes T' of a given type T may be assigned to variables of type T, but the hierarchy and the variables' types are always static in C#. They are not dynamic; they cannot be changed at runtime. In C#, at runtime you cannot redefine a variable V to be of type T now and T' later. The declared type of V is checked at compile time and remains throughout runtime. It is, therefore, considered statically typed.

How does one empirically verify the "type" cannot be changed during run-time?

Write a C# program where you attempt to redefine the type of a variable. E.g.:

 int myvar;
 string myvar;
The compiler won't let you do it, so you obviously can't do it at run-time if it won't even compile.

That's because it's duplicate declarations; it's not a "type" problem.

There is no mechanism to change the type of a variable, so it can't occur at runtime.

 Object foo;  // declare foo as type Object
 foo = 7;
 foo = "blah";
Is this "changing type"?

No. It's assigning various values of type Object to foo. The variable foo is always type Object.

Not.

 Object foo;
 foo = 7;
 System.Console.WriteLine("T1: " + foo.GetType()); //T1: System.Int32
 foo = "blah";
 System.Console.WriteLine("T2: " + foo.GetType()); //T2: System.String
C-sharp has two kinds of types, compile-time types (seen with "typeof") and run-time types. We can model this with two tags: one tag that's locked at run-time and one that isn't.

No, you're simply printing the type of the value stored in foo, not the type of foo. The type of variable foo and the types of values 7 and "blah" are strictly static, known at compile time, and immutable. The only run-time change is the value stored in foo.

You are inventing vocabulary here or using self-rolled head models. When is a type stored "with" a value versus "with" a variable and how does one know the difference?

       System.Console.WriteLine("T1: System.Int32\nT2: System.String");
I'm simply reiterating familiar C# semantics. However, it's trivial to demonstrate:

Note that 'Object foo' declares a variable foo which may be assigned values of type Object. By the C# inheritance model, values of type Object include values of types inherited from Object. In general, a variable of type T may only be assigned values of type T or values of type T', where T' is a subclass of T. E.g.:

 // Base class T
 class T {
   ...
 };

 // T_prime is a subclass of T.
 class T_prime : T {
 };

 // foo is of type 'T'
 T foo;
For the sake of argument, let's assume an assignment to a variable V changes the type of V. E.g.:
 // Hypothetically, foo should be of type T_prime
 foo = new T_prime();
It should then still be true -- by virtue of well-understood C# semantics -- that a V of type T may only be assigned values of type T or values of type T', where T' is a subclass of T.

So, if we assigned V a value of type T' and it changed the type of V, we should not subsequently be able to re-assign V a value of type T, because we can only assign values to a variable that are of its class or its subclasses, and T is not a subclass of T'. In other words, the following should fail because T is not T_prime and T is not a subtype of T_prime:

 foo = new T();
Of course, we are not so restricted; the above is permitted. We can assign V any value of class T or any subclass of T, regardless of prior assignments. Therefore, the type of V must consistently be T.

Couldn't one declare that all variables in a dynamically-typed language are subclasses of some flexible "god type", and therefore statically typed? That's essentially what C# can be, except that there are other optional types besides the God Type (Object) to choose from.

In a typical statically-typed language, the declared type of a variable remains unchanged -- and cannot be changed -- for the lifetime of the program. In a typical dynamically-typed language, the declared type of a variable may be changed at runtime.

No, the God Type in the dynamic language is the one and only type from birth to death. Anything else that resembles a type is merely something else or the type of the value, not the variable just like your rejection of the "getType" method above.

Sorry, you've lost me here. "God Type"??? "Anything else that resembles a type is merely something else ..."??? Wat?

Type Object in C# is essentially a God Type: it can be integer, string, Dictionary, etc.

You mean Object is a base class? I've never heard that described as a "God Type" before. In C#, integer, string, Dictionary, etc., are all subclasses of Object and can therefore be assigned to variables of type Object. An integer type is an Object type, but an Object type is not an integer type. You can assign an integer value to a variable of type Object, but you can't assign an Object value to a variable of type integer.

That's pretty much how tag-based dynamic languages work, regardless of what label you glue on it. You can call it subclassing or plagnorffing, but it's pretty much the same result.

C# is a statically-typed language, regardless.

If it can be made to resemble a dynamic language almost perfectly, then perhaps the existing classification needs a fix.

No. Given a variable V, in a dynamically-typed language, we can unconditionally assign values of any type at any time. In a statically-typed language like C#, we can only assign values to V that are of its declared type and its subtypes. For example, a typical dynamically-typed language allows:

 var v = "blah";
 v = 123;
C# does not allow the following, failing to compile on the second line:
 var v = "blah";
 v = 123;
This is because we can only assign values to v that are of its declared type and its subtypes. var v = "blah" implicitly declares that v is of type string. 123, an integer, is not a string or a subtype of string.

A more comparable example would be:

 Object v = "blah";
 v = 123;

Which is allowed.


Candidate #3

   A type tag is an observable trait of a variable or constant that cannot be 
   determined by examining the canonical string representation of the content
   of said variable or constant (usually via a Print- or Write-like statement).
Discussion

To use our specimen languages, there are no tags in Perl and CF (at least when used as scalars) because there are no (known) observable traits that cannot be determined by examining the output (string representation). However, in Php and JavaScript, there are such traits (as illustrated by prior examples). - t
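For the JavaScript half of that claim, the test_53 probe can be run more or less as written. In this sketch the + operator stands in for myPureFunction; that substitution is an assumption made for illustration, and any operation that treats numbers and strings differently would do:

```javascript
// Two variables whose canonical string representations are identical.
const a = 123;
const b = "123";

const sameString = String(a) === String(b); // true
const sameResult = (a + a) === (b + b);     // false: 246 versus "123123"

if (sameString && !sameResult) {
  console.log("Possible type tag detected.");
}

// typeof exposes the trait directly, without going through the printed content.
console.log(typeof a, typeof b); // number string
```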

That appears to be a definition by incomplete exclusion, which is a bit awkward -- like saying "a frog is a creature that can be identified by its legs but not by counting them." You've said a "type tag is an observable trait ... that cannot be determined by examining the canonical string representation". How, then, do you observe that trait? Are you saying that a "type tag" exists where two values have the same canonical string representation but behave differently? Isn't that more trivially stated that the variables (do you mean values? variables don't have string representations) or constants have different types but the same string representation? It appears, from the above, that Perl and CF represent scalar values as a string type whereas PHP and Javascript represent scalar values using various types.

I already gave coded examples of how such traits can be isolated. I have since added "content" to clarify the "variable" issue. I'm not going to use the word "type" because it's either ambiguous or not determinable by practical means (as our long debates over it have shown). Your use of "represent" is also questionable. Representation or its impact would have to be observable in some way to be objective or stable enough for a definition. I agree the exclusion approach is awkward, but tags can potentially store or do a lot of stuff.

But "type" is familiar, even to beginning programmers. "Tag" is not. How would you explain the relationship between "tag" and "type"?

But it's too ambiguous for reasons already described above. I'm frustrated that I have to keep pointing this out.

You keep saying it, but you don't provide an argument to defend it or evidence to support it, and it appears to be untrue because "type" is familiar even to beginning programmers and apparently well understood, given the number of programs successfully written that use it. Once again, how would you explain the relationship between "tag" and "type"?

I explained all that already. Jeeeez. I fucking give the hell up! If somebody else wants to try to explain in clear terms the position of either side, I welcome such attempts.

{I can give it a try. His position is "Your definition isn't clear. In particular, what do you mean by 'observable traits of a variable' and how does this relate to the (commonly used and well-understood) term 'type'." Your position appears to be, "I understand what I said." Hope that helps.}

I can't speak for Top, but that nicely summarises my side.

Test_53 is an example of "observing traits of variables". Printing "typeOf()" is another. We probe and test it like a scientist does to a new species of animal. How does Species A act in box 7 compared to how Species B acts? And I've already explained difficulties with existing general notions of "types". I wish to avoid using "type" for now because language is poisoning this whole debate and I just wish to only describe observable traits from now on rather than "probe head notions" further.

{What's needed is a definition, not more examples.}

You cannot produce a decent definition of "types" (usable by regular devs), so why would you expect it's reasonable to produce a decent definition of "tags"? A model may be a more obtainable goal.

{The definition of "type", as used by regular developers, has been given to you repeatedly. So why do you think we can't produce it? (And I thought you didn't want to talk about "types" right now.)}

I poked holes in it, but you pretend like the holes don't exist.

{In order to poke holes in it, you have to do something other than say it's vague. To date, that's all you've done.}

One cannot poke holes in a cloud.

{So you agree, you haven't poked holes in it. I'm glad that's cleared up.}


The Fligmook Experiment

I believe language is screwing things up here. How about this experiment: explain the differences in the two "kinds" of dynamic languages withOUT using existing language terms such as type, value, variable, etc. Call them Flig, Mook, Zog, etc. or whatever. Just make sure that whatever rules and terms your model uses are clear and self-standing. None of this, "programmers already know what X's are" stuff. After you successfully demonstrate your model can explain the differences on its own, THEN you can go back and assign Flig, Mook, etc. to common IT terms. Are you up to it? - t

In some dynamically-typed languages, every flook is of splork fleem. In other dynamically-typed languages, a flook can be of any splork. In the former, certain fizzles turn fleems into forgles or noofs as appropriate. In the latter, the appropriate fizzle is chosen based on the splork. Does that help?

How do we objectively measure/observe "is of"? (Don't forget the problem of "can be transformed to and/or viewed as" versus IS-A hierarchies. The first does not require any hierarchy.)

It's essentially synonymous with "associated with" or "has a property of".

"associated with" or "has a property of" is still vague. It only means that there is SOME relationship, but doesn't say if that relationship matters to anything we care about here (output). And how is "can be" verified? Under what conditions is be-ness true or false? "As appropriate" and "based on" needs metrics also.

"Associated with" or "has a property of" is as precise as the "has a" in "has a tag", and is a sufficient basis for describing program behaviour in terms of a variable or value having or being associated with a type.

No. For one, it fails the "parse issue", below.


  <!--- ColdFusion Example CF002 --->
  <cfset a=123>
  <cfset foo(a)>
  <cfset b="123">
  <cfset foo(b)>  
  ...
  <cffunction name="foo">
     <cfargument name="p" type="numeric">
 ...
  </cffunction>

Equivalent C-ish style would be:

  a=123;
  foo(a);
  b="123";
  foo(b);
  ...
  function foo(number p) { ... }

Both function calls run successfully, meaning they both "pass" the type="numeric" parameter test ("abc" would produce an error). Remember, our working assumption is that CF uses parse-based verification at call time under the hood. You don't need to know CF to agree with that working assumption. If it's wrong in reality, it does not matter for the scope of this example. I'm dictating "how it works" for this example.
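Under that working assumption, the call-time check can be sketched as follows. This is JavaScript used as modelling notation; store, foo and the numeric pattern are hypothetical stand-ins, not ColdFusion's actual implementation:

```javascript
// Hypothetical model of parse-based argument checking; not CF's real internals.
// One possible pattern for "parses as a number" (an assumption).
const NUMERIC = /^[-+]?([0-9]*\.[0-9]+|[0-9]+)$/;

// In this model every scalar is kept as text, so 123 and "123" are stored alike.
function store(literal) {
  return String(literal);
}

// Stands in for the type="numeric" argument test: examine the text, then pass or stop.
function foo(p) {
  if (!NUMERIC.test(p)) {
    throw new Error("Not a number: " + p);
  }
  return "ok";
}

const a = store(123);
const b = store("123");
console.log(foo(a)); // ok
console.log(foo(b)); // ok

try {
  foo(store("abc"));
} catch (e) {
  console.log(e.message); // Not a number: abc
}
```

Nothing is attached to the value on the way through; the check either passes or halts.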

What is this intended to illustrate?

Let's try again:

"Tag-free" appears to be your label for some languages that have variables without type references and values that are always of type "string". Languages with "parse-based type checking" associate more specific (at least, more specific than "string") type references with values as needed. Where is the contradiction?

All your 4 steps appear to illustrate is that if "tag-free" is intended to mean "no type references anywhere", then perhaps "tag-free" isn't the most accurate name for languages with "parse-based type checking".

But you just said "as needed"? The contradiction is back. Your "as needed" contradicts "no type references ANYWHERE". Nowhere-ness and as-needed-ness cannot exist at the same time in a language, at least not without adding caveat rules to the model, complicating it.

By "as needed", I was referring to examples like <cfargument name="p" type="numeric"> from above, where "parse-based type checking" is -- according to you -- "needed". Hence, "as needed".

Please elaborate on this "associate...with". That's too nebulous. And no association is necessary for explaining/modeling program behavior, so why introduce such fuzzy ghost terms? Specifically the cfArgument process examines the parameter passing through and it either passes the examination or fails and the program stops. No creation of a reference is necessary; we don't have to glue anything extra on to the package; and it doesn't explain anything taking place. I will agree we could perhaps rework the model to include such "association", but it complicates the model. Lack of a tag is conceptually simpler.

"Associate with" is the intuitive and expected meaning. For example, in a language that specifies a declaration like "int x", we can say that "x is associated with 'int'" and vice versa. "Lack of a tag" is precisely the same as "not associated with a tag".

Note that tagged dynamic languages may also do parse-checking in some cases, but it's used less often because the tag can be examined first, often eliminating the need for parsing.
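A sketch of that "tag first, parse only if needed" ordering, in JavaScript as modelling notation. The { tag, text } shape and the stats counter are hypothetical and exist only to make the two paths visible:

```javascript
// Hypothetical model: a tagged value is { tag, text }.
function isNumeric(value, stats) {
  if (value.tag === "number") {
    stats.tagHits += 1; // the tag settles it; no parsing needed
    return true;
  }
  stats.parses += 1;    // otherwise fall back to examining the text
  return /^[-+]?([0-9]*\.[0-9]+|[0-9]+)$/.test(value.text);
}

const stats = { tagHits: 0, parses: 0 };
console.log(isNumeric({ tag: "number", text: "123" }, stats)); // true, via the tag
console.log(isNumeric({ tag: "string", text: "123" }, stats)); // true, via parsing
console.log(isNumeric({ tag: "string", text: "abc" }, stats)); // false, via parsing
console.log(stats.tagHits, stats.parses);                      // 1 2
```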


Maybe we should try models that are semi-machine language. Such can risk adding low-level detail that we may not need, but machine-language is something that most graduates should have experienced and can be defined in concrete ways where all the bits are X-ray-able and observable with clear ordering and rules. No more "notion processing". But hopefully there is a way to abstract away or postpone the portions that are not in dispute. -t

Undergrads are often taught the inner-workings of the machine via machine-language, typically in 1st or 2nd year. However, this illustrates actual mechanics -- I thought your goal was an abstraction or a model?

Yes, that was the goal, but it's not working for either side. Something more explicit may be needed. Abstraction is in the head, and everyone's head is different. We may have to go to a lower level of abstraction to have a prediction model that is clear to both sides. Keep in mind that a virtual machine is not necessarily an (intended) implementation, but rather a model with prediction capabilities.

That's not unreasonable. Some educational institutions favour giving students a heavy dose of computer architecture and assembly language, often in 1st year, for this reason.

Like I said somewhere, it depends. My university had about 5 "minors" (sub-specialties) of CS, and only one covered such heavily.

But even if this was the case, the actual tag existence or non-existence would probably be reflected. Dynamic langs like Perl and CF probably only represent scalars as strings without any extra tags for number versus string versus date, etc.; and likewise Php and JS probably have an explicit tag separate from the value. Thus, if type=tag as you assert, then parse-based testing/comparing is not "types". QED. (It would also make CF's use of it in cfArgument a misnomer. But I get away from that problem by using "tag" instead of "type". I've thus solved 2 problems. Now give me my fucking Nobel so that my ego gets even more obese, like the Gods of Logic intended.)


Note that the example above perhaps may run something like this under the hood:

  ...
  <!--- Example CF003 --->
  <cfArgument name="p" regexVerif="[-+]?([0-9]*\.[0-9]+|[0-9]+)" failMsg="Not a number">
Yet have the same result.

Yup, and it means exactly the same thing -- it's performing TypeChecking.

That's an awful wide definition of "type checking". Swiss Army Types? What regex's are "type checking" and which are not?

If you're performing validation to determine whether or not a value belongs to an identifiable set of values, then you're doing TypeChecking, and the identifiable set of values is a type. TypefulProgramming recognises that types are pervasive, and endeavours to make types explicit wherever possible.

What? So if we have range checking, then it is the "types" of all things in that range? That's ludicrous! Waaay too open-ended. Every WHERE clause is a type picker?

Recall that a popular definition of "type" is that a type defines a set of values and a set of operations on those values. Does a WHERE clause define a set of values and a set of operations on those values?

Such string parsing is blind to operations down the road. It doesn't think/plan about the future and the "meaning" of numbers; it just follows orders. The programmer or language designer may have had intentions or future usage patterns in mind, but that's exploring human heads, not program results. I want to model program output, not YOUR output.

Even so, which set of all possible regex's qualify it as "type checking" and which don't? Do we have to know programmer intent to answer that? If so, are we back to the old WhatIsIntent fights?

As I pointed out above, experienced programmers can recognise when (say) validation of some string is actually a form of ad-hoc TypeChecking, but that's about a given program rather than the programming language. However, a language feature like <cfargument ...> with a keyword "type" can and probably should be included in your model, but it probably doesn't hurt (much) to treat it as spurious and exclude it, either.

If one is modeling behavior, then modeling a difference based around the existence of the word "type" is against that goal. Behavior-wise, it's no different than any other regex expression or filter rules such that bifurcating a model to have split paths based on the existence of the word "type" is a violation of OnceAndOnlyOnce, since Path A and Path B could be explained the same but are not to cater to a word-centric or historical-habit-centric model.

Let's just agree that both approaches can "work" and having both model choices can help one view the thing from different angles, one behavior-centric, the other vocabulary-centric or history-centric. Let's agree to let both live.

I agree, as long as you're ok with the fact that I will oppose the "tag model" every time I see mention of it. I think it's confusing, redundant, and possibly misleading.

The feeling's mutual.

So, what does that mean, exactly? I prefer to emphasise understanding language behaviour in terms of language semantics and syntax and well-known elements like types, values and variables, so if the "feeling is mutual" does that mean you're going to oppose actual language semantics and syntax whenever you see them? Are you going to oppose explanations of language behaviour in terms of syntax and semantics? Or do you mean you think existing language semantics and syntax are confusing, redundant, and possibly misleading?

But your model is either more complicated because of your language-centric tilt, or if simplified outright doesn't work to explain differences in languages. One of the main problems is that "types" has been overloaded with explicit typing and parse-based typing (examining the value only). The tag model cleans that up and models both in a clearly different way, not your hazy "associated with" vagueness. If one bifurcates these two concepts into distinct modelling actions/features, then it reduces confusion and highlights why the two language families act differently. --top


Surely the "$" in the Perl variable is a type tag, while @ and % are other examples in the same language. The var named $fred may contain several types of value but they must all be suitable for use in a $-variable. This is enforced by the language and is not just a convention.

Whether this is what Top meant or not I don't know but several popular languages have had these sorts of type tags, including the classic implementations of BASIC that many of us started out on (a$, b$, i%, j% etc.).

It's not what Top meant, though it might be the inspiration for his "tag model". Single character prefix or suffix "tags" are syntax for specifying a variable's type. BASIC's 'a$' is semantically equivalent to C#'s 'string a', but obviously syntactically different.


Top, would you consider C to be a language that uses tags?

During compilation, yes; during run-time, no.

I'm surprised. From the attempted definitions and examples you've given, I see nothing that would change between compile-time and run-time. I also would have expected the answer to be yes during run-time since the following program outputs "Same string", "Different foo".

 #include <stdio.h>
 #include <string.h>

 static double foo(double x) { return x; }

 int main() {
   double x = 1.0 / 3.0;
   double y = x + 0.0000001;

   char xstr[100];
   sprintf(xstr, "%f", x);

   char ystr[100];
   sprintf(ystr, "%f", y);

   printf("%s\n", !strcmp(xstr, ystr) ? "Same string" : "Different string");
   printf("%s\n", (foo(x) == foo(y) ? "Same foo" : "Different foo"));

   return 0;
 }

Curious. If I'm following this correctly, then "%f" is rounding or truncating. In this case the "canonical string representation" (CSR) is not showing us the "full" value. I'd perhaps argue that's a flaw with the language's CSR. The CSR shouldn't chop off precision: default to sufficient precision to show all influence of the 8 bytes, but have formatting operations to round if desired. Maybe "%f" is not the CSR anyhow because we can use different formats in printf for any variable regardless of declared type. C may not have a CSR. (My C is rusty.) I wonder if any dynamic languages do this?

C doesn't really have a CSR. Only the arithmetic types, pointers, and arrays of characters have anything that's even a reasonable candidate. ("%f" or "%g" are the best candidates for both float and double. Both suffer from rounding.) But the question is now, how can we determine if a "Same string, different foo" is because of a "flaw in the CSR" or a "tag"?

Is there any direct way to extract the full value faithfully? (Indirectly we may be able to subtract the value from slightly rounded versions of itself to slice the parts.)

You can explicitly specify the precision. Regardless of which precision you choose, C allows for an implementation that exceeds it. C does give you the ability to look at any object as a sequence of bytes. I don't see how either helps in distinguishing between "flaws in the CSR" and "tags".

If C has no CSR, then we cannot apply candidate definition #3. But the existence of tags during compiling but none at run-time can help explain/model why we can apply different formats to different (compile) types willy nilly. Also note that I mostly limit using the tag model for dynamic languages.

My question still stands. How can we differentiate between "flaws in the CSR" and "tags"?

How does one find flaws in something that doesn't exist? I will agree that hypothetically, if a CSR doesn't show us the full "value", then def #3 would be picking up the hidden effects of this under the name "tag", which is probably not what we want in the def. There will probably always be odd caveats that break any def or "rule".

Forget C for the moment. Let's say we have a language that defines a toString method that works for every value, but that method truncates floating point values. That toString method would be the obvious choice for the CSR. How do we tell if that toString method is flawed as a CSR or if the language has tags? (BTW, there are plenty of rules and definitions that don't have any exceptions. As examples, the standard definition of integer, Von Neumann ordinals, modus ponens, etc.)

Because math doesn't have to worry about practical issues, unlike programming.

We could add 0.0000001 to one group of numbers, say monetary amounts, and not add to quantities to create a kind of "tag" that differentiates between both kinds of numbers in an app or shop, and build that into a library (including removal of the 0.0000001 end-marker for processing). Thus, it can act as a typical "type tag". Whether it's convenient or not is another matter. The difference comes down to intent and usage, and we've already done the "intent dance". What I really want is a model that explains behavior of the language, not a model of human heads.

Yes, we can take our own ad-hoc type system and stick it on top of the language's type system. I'm not interested in that at the moment. I want to know how to differentiate between CSR flaws and tags.


Ummm, it's called typedef. --MarkJanssen

A typedef is a C-specific alias for a type name or a means to associate a name with a type definition. How does it relate to the above?

Try actually making a compiler to transform language text into machine code, instead of debating abstractions, and then you'll know. Find more at ComputerScienceVersionTwo.

I have done, and I still do. I wrote my first compiler in 1982, and I've been writing them ever since. I've written a number of compilers, and I am the principal author of the RelProject, which incorporates a compiler for a VM. What "more" do you expect me to find at ComputerScienceVersionTwo?

Again, how do C's typedefs relate to the above?


Rather than endlessly (and circularly -- it's getting lengthy and repetitive with no resolution in sight) debate the merits of the "tag model" vs understanding language behaviour in terms of language semantics and syntax and well-known elements like types, values and variables, I have created TypeSystemCategoriesInImperativeLanguages. Top, I encourage you to create a similar page for the "tag model", and then LetTheReaderDecide.


Stumbled across this interesting ComputerScience about tags and typing:

Quote: "a tag section that describes the type of the data: how it is to be interpreted, and, if it is a reference, the type of the object that it points to."


See also TypeRelationComparison, TagFreeTypingRoadMap, ExampleTwoRooms


CategoryTypingDebate


JuneTwelve AugustThirteen


Last edited December 25, 2013