Types And Side Flags Discussion

Continuation from TypesAreSideFlags (to be renamed TypesAndSideFlags eventually.)

Issues raised include:

(Please put comments about the list below in the discussion. It's meant to serve as a table of contents, not a discussion in itself.)


Originally extracted from SantaEverywhereFallacy (how'd it get that far off topic?):

Any definition of "types" is useless to most practitioners, including the "TypesAreSideFlags" definition you propose. The only practitioners that even matter are those to whom a definition of types would matter - a population that, I suspect, is closely correlated with those who develop languages and wish to discuss or research types and type systems in order to make intelligent decisions. So, no, I don't care about the 'problem', and I don't even see how it is a 'problem' or why you "have to" care about it.

This is incorrect. Practitioners care when it affects how the program works. Flag-based dynamic languages behave differently from flag-free dynamic languages. For example, ColdFusion and PHP behave very differently in this regard. How are you going to explain those differences without reference to the flag model?

[Trivially. Reference is made to specific behaviour without the need for, or any benefit from, a (questionable) pseudo-model.]

This is a very curious response. Many people work better if they have a mental model for why things do what they do (even if it's a UsefulLie). That is often better factoring than describing each and every case as if they are all independent little cases. It becomes like English spelling instead of a purely phonetic language. It's a less compact form of information: a big bag of arbitrary cases. The total errors caused by forgetting the myriad micro-rules may be larger than the total errors caused by the larger-scale model being imperfect. --top

[I am referring to your ill-defined "TypesAreSideFlags" pseudo-model in this particular case, not models in general.]

Same thing. If the side-flag model enables a developer to understand, or at least predict, language behavior as well as or better than memorizing behaviors as mostly independent special little cases, then why is it a problem? Some people have an almost photographic memory for tons of small minutiae, but some don't and prefer simpler rules if they exist. --top

Seems like a big "if". I imagine you could demonstrate that this side-flag pseudo-model of type-implementations could be applied to a few languages (at which point it makes which predictions, exactly?) - and perhaps those are the languages you care about - but I currently have no reason to believe this model won't fall apart when dealing with MetaObjectProtocols, TypefulProgramming, TypeInference, type-driven coercion and static dispatch, sub-structural typing (uniqueness types, region inference), dependent types, etc.

[As stated elsewhere on this page, the "side-flag model" isn't a model. It's at best a statement that some language implementations associate type tags with run-time values, and at worst merely a terminological equivalence of "type" and "side flag".]

I actually don't know how the mentioned languages implement types under the hood. They merely act like they have type flags or like they lack them.

[ There is no substance, mechanism, algorithm, or other content within the specified "model" that allows us to make predictions about behaviour.]

There are examples under TypelessVsDynamic, particularly near the bottom under heading title "Testing For Flags".

[To be (closer to) complete, it needs to provide (at least) a series of steps (or some equivalent structure) that allows us to consistently and unambiguously determine an output (some specification of behaviour) for a given set of inputs (a snippet of code, presumably). In the discussion above, it's apparent that the steps exist in your head -- as you've clearly, for example, used them to distinguish ColdFusion behaviour from PHP behaviour -- but they haven't appeared on this page because I can't use "type = side flag" (which is essentially all we've got from you) to distinguish ColdFusion behaviour from PHP behaviour without assuming (i.e., guessing) what you mean by "type = side flag".]

Again, see TypelessVsDynamic for examples.

Try applying your tests to Haskell and ML.

Someday maybe. Note that I never claimed the model is universal. If it doesn't apply to some languages or parts of languages, that does not by itself make it useless or worse than the alternative. It's a matter of optimizing MentalIndexability of how we remember language behavior. In practice people tend to use a combination of big-picture models in conjunction with exceptions to the big-picture models (DeltaIsolation). If the big-picture model grows too many exceptions, then perhaps it's time to toss it and just learn the small-scale rules. It's a weighting decision.

I should also note that the flag model seems to work best for interpreted dynamic languages, not compiled ones. The flag model is a tool, and like any tool has places where it works well and doesn't work well. PickTheRightToolForTheJob.

--top

As of yet, your 'flag model' doesn't seem to make any useful predictions, and seems too difficult even for you to apply to a language that you don't already know intimately. It's somewhat pointless to have a model of language behavior that can only be applied to languages for which you already know the small-scale rules. I think it reasonable to filter the 'models' we bother learning to those that have proven themselves useful. What makes you believe this 'side-flag model' is the "right tool" for any job?

It does make predictions. The lack of a flag predicts that the value and only the value is what affects how operators react to it, and also predicts what operations the language is likely to have or not have, such as a "typeName()" function. If you can't find a "typeName()"-like function, for example, then it increases the chances that the other flag-free patterns will be there. One can then verify that and have more confidence in understanding how to work with the language. As far as learning new languages, it helps one know what to keep an eye out for. And what is the alternative you propose? (for practitioners). --top

I suspect an understanding of tagged unions (a class of variants, also 'inductive' types) would be sufficient to replace your entire 'side flags' model. And it would be accurate to both TypeTheory terminology, terminology in common use by users of at least ML, Haskell, and C, and often associated with physical implementation details in the obvious manner (tag+union).

[It's not a "model" in the usual sense -- it's merely a partial statement of the fact that some dynamically-typed languages associate type tags with the internal representations of run-time values. Why redundantly use the term "type flag" to represent the mechanism commonly known in computer science as a "type tag"? By the way, your "model" doesn't make predictions. The predictions you're making are based on your own mental model of the type system under examination. There is nothing in the "flag model" that logically describes how the input of a given language statement can be translated into output consisting of specifications of behaviour.]

If it reflects the actual implementation, that's a bonus. As far as the term, I felt "tag" is too overloaded with markup languages. If it really gnaws at you, we can consider a renaming plan.

[Many words are overloaded. E.g., table, relation, set, etc... Using "flag" where "tag" is the familiar term really gnaws at me, in the same way that a database proponent using "relative" instead of "relation" would gnaw at me. That, and the inappropriate use of the word "model"...]

No term makes everybody happy. Bunches of managers here are arguing over PowerPoint titles right now as I write this (along the lines of "Our Great Successes" versus "Our Great Solutions"). And your analogy is weak; "relational" is common both in academia and practice. Only professional language designers would be aware of "tag", not language users, of whom there are far more. You may disagree with my choice, but it is based on a rational weighing.

[I didn't use the term "relational". Read what I wrote, please. My point is that there's no need to invent a new term for an existing mechanism when a perfectly suitable term already exists.]

I disagree with your logic complaint for reasons already given. We can logically compare what the flag model predicts versus what we actually see. If the predictions don't match either value (flag or no flag), then we can abandon or modify the model.

How exactly is "model" "inappropriate"?

[The three bullet points on TypelessVsDynamic merely specify how to (roughly, informally, and ambiguously, rather than rigorously) identify languages that might use type tags vs those that possibly do not. There is no specification of the logical connection between language characteristics and language behaviour. By way of analogy, a "model" that purports to describe how a car engine supplies power to the wheels (call it the "drive-pipe model") but only provides three bullet points that may, or may not, identify cars that use an engine, is not appropriately called a "model".]

Your analogy does not make sense to me. Are you trying to identify poles or engines? Note that the 3 bullet points are not meant to be exhaustive. More may be added later. Also note that it is not necessarily meant to describe how languages actually work under the hood. It is a prediction mechanism, not necessarily a model for "how". It is comparable to Newton's "laws" in that it makes sufficiently accurate predictions for practitioners to use, but is not necessarily the proper model under the hood (relativity is more accurate, but more cumbersome).

[Poles? Huh? I still don't see how your "model" provides predictive capability without total reliance on assumptions about what your "model" means. That does not a model make, because the "model" only exists in your mind, not on paper (or Wiki page).]

In general, I've found that languages that "act" like they have flags and act like they don't have flags are consistent. That is, the model is consistent for the languages I've used. None had a mix of some behaviors that looked tied to flags and others that didn't. They were all or nothing across the board (at least for scalars). True, it's far from an exhaustive list, but a reasonable sample from a practitioner's perspective. If I did find an exception, that would not end the model's usefulness, but merely confirm there are exceptions. --top

[Fine, but your description of the "model" does not address the relationship between language features and language behaviour. That relationship exists in your mind, but appears not to have made it onto a wiki page. Yet. Elucidate this relationship, and you'll be on the way to providing a model.]

I don't know what problem you are imagining in your mind. I need more specifics. I cannot scratch your itch if I don't know where it is.

[A model should be an abstraction of the real objects it represents, in sufficient detail to be used as a substitute for the real objects it represents, in order to be used to answer questions about those real objects in an automated manner. This may be provided by axioms, theorems and proofs. This may be provided by diagrams. This may be provided by algorithms. This may be provided by physical structures (e.g., a model ship, rocket, car, motor or aeroplane) that can be empirically (or mathematically) examined. In an analogical sense, your "model" claims to be a model ship, but it's only a picture of a ship -- there isn't any structure to your model (i.e., physical manifestation, diagram, algorithms, etc.) that can be tested in an automated manner. Note, for example, that the RelationalModel provides structures in terms of set theory, boolean logic, relations, tuples, types, scalar values and an algebra that may be tested and evaluated in an automated, logical manner. Your "flag model" provides no such written equivalent, yet it clearly exists -- you're using the structures that exist in your own mind in order to use your "type = side flag" equivalence to translate language statements into descriptions of language behaviour. However, you can't "show your work" for these translations in terms of your model. With a complete model, you would be able to do so.]

Automated? There's not enough scenarios to bother automating so far. How would automation to measure "relational" look? Again, you are not clear on what you are envisioning. And, the model does not necessarily have to reflect the actual underlying mechanism involved. Regression is an example of a modeling technique that can be quite accurate (if no "hard" edges), yet not reflect the actual mechanism in any way.

Top... perhaps you should read what was written and avoid making such obvious mistakes in reading comprehension before putting the blame on others and calling their words 'unclear'. Anyhow: (1) Models should have sufficient details "to be used to answer questions in an automated manner". The ability "to be used in an automated manner" does not imply "must be automated". It only implies "could be automated". (2) RelationalModel is a model of data, not of "relational". (3) Models aren't properties, so the question on how "to measure 'relational'" is particularly senseless. (4) The explanation of modeling does not imply that models "necessarily have to reflect the actual underlying mechanism involved", so the related objections are moot.

I find it poor writing. I've seen good technical writing and I've seen poor. Maybe you academic professionals are used to poor writing. Anyhow, the best test of "automatable" is showing it automated. Show it doing whatever this thing you imagine doing on "relational". It's your (pl) example, not mine. In other words, let's see an example of something doing the something that you want done to something (such as relational), and I will then show you how to do a similar something with type flags. And I would characterize "relational" as a set of operations, not a "model", although they are perhaps interchangeable. Type flags are a "model" in the sense that one can draw cubby-holes on a chalkboard and show how values and flags get changed as each statement is executed. It's a "mechanical" model, if you will. --top

[By "automatable", though I do mean that a series of steps or transformations can conceivably be automated, there's rarely a reason to do so. What I mainly mean is that the mechanics of the model are sufficiently rigorous to be automated, and do not require assumptions. The usual mechanism for "automating" mathematical models (which your "flag model" is, whether you like it or not) is simply a human brain capable of reading statements of formal logic. I'm not sure what you mean by "relational", however, as it is an ambiguous term. Do you mean the RelationalModel in its entirety? Or do you mean the RelationalAlgebra component of the RelationalModel? I assume you're referring to the latter, as you're describing "relational" as a "set of operations", which appears to refer to the RelationalAlgebra. As for "type flags" being a model, as far as I can tell the whole of your model (thus far) is the following: "For every value V, there exists a related 'type flag' T." Okay, fine -- that's a good start. But what does that tell us? What are the characteristics of T that are of interest? Given some language statement S referring to some set of values, and some language behaviour B, how do S, B, and some set of (V, T) pairs relate to each other? And so on. These are the components of your model that appear to be missing on paper, but clearly exist in your mind.]

Note that the relational example as an analogy to flags was *not* something that originated from me. Thus, I cannot answer questions about what was meant by it.

[Using the relational model, we can determine interesting results. For example, it can be shown that if R is a given restriction (i.e., some WHERE clause) and T represents either a union, intersect, or minus operation, and p and q are relations, then R(T(p, q)) = T(R(p), R(q)). This is a very useful result for implementing query optimisation, because it says that given a SQL-ish query like (SELECT * FROM p UNION SELECT * FROM q) WHERE x = 1, we can internally perform the WHERE on p and q individually before performing the UNION, with a potentially huge performance increase over the original. The proof of this is left as an exercise for the reader, but suffice to say both its determination and application are automatable, given the elements of the RelationalModel: set theory, boolean logic, scalar values, tuples, types, relations, and relational algebra. Can you use your "flag model" to find similar interesting results?]
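
The identity R(T(p, q)) = T(R(p), R(q)) can be spot-checked on small data. The following is a minimal sketch in Python, assuming relations are modelled as sets of tuples and the restriction is "x = 1" on the first attribute; the names restrict, p, q and x_is_1 are illustrative only. It checks one example per operator and is not a proof.

```python
def restrict(relation, predicate):
    """R: keep only the tuples that satisfy the predicate (a WHERE clause)."""
    return frozenset(t for t in relation if predicate(t))

# Two small relations with heading (x, label); purely made-up sample data.
p = frozenset({(1, "a"), (2, "b"), (1, "c")})
q = frozenset({(1, "c"), (3, "d"), (1, "e")})

def x_is_1(t):
    return t[0] == 1

# T ranges over union, intersect and minus.
operators = {
    "union": frozenset.union,
    "intersect": frozenset.intersection,
    "minus": frozenset.difference,
}

results = {}
for name, T in operators.items():
    restrict_after = restrict(T(p, q), x_is_1)
    restrict_before = T(restrict(p, x_is_1), restrict(q, x_is_1))
    results[name] = (restrict_after == restrict_before)
    print(name, results[name])
```

Restricting before the set operation means the operation sees fewer tuples, which is the optimisation opportunity described above.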

You guys are trying to over-complicate this. I am not trying to do "interesting math" with the flag model. It can be used to explain the behavior of common dynamic languages in a mechanical kind of sense.

It is for practitioners, not math gurus. If it can be repackaged for math gurus, I don't know. It's not one of my goals.

[With my practitioner hat on, looking at "values have (or do not have) type flags", I can't see how that explains language behaviour. Obviously, it does in your mind, but without some clear written elements to your "model", I'm afraid it remains locked in your mind and inaccessible to the rest of us.]


PageAnchor: TypesAndSideFlagsDescription

The following is a pedagogical tool which may assist students of certain programming languages in understanding some aspects of language behaviour, and is also an actual run-time or compile-time mechanism used by some language implementations:

Given a value or variable, its type can be indicated by an associated tag (aka "side flag"). In other words, every value or variable is a dual-cell node where one side of the node is a value or value container (aka variable or slot) and the other side is a single type indicator (tag) or a collection of type indicators. The tag(s) identify the value's type, and are used to decide how the value should be used.

Note that: This description does not dictate how the type tags are actually used, only that they exist and make a difference in implementation behaviour. That is, if the values of the type tags are changed, the behaviour can potentially change. For this to be useful in terms of understanding language behaviour, we need to know specifically how the tags are used in a given language implementation.

For example, in a hypothetical language implementation where type tags apply (as described above), a value represented internally as the ASCII string "1542" will be treated as a character string if its tag is STRING, treated as an integer if its tag is INTEGER, and treated as a floating point value if its tag is FLOAT.

Therefore, given the following variables:

 a = 1542 (where a's tag is STRING)   
 b = 1542 (where b's tag is INTEGER)  
 c = 1542 (where c's tag is FLOAT)    
 // comments about the example moved to its own section below.

The following behaviour may be seen:

 WRITELN a + a;
  15421542
 WRITELN b + b;
  3084
 WRITELN c + c;
  3084.0

In this example, the '+' operator uses the type tags associated with its operands to determine its behaviour.
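
The hypothetical implementation above can be sketched in Python. This is only an illustration under stated assumptions: the Tagged class and the add function are invented names, every value is stored as a string next to its tag, and operands with differing tags are deliberately left undefined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    """A dual-cell node: the raw representation plus a type tag."""
    raw: str  # internal representation, e.g. the ASCII string "1542"
    tag: str  # "STRING", "INTEGER" or "FLOAT"

def add(x: Tagged, y: Tagged) -> Tagged:
    """The '+' operator: it consults the operands' tags to pick a behaviour."""
    if x.tag != y.tag:
        # The description says nothing about mixed tags, so neither does this sketch.
        raise NotImplementedError("behaviour for mixed tags is not specified")
    if x.tag == "STRING":
        return Tagged(x.raw + y.raw, "STRING")
    if x.tag == "INTEGER":
        return Tagged(str(int(x.raw) + int(y.raw)), "INTEGER")
    if x.tag == "FLOAT":
        return Tagged(str(float(x.raw) + float(y.raw)), "FLOAT")
    raise ValueError(f"unknown tag: {x.tag}")

a = Tagged("1542", "STRING")
b = Tagged("1542", "INTEGER")
c = Tagged("1542", "FLOAT")

print(add(a, a).raw)  # 15421542
print(add(b, b).raw)  # 3084
print(add(c, c).raw)  # 3084.0
```

Changing only the tag of a value changes what '+' does with it, even though the raw representation is identical in all three cases.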

Note that the above does not and cannot predict the result of, for example, WRITELN a + b.

--AnonymousDonor

That is not necessarily true.

It is true. What is written above (hence the phrase "note that the above...") does not and cannot predict the result of WRITELN a + b. What is written below, however, apparently can. It must be noted, however, that what is written below is not above that which is written above, so the statement above is necessarily true. REFACTOR THE ABOVE IF YOU DON'T LIKE IT, YOU ARGUMENTATIVE BASTARD, AND STOP WASTING OUR TIME WITH CRAP!

Some languages may have rules of type precedence. For example, the "+" operator may have a rule that if at least one operand is numeric (such as int or float), then it attempts to execute math addition, parsing any string operand as numeric. Another language may have "+" assume the type of the first operand. Thus, "print(a + b)" would assume type STRING, because that is a's type tag.
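
The two precedence rules can be compared side by side. This is a minimal sketch in Python, not any real language's rule set: Tagged, as_number, plus_numeric_wins and plus_first_operand_wins are hypothetical names, and the handling of non-integral strings under an INTEGER result (truncation) is an assumption made only to keep the example short.

```python
from typing import NamedTuple

class Tagged(NamedTuple):
    raw: str  # stored representation
    tag: str  # "STRING", "INTEGER" or "FLOAT"

def as_number(v, tag):
    """Read v.raw as the given numeric tag (assumption: INTEGER truncates)."""
    return float(v.raw) if tag == "FLOAT" else int(float(v.raw))

def plus_numeric_wins(x, y):
    """Rule 1: if at least one operand is numeric, do math addition."""
    if x.tag == "STRING" and y.tag == "STRING":
        return Tagged(x.raw + y.raw, "STRING")
    result_tag = "FLOAT" if "FLOAT" in (x.tag, y.tag) else "INTEGER"
    total = as_number(x, result_tag) + as_number(y, result_tag)
    return Tagged(str(total), result_tag)

def plus_first_operand_wins(x, y):
    """Rule 2: the first operand's tag decides what '+' means."""
    if x.tag == "STRING":
        return Tagged(x.raw + y.raw, "STRING")
    total = as_number(x, x.tag) + as_number(y, x.tag)
    return Tagged(str(total), x.tag)

a = Tagged("1542", "STRING")
b = Tagged("1542", "INTEGER")

print(plus_numeric_wins(a, b))        # Tagged(raw='3084', tag='INTEGER')
print(plus_first_operand_wins(a, b))  # Tagged(raw='15421542', tag='STRING')
print(plus_first_operand_wins(b, a))  # Tagged(raw='3084', tag='INTEGER')
```

Under the second rule a + b and b + a give different results, which is the kind of behaviour a reader would need the language's specific rule to predict.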

Languages that lack type tags (or downplay tags or act like they don't exist) usually do not have ambiguous (overloaded) operators. "+" would always be math addition, and another operator, such as "&" or "." is string concatenation, for example.
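
A tag-free design of this kind can be sketched in Python as follows. It assumes every scalar is a plain string, that "+" always parses its operands as numbers, and that a separate operator (written "&" in the text, concat here) always concatenates; the function names are illustrative only.

```python
def to_number(s: str):
    """Parse a string as a number on demand; there is no stored tag to consult."""
    try:
        return int(s)
    except ValueError:
        return float(s)

def plus(x: str, y: str) -> str:
    """'+' is always math addition, whatever the strings look like."""
    return str(to_number(x) + to_number(y))

def concat(x: str, y: str) -> str:
    """'&' (or '.') is always string concatenation."""
    return x + y

a = "1542"
print(plus(a, a))    # 3084
print(concat(a, a))  # 15421542
```

The operator alone selects the behaviour, so two variables holding the same string cannot behave differently.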

--top


Making Predictions

It has been claimed that the flag model "makes no predictions". I counter this by saying I've never found a language that "violates" the 4 "rules" listed in TypelessVsDynamic, under heading title "Testing For Flags". If it passes one, it always passes all four. This is an empirical observation, not a mathematical/logic one. If we find exceptions, we can either list the exceptions, or revise the model. -t

A set of empirical observations may be used to validate a component of a model, but what is lacking in your "model" is that component -- an equation (or set of equations) that represents the relationship between language characteristics and language behaviour -- which could be validated. The equation obviously exists in your mind, but has not been written down. The 4 "rules" you've listed appear to ambiguously detect type tags, and thereby serve as a partial test for whether or not the "model" applies, but they are not components of a model itself. It's like defining "if stuff falls, you've got gravity" as a test for the presence of gravity, and then claiming that "stuff falls" is a model of gravitational forces. In short, at best you've established a correlation between your observations of certain languages and your tests to see whether certain languages fit your observations of certain languages!

I've never seen that done for any language. For example, could you write a mathematical equation or algorithm to detect whether a query language is RelationalAlgebra? (IIRC, nobody could build an automated type-presence detector either, because the definition depended on "intent".) -t

That's an interesting question; off the top I'd say it's non-trivial but theoretically possible, but only given a large number of significant constraints on the inputs. Without constraints, we'd be forced to consider expressions in my newly-invented (as of this sentence) language Splorp, where queries consist of random line noise where ASCII characters are bound to RelationalAlgebra operators by a random number generator. However, I'm not sure it's entirely relevant here. It would not be a model per se, so much as an implementation of a language parser and a collection of tests. Models are, however, used throughout computer science. Of relevance here -- as an example of a useful ComputerScience model, and one that your question reminds me of -- is that for finite state automata. See http://en.wikipedia.org/wiki/Finite_state_machine, in particular the "Mathematical model" section.
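
For readers unfamiliar with it, the finite state machine model is a tuple of states, input alphabet, start state, transition function and accepting states. The following is a minimal sketch in Python of one such machine, chosen arbitrarily for illustration: it accepts binary strings containing an even number of 1s. All names are invented for this example.

```python
# The five components of a deterministic finite automaton.
STATES = {"even", "odd"}
ALPHABET = {"0", "1"}
START = "even"
ACCEPTING = {"even"}
DELTA = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts(word: str) -> bool:
    """Run the machine over the word and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        if symbol not in ALPHABET:
            raise ValueError(f"symbol not in alphabet: {symbol!r}")
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

print(accepts("1001"))  # True
print(accepts("1011"))  # False
```

Every step from input to output is written down, so anyone can trace a given word by hand and reach the same answer.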


Judging from the roundabout way that Top discusses it, I summarize his 'model' as follows: "Look, you stupid moron practitioners whom I believe incapable of understanding types! In some languages, you can view values or objects as having two parts! When this happens, one part, which I call a 'flag' even though it is commonly known by you guys as 'tag', is used to interpret the other part and thus understand the whole value! Operations in the language do the same - they use both parts of the value to determine how to operate on it! That's all there is to it! Oh, and since I'm so arrogant, I'm calling this 'the side-flag model of type implementations!' even though the academics will crucify me for abusing both "type" and "model"! I'd prefer you call it a definition of 'types'! If we all work together we can change the accepted definition of 'types' and show those stubborn academics!" So... am I missing anything?

I've never called myself "clever". You are just envisioning that because of your bias against me. You view me through demon-colored glasses, and picture behaviors that never actually happened. --top

Ah, sorry. I'll correct it to "arrogant", which I know you've called yourself.

I did? Never mind, don't want to take issue with that.


Comments from example above:

 a = 1542 (where a's tag is STRING)   // but why not just say a's TYPE is STRING, 
 b = 1542 (where b's tag is INTEGER)  // but why not just say b's TYPE is INTEGER, it is more clear!
 c = 1542 (where c's tag is FLOAT)    // ... TYPE is float! not tag!!

Saying "it is a" tells us little about what's going on under the hood.

Likewise, saying types are flags tells us little about... because "are a" and "is a" don't tell us much nor provide a model.

As a vague notion it might be "good enough" in some cases, but may not be sufficient to answer more complicated type issues. The stickier the issue, the more one needs a concrete "mechanical" model to both mentally model it and explain it or communicate issues to others.

What goes on under the hood in a mechanical car may be different between different brands of cars. PHP may use C structs (tags) at some point to store information about some value that was parsed, and it may or may not throw away this struct and start up a new struct later... I don't see why knowing this would help a person choose a good programming language. I'm not going to choose PHP because they did or did not use throw-away structs (tags) at one point in time to track type info or to ensure that two numbers can be added together. What baffles me is that the "tag free camp" seems to think that choosing a language should be based on whether tags are used or not. A better programming language apparently (laughably) is a "tag free" or "type free" language. This seems to be saying that "missing information" is better than having type information. The less information you have on types, the better. I don't get this, and it almost seems like it's some kind of troll joke.

"Under the hood" is perhaps misleading. What I mean is a model that "explains" what is going on with the programming language. In other words, "why is it behaving like it does"? Now, I don't expect it to mirror the ACTUAL implementation, for languages should not be defined by implementation, but by how they behave. The model can be a UsefulLie as long as it facilitates prediction and "understanding" in a planning and debugging sense.

And it's not just about choosing a language, but also about knowing how to work with it.

As far as the value of tag-free languages, I personally find them better than tag-based dynamic languages for many domains because you don't have to consider both parts (type tag and value), just one part of a variable. It's more WYSIWYG. I haven't seen the alleged benefits of tags explained with realistic scenarios. Show me the wonderful things tags help me do and you just may sell me on the idea if they outweigh the downsides.

{That doesn't even make sense. Could you please rephrase your comment using recognised ComputerScience terminology?}

What doesn't make sense? Please be more specific. And where is the official ComputerScience dictionary? And why is terminology from say Economics excluded?

{What is a "tag-free" language? I find no mention of "tag-free languages" in the Literature (peer-reviewed journals and popular texts), nor do I find any rigorous explanation of "tag-free languages" (versus, I assume, "tag languages") here. You may find it helpful to confine yourself to terminology from the ComputerScience dictionary at http://www.oxfordreference.com/pages/Subjects_and_Titles__2D_C01 }

I've explained in other topics. As a working definition, it's a language that behaves as if scalar variables are implemented as strings and only strings, with no detectable "type" tag or indicator (outside of the printable string). Operations parse the strings as needed into other "types", such as numbers.

{"It's a language that behaves as if scalar variables are implemented as strings and only strings" and "operations parse the strings as needed into other 'types', such as numbers" makes a certain amount of sense. The reference to "detectable 'type' tag or indicator" loses me. It would be far more helpful if you'd describe the behaviour you're seeing, rather than speculating on the nature of the implementation. It would be the same if you were trying to describe a problem with your car: Far better to say that there's a rattling noise under the hood when you accelerate, than claim "it behaves like the dangling rod has fallen into the swiveling wheel," knowing full well that "dangling rod" and "swiveling wheel" are your own personal descriptions.}

Sometimes using an analogy of another sound source is easier than trying to recreate it with your own lips. ItDepends. "It sounds like a cow-bell in a cement mixer" is likely more accurate than most of the population can simulate with their own mouth and more descriptive than "rattling sound". Your particular physical analogy example just happens to be a crappy choice, I have to say. There's good models and bad models. A "fake" model can still be a good model if it does what it intends to do. (Taken too literally, it might backfire, but that's life. Use the tool for what it's intended for only.)

"Example Debra" under JavaScriptSucks gives a situation where there's a "detectable type tag". A "print" statement (or conversion to string) is not enough to describe/predict the behavior of a variable. There is another "element" not revealed by a print statement (or string conversion) that affects behavior. If a variable (or model of a variable) contains only a string value and no "tag", then if two variables have the same printable/stringable value, then they always behave the same. This is not what Example Debra shows. It indicates a "hidden" element or attribute that is not directly observable (printable). In tag languages you'd usually have to print/display both the value and the type indicator to predict behavior accurately:

  printLine("value of a: " . a);
  printLine("type of a: " . typeName(a));
  // (period is assumed to be string concatenation)

Tag-free languages don't have the equivalent of a "typeName" function. They may have operations such as "isNumber", "isDate", etc., but these only tell you that a given variable is parse-able as a number, date, etc. But it never violates the print-same-behave-same rule highlighted above (per experiments or documentation).
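
Python itself behaves as a tag-based language in this sense, which makes for a quick demonstration. In the sketch below, type(x).__name__ plays the role of the hypothetical typeName, and is_number is an invented helper showing the tag-free style of test, which only reports whether a string is parse-able as a number.

```python
a = "1542"
b = 1542

# Both variables print identically...
print(str(a) == str(b))  # True

# ...yet they behave differently under '+', so printing alone cannot predict behaviour.
print(a + a)  # 15421542
print(b + b)  # 3084

# The hidden element becomes visible through a typeName-like operation.
print(type(a).__name__, type(b).__name__)  # str int

def is_number(s: str) -> bool:
    """Tag-free style test: can this string be parsed as a number?"""
    try:
        float(s)
        return True
    except ValueError:
        return False

print(is_number("1542"), is_number("abc"))  # True False
```

In a tag-free language only the last kind of test would exist, and the first two variables would be indistinguishable.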

If you have a better way to describe such behavior without assuming a 500-page thesis, I'm open to suggestions.

Note that under the hood, a "tag-free" language may still use tags for efficiency purposes. However, its behavior would be identical to a tag-free implementation of the same language under a different compiler/interpreter (but maybe slower). In other words, we could clone the c/i without using tags for implementation. The tag model primarily intends to model behavior of the c/i (language), not necessarily the implementation.


Expression Comparison

Here's another way to think about the usage of the side-tag model.

When programming professors are describing "how" computers evaluate expressions to newbies, they may offer a model that produces the correct answer, but probably doesn't actually reflect what the compiler/interpreter is doing under the hood. It's a model to facilitate "understanding" and prediction, not to reflect the actual guts.

The professor may start out, "scan the expression from left to right until you encounter a closing parenthesis ")". Put your pencil tip on that parenthesis. Then move your pencil leftward, underlining as you go, until you encounter an opening parenthesis. The underlined text represents an expression that we'll call E. Evaluate the expression E based on the operator priorities listed on page 123 of your textbook. Then replace E, including parentheses, with the reduced expression, and repeat the above steps for the next set of parentheses...."
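
The professor's pencil procedure can be written down as a short program. This is a minimal sketch in Python under several assumptions not in the original: only the operators + - * / are supported, "*" and "/" bind tighter than "+" and "-", the input has balanced parentheses, and there is no unary minus in the input. The function names are invented.

```python
import re

def tokenize(text):
    """Split the text into numbers (as floats), operators and parentheses."""
    tokens = []
    for tok in re.findall(r"\d+\.?\d*|[-+*/()]", text):
        tokens.append(tok if tok in "+-*/()" else float(tok))
    return tokens

def eval_flat(tokens):
    """Evaluate a parenthesis-free token list: * and / first, then + and -."""
    folded = [tokens[0]]
    for i in range(1, len(tokens), 2):
        op, rhs = tokens[i], tokens[i + 1]
        if op == "*":
            folded[-1] = folded[-1] * rhs
        elif op == "/":
            folded[-1] = folded[-1] / rhs
        else:
            folded += [op, rhs]
    result = folded[0]
    for i in range(1, len(folded), 2):
        op, rhs = folded[i], folded[i + 1]
        result = result + rhs if op == "+" else result - rhs
    return result

def evaluate(text):
    """Repeatedly reduce the first innermost parenthesised expression E."""
    tokens = tokenize(text)
    while ")" in tokens:
        close = tokens.index(")")      # first closing parenthesis
        open_ = close - 1
        while tokens[open_] != "(":    # move left to its opening parenthesis
            open_ -= 1
        # Replace E, including its parentheses, with the reduced value.
        tokens[open_:close + 1] = [eval_flat(tokens[open_ + 1:close])]
    return eval_flat(tokens)

print(evaluate("2*(3+4)-(10/(1+4))"))  # 12.0
```

Like the classroom version, this produces the right answer without claiming to mirror how any real compiler or interpreter parses expressions.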

--top


Is it a model, merely an analogy, or both?

Actually it is just a silly metaphor, IMO.

{It's a tool. A useful tool. What category of tool it is, well that's a LaynesLaw mess. -t}


See also: ColdFusionLanguageTypeSystem, TagFreeTypingRoadMap


CategoryLanguageTyping, CategorySimplification JanuaryZeroNine


(Last edited June 9, 2014)