Type System Categories In Imperative Languages Discussion

From TypeSystemCategoriesInImperativeLanguages

This essentially resembles the "tag" model (with some unnecessary parts added), except the word "type" is used. I avoid the word "type" to avoid the kind of word confusion as found in Example CF002. We've been over that already. -t

Don't you mean the "tag model" resembles programming languages? There is no "model", above -- I've merely explained certain programming language semantics. Note that "type" is undefined in the above, but used in precisely the ways that any programmer must know in order to use a programming language. Hence, "type" is familiar and recognised whilst "tag" is not. Thus "tag" is unnecessary.

Well, I'm TRYING to turn it into a model to avoid getting tied up in psychology, or at least to reduce dependence on psychology. "Semantics" is in the head. Again, I want to predict language output, not heads. My head is far better at processing images than language, and I'm sure there are others like that. Visual thinkers have been some of the most productive people in history, so it doesn't make sense to discriminate against them.

And you still haven't solved the linguistic confusion introduced in Example CF002. In your...notation?...the "type" stays "string" even though cfArgument makes it look like it's being filtered as or coerced into type "number".

You're conflating "programming language semantics" with "semantics" in general. There is nothing above that relies on psychology. The explicit "type" facilities in a programming language are inherently unambiguous, and completely independent of whatever philosophical debates there may be over what the word "type" means in English.

I don't know what "linguistic confusion" you're referring to. The above deals with the subject of Example CF002 without any confusion or contradiction.

You claimed CF002 "associated a type" with the variable p. That contradicts your model-like thingy above, where you said of such langs "Every value has the same type, typically a string of characters." This rule would forbid "associating a type" of "number".

Variable p is determined to be of (or not) the type that matches the type name given in the 'type=' attribute. The type reference isn't retained; it's only used to determine whether to throw an error or not.

Then your description above for category D2 is flawed. You should perhaps say, "Every value has the same type, typically a string of characters, except at cfArgument, where variables can temporarily be associated with another type besides "string", but only in ways the programmer cannot see and cannot use."

No, it's correct. I build interpreters and compilers; I know how it works.

The interpreter actually creates a temporary "type" marker in RAM?

I don't know what a "temporary 'type' marker in RAM" is, but it certainly has a reference to a type because it has to invoke an operator defined by the type named by the 'type=' attribute to determine that the sequence of characters in the argument represents a literal in the set of values defined by that type.

Like I said before, one could call some of the parts of parsing "types" (for reasons unknown to me), but it's a different mechanism from what is usually used for "types" in other languages. The model, and the explanation, are simplified if we simply say that category D1 lang variables have absolutely no tags, ever ever ever.

There are no tags in the above. D1 variables do not have types.

You also don't clearly answer which aspects of parsing and all potential validation are "types" and which are not.

There's no need to do so, other than what's sufficiently covered in the D2 explanation.

And you still haven't put this parse exception into your D2 explanation.

I thought it was well covered. See the bit about <cfargument ...>

I don't see a solution. Above you said D2 has only strings, but you now admit there's an exception for cfArgument (numbers CAN "exist", at least temporarily). Why are you not building that exception into your description of D2?

See where it says, "... operators perform parsing as needed to determine the value's type, i.e., whether it is a string representing an integer, number, date, etc." That's what it does, at least from the user-programmer's point of view. (I didn't mention the part about operators internally converting the string representation to values of other types in order to perform operations on numbers, dates, etc., because this is generally invisible to the user-programmer.)

So now you have multiple simultaneous types in your model-like thing: a "most explicit type" that is different from a "base[?] type". It's more complicated and confusing than the tag model. Occam will slap you.

Not at all. The language checks to see if the sequence of characters in a string represents a literal in the set of values defined by the named type. That's precisely what <cfargument ...> does.

And once again, there's no model here. I'm explaining what real languages do.

I know that, but your way of explaining it is round-about, and overloads "types".

How is "types" overloaded, and how is it "round-about" given that it's precisely what happens? How would you describe what happens?

I told you: it parses, and we don't have to call anything "types" (so that there's no confusion with explicit types in other langs, with validation, etc.).

Yes, but it parses to check that the argument is a... what? Why are we doing this parsing? Why do we use <cfargument ...>? What is its purpose?

Either way, I am looking for a model that explains/predicts the differences in dynamic languages, and I want that model to be as simple, as unconfusing, and as unambiguous as possible. If YOU don't want such, tough titties. Ignore it until you have a competing model.

I don't need a model. I'm explaining what actually happens, and that turns out to be sufficient to categorise languages and explain their behaviour without ambiguity and without imaginary constructs like "tags".

No, you are playing word games; calling something in the interpreter a "type" arbitrarily based on some vague head notion.

I'm not calling anything a "type" that the language designer, language implementer, and language user wouldn't also call a "type".

As already explained, historical habit sometimes interferes with clarity.

Be that as it may, in introducing "tags" and avoiding types you're attempting to defy common understanding of programming languages and ignore facilities which are embodied in their syntax. How will you explain imaginary "tags" in languages that clearly have explicit syntactic constructs for defining and referencing types?

I realize there is a trade-off, but by replacing a fuzzy notion with 2 clearer and simpler notions, a better model is produced. Or at least a model that may appeal to certain WetWare over another, given that some models fit some people better than others.

Analogies may be helpful in gaining an initial grasp. They should not be a permanent substitute for a genuine understanding of reality.

"Types" are not reality; they are only in the head.

Type definitions and references are not only in the head; they are explicit parts of programming languages that programmers cannot avoid. Programmers must understand them in order to be effective programmers.

That depends on the definition of "type"; but that's a calendar-killing topic, as we've learned. The existing attempts at definitions are not quite good enough to tune one's understanding, making many programmers rely too much on trial-and-error. I want a better model.

What it appears you want is a better understanding. You believe a better model will help achieve it. It will only help achieve it if there is a clear and explicit relationship between every element of your model and every element of the reality it models. As it stands, there is apparently no clear and explicit relationship between the "tag" element of your model and the elements of the reality it models, i.e., the collection of statements in various languages and the types, values and variables that they manipulate. I say "apparently", because although I claim that "tag" is a synonym for "type" or "type reference" and therefore redundant, you claim there is a difference.

Again, there is no reality to types. Languages and programming are abstractions in the head. The closest thing to "reality" would be the machine language, and as already stated that doesn't fit your explanation either unless one is loose with language.

  a = "123";
  write(typeName(a)); result in TL: String
  write(isNumeric(a)); result: True
  write(typeName(a)); result in TL: String
.

You don't have to define anything, but it's going to be an awkward model to use if it has entirely undefined components. How will the newbie programmers -- whom it's designed to help, I presume -- react to the fact that a "tag" is, like, just this thing, like, that you can't see? But it must be there, because it affects things. And so on.

If there is "no reality to types", what does the "int" mean in "int x;" in C, C++, C# or Java? What's the difference between 123 and "zbq" in C#? Or between 123 and 123.0? What's the difference between 123 and "zbq" in Perl? Why does '123 + "zbq"' fail in some languages and succeed in others? Why does '123 + 123.0' fail in some languages and succeed in others? What does the TYPE keyword do in TutorialDee? In C#, with a method defined as 'myProc(int x)' and another as 'myProc(string x)', which one is invoked for myProc(2) and why? And so on. For something with no reality, it certainly seems to have an impact on the answers to these questions. To correctly answer these questions without undue circumlocution, any programmer will either have to mention "type" or be forced to use some PrivateLanguage.

JS and Php vars probably have an explicit tag byte or two while running in RAM, while CF and Perl don't (at least for scalar aspects). Parsing will not attach such tag bytes to variables in those languages. That can be objectively observed. (Granted, one could make an interpreter that temporarily added such a tag to the variable to indicate the results of the parse, but the programmer cannot sample that tag, such that it is not objectively observable to the programmer, and is thus swappable with other models that produce the same result.)

Now you're talking about implementation, rather than a model. If you're talking about implementation, you should talk about reality. In reality, your "explicit tag byte or two" might indeed be a value associated with a variable in some languages, but that's not how it works in Javascript and PHP. In PHP and Javascript, variables have no type references. Type references are associated with values. In some dynamically-typed languages, a value is a tuple consisting of a reference to a region of memory that represents the value and a reference to a region of memory that represents its type. The type reference could -- depending on the language -- be an integer 'n' that refers to the 'n'th type definition in an array, a string naming the type, or a pointer to code that represents the type definition. The value = {value_representation, type_reference} tuple could -- depending on the language -- be implemented as a contiguous region of memory, or the value_representation and type_reference could be found as the topmost items on two unrelated stacks, or it could be something else entirely. There are myriad ways to implement TypeSystems, and myriad ways to implement the same type system. Therefore, the only accurate thing you can say about (for example) PHP is that variables don't have types but values do, i.e., Variable -/-> Type, Value ---> Type.
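For illustration, here is a minimal Python sketch -- with hypothetical names, not any particular interpreter's internals -- of such a {value_representation, type_reference} tuple, and of variables that merely hold values without carrying a type reference of their own:

 # Minimal sketch (hypothetical names): a value as a
 # {value_representation, type_reference} tuple, and variables that
 # merely hold values without a type reference of their own.
 class Value:
     def __init__(self, representation, type_reference):
         self.representation = representation   # the data representing the value
         self.type_reference = type_reference   # index, name, or pointer to a type definition

 variables = {}                           # variable name -> Value; no type on the variable itself
 variables["a"] = Value(123, "integer")
 variables["a"] = Value("foo", "string")  # re-assignment: the value (and its type) changes freely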

I've explained why it's not synonymous multiple times, particularly your treatment of "parsed-out-types". I don't understand why it's not sinking in.

Parsing has a fundamental role in programming language TypeSystems. A literal value appears in source code, or user input at run-time, as a sequence of ASCII, EBCDIC, Unicode, or other characters. Therefore, the following process must occur in every language:

Given a literal value represented as a sequence of characters, for each type defined by the language, pass the sequence of characters to an operator whose purpose it is to return 'true' if the sequence of characters represents a literal of that type. For example, imagine a hypothetical language with three types: Integer, Float, and String. Assume each type defines its own boolean operator called isValidLiteral(c), where c is a sequence of characters. Given the literal character sequence "123", we invoke Integer's isValidLiteral(c). It will parse the sequence into '1', '2' and '3' and return true, because all three characters are digits and therefore it is an integer. However, if the sequence is "Dave", Integer's isValidLiteral(c) will stop parsing the sequence at 'D' and return false, because if the first character isn't a digit or '+' or '-', it clearly isn't an integer. So we move on to Float and its isValidLiteral(c) also returns false because "Dave" doesn't represent a floating-point value. Finally, the String type's isValidLiteral(c) will return true because the sequence is a valid String. (In fact, its isValidLiteral(c) is almost certainly defined as "return true", so we don't even need to invoke it when we get to String.)
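A minimal Python sketch of that classification loop, with hypothetical isValidLiteral checks for the three-type language described above:

 # Sketch of the classification loop described above, with hypothetical
 # isValidLiteral checks for a three-type language (Integer, Float, String).
 def is_valid_integer_literal(c):
     body = c[1:] if c[:1] in ("+", "-") else c
     return body.isdigit()

 def is_valid_float_literal(c):
     try:
         float(c)
         return True
     except ValueError:
         return False

 def classify_literal(c):
     if is_valid_integer_literal(c):
         return "Integer"
     if is_valid_float_literal(c):
         return "Float"
     return "String"   # String's isValidLiteral is effectively "return true"

 print(classify_literal("123"))    # Integer
 print(classify_literal("1.5"))    # Float
 print(classify_literal("Dave"))   # String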

A notable aspect is when this process occurs for literals found in the source code.

You'll note I've used the phrase "sequence(s) of characters" several times. There's a shorthand way of describing a "sequence of characters" -- we normally call it a string. Thus, where I wrote (above) that in Category D2 languages, "values ... are represented as sequences of characters", I could also write "values are represented as strings" or "values are always strings" or even "values are always of type 'string'". They all mean the same thing: Strings are used to represent values of every scalar type. So, you should now be able to see how there is no confusion or contradiction in D2 languages between a value being both a string and an integer, because the sequence of characters -- i.e., string -- is all digits and thus represents a literal integer.

A point of contention appears to be the role of operators like <cfargument type=...> in D2 languages like ColdFusion. It should now be clear that <cfargument type=...> simply allows the user-programmer to explicitly invoke the process described above. Whether the process is invoked explicitly by <cfargument type...>, or implicitly inside a numeric operator like '*' or '/', it is the same.
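A sketch of that equivalence (hypothetical names, assuming D2 semantics where every value is a character string): the same parse-based check invoked explicitly, cfargument-style, or implicitly inside a numeric operator:

 # Sketch (hypothetical names): the same parse-based check invoked
 # explicitly, cfargument-style, or implicitly inside a numeric operator,
 # under D2 semantics where every value is a character string.
 def represents(chars, type_name):
     if type_name == "numeric":
         try:
             float(chars)
             return True
         except ValueError:
             return False
     return True                        # every character sequence is a valid string

 def check_argument(arg, type_name):    # explicit: <cfargument type="numeric">
     if not represents(arg, type_name):
         raise TypeError("argument does not match type " + type_name)

 def multiply(a, b):                    # implicit: parsing inside '*'
     check_argument(a, "numeric")
     check_argument(b, "numeric")
     return str(float(a) * float(b))    # the result is again a string

 print(multiply("6", "7"))              # 42.0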

I believe you are making the mortal sin of mistaking your head notions for objective reality. You keep bringing up your type "reality" as if we can go walk on top of it in the back yard.

Everything above is objective reality, as it describes what actually happens when real code is executed in real programming languages.

If we go with interpreter modelling, as I've pointed out repeatedly, the internal type tag is not equivalent in the interpreter RAM patterns to a temporary variable holding a parsing Pass or Fail result. (We could force them to be equivalent, but it makes unnecessary steps.)

I'm afraid I don't understand this. What is "interpreter modelling"? What is an "internal type tag"?

Abstract Interpreter Instead of Verbal

It seems interpreter modeling may be the better way to go. Verbal descriptions are just not working; at least, they have been ambiguous and seemingly contradictory to me. Interpreter modeling hopefully will avoid the pitfalls of language-centric approaches. It doesn't have to be a real interpreter, only produce the right results (match the language's actual output). Thus, we don't have to consider efficiency. We can also perhaps avoid bit-level details in some cases, but something tells me that at least for the value portion, we may have to be explicit about that.

Again, what is "interpreter modeling", and how does it help? If I infer correctly, it sounds like you're intending to use source code to... Understand source code?

"Source code" is like C or Perl. An interpreter model would resemble the machine language (interpreter) that runs the source code. For example, we'd have an explicit representation of variables and their content. We don't have to use fuzzy notion-esque English because we are talking about bytes and following the system's examination and alteration of the bytes step by step like a CPU executing machine code. We may examine snippets for parameter passing and validation, for example. We may use some medium-level abstractions to avoid outright machine language for many parts to keep the example simple, unless those become contentious, in which case we use StepwiseRefinement, down the the bit level if necessary until every bit and every step of contention sections are fully defined to both party's satisfaction (at least in terms of clarity).

So, effectively you're saying we'd find out how a language is actually implemented (so that the emulator can "resemble the machine language ... that runs the source code") and then emulate it (to give "an explicit representation of variables and their content") and see what the emulator does? Couldn't we short-cut the process by (a) stopping after we understand how the language is implemented, since we apparently have to do that in order to build an accurate emulator, or (b) read the language's reference manual to find out how its TypeSystem works, or if we still have questions, email one of the developers? I think (b) might be the most productive for least effort. Both (a) and (b) seem more productive and less effort than your approach.

If you can find a good manual, be my guest. Usually they use the same kind of fuzzy notions you do, perhaps because they all reference/copy the same popular-but-fuzzy authors. The best manuals give examples, but examples are not a model.

Good manuals simply describe how their TypeChecking works, without artifice. If you find that confusing, either ask one of the language implementers or ask someone who knows the language better. I can't imagine that would be more difficult than (apparently) implementing emulators and whatnot. Anyway, how do you plan to implement a language emulator if you don't know how the language works without having a language emulator?

They use a combo of tags and parsing, but use long-winded round-about language to describe it. They must have gone to the same College of Verbal Bloat that you did.

In other words, they describe it in terms of types and values and variables, and don't mention "tags and parsing", do they?

Nope, they don't. Again, I value model simplicity over fitting historical conventions for the purposes of use already described.

Why do you think they don't?

They just paste in slight variations of the same convoluted crap they heard from their professors or from another book. Sometimes it takes assholes like me to wake people up from the status-quo nap.

Do you genuinely believe they "just paste ... crap they heard", or do you think it's possible that they understand language behaviour in terms of types, values and variables?

How well do you think your attempts to "wake people up from the status-quo nap" are working?

Do you think being an "asshole" is the best way to "wake people up"?

It has a hit-and-miss history. But "asshole" is relative. Those who are invested in the status-quo generally view attacks on their stability as hostile.


The following can often be intermingled with the concept of "type":

1. Explicit type indicators ("tags").

2. Parsing.

3. Validation.

However, these three tend to act somewhat differently as far as program or language behavior goes, and if we want to model/predict program behavior accurately and clearly, it's best to make a clear distinction between them.

You try to squeeze them all under the same umbrella: "types", and that creates confusion and unnecessary "parts" of explanations. It's best to separate them in the model rather than try to make them resemble variations on "types". -t

As shown in Language Category D2, they are trivially covered under the same umbrella, accurately, without confusion, and without any "model" needed. The actual language semantics are sufficient.

No, you added an unnecessary type hierarchy.

I've "added" nothing that the language designer, language implementer, and language user wouldn't recognise as already there.

It's not needed.

I'm explaining what the language does; if you feel there's an aspect that's not needed then you need to speak to the language designer. Who owns ColdFusion these days? You need to complain to them.

The tag model explains it just fine, and without the need for a "type hierarchy".

Actually, on DefinitionOfTypeTag there was some discussion over whether your "tag model" adequately addressed <cfargument ...> if you included it, but you could exclude it without much harm though your model would be incomplete. By the way, what is this "hierarchy" you refer to? There is no type hierarchy, and Language Category D2 doesn't imply one. In the <cfargument ...> section, it only says that the string type may represent sequences of characters that represent literals in actual types. That's not a hierarchy.

"the value's most specific type" implies a hierarchy, otherwise you wouldn't need the phrase "most specific".

I've taken out the phrase "most specific", as it was not necessary.

Both your approach and my approach bring up "parsing" such that neither can claim they've folded away that issue into another existing part. You just convert it into a middle-man type that I simply don't need.

Parsing is a fundamental feature of most TypeSystems, in order to identify character sequences that represent literal values belonging to various types. In many languages, parsing also forms part of type-based operators that turn character sequences into alternative value representations in order to facilitate performing operations on those values. For example, in most languages a character sequence 1236 in the source code which represents the integer literal 1236, will at some point be parsed and converted to a 32-bit binary integer because it's more efficient to perform arithmetic operations on binary integer values than on strings of ASCII or Unicode characters. Said parsing is obviously related to types.
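For example, here is a Python sketch of that character-by-character conversion (the function name is hypothetical):

 # Sketch: parsing the character sequence "1236" into a binary integer,
 # one digit at a time, as an interpreter or compiler might.
 def chars_to_int(chars):
     n = 0
     for ch in chars:
         if not ch.isdigit():
             raise ValueError("not an integer literal: " + chars)
         n = n * 10 + (ord(ch) - ord("0"))
     return n

 print(chars_to_int("1236"))   # 1236, now one binary integer rather than four characters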

Yes, but its behavior is a bit different than tags (explicit typing). By calling one "tags" and the other "parsing", I avoid any cross-confusion. It doesn't matter if "parsing is obviously related to types", it's different enough that the model should clearly make it something different. Again, forecasting accuracy is valued above fitting historical vocabulary in the model. The model's purpose is to differentiate type-like behavior, not encourage umbrellatizing type-like things to mostly just satisfy historical vocabulary habits.

What is needed here is not a more complex model with imaginary elements, or some artificial distinction between parsed types and un-parsed types (I guess...???), but a greater attempt at understanding. The above is a clear and accurate description of actual implementations. For the sake of not complicating or obfuscating reality, it makes sense to endeavour to understand how it is.

Imaginary elements? Such as "associated with" ghosts? Again, there is no reality: ProgrammingIsInTheMind. Our head-model can be any fucking thing we like (or hate). I just want to find one that is not confusing and has unambiguous predictive capabilities, at least for certain aspects of languages.

Is your rudeness necessary? By "imaginary elements" I mean "tags", which are not real from a user-programmer's point of view. Types, values, and variables have syntactic elements that relate to them, i.e., they are explicit in the language and they "do" things in the language and in the machine. There is nothing in the syntax (or the language reference manual, that I know of) of any popular imperative programming language that refers to "tags".

There's no universal law of the universe that says explicit types and "parsed" types should be forced to look like the exact same thing. There may be historical habit, but I have no qualms about kicking tradition in the ass if it gets in the way of progress or a specific analysis. Sacred Cows, be warned: I have a hot grill and ketchup.

I think my explanation above, regarding the role of parsing in TypeSystems, should address that.

We are going around in circles. Seems time to give this one a rest. I'll go enjoy a hamburger or something...


Boldly

I have copied D2 above and emphasized what I see as the problem phrases and contradiction (-t):

Variables do not have, or are not associated with, a type.

Every value has the SAME type, typically a string of characters.

Variables may be assigned any value at any time. Upon invocation, operators perform parsing as needed to determine the value's type, i.e., whether it is a string representing an integer, number, date, etc. Sometimes, the parsing mechanisms are explicitly available to the programmer, such as <cfargument type= ...> in ColdFusionLanguage which can be used to reject operator invocations if arguments do not match their corresponding parameter's specified type. ...

[end copy]

To me it's:

1. Every variable only has one type.

2. And every variable can be different types, such as integer, number, date, etc.

That's like:

1. Our zoo only has apes

2. But our apes can be lions, tigers, and bears!

It's more like:

1. Our zoo cages can hold any animal.

2. An animal can be a lion, tiger, bear, weasel, ocelot, olinguito, etc.

// "normal"
x = a + b + c + d + e; 

// "defensive" x = conversionStuff(a) + conversionStuff(b) + conversionStuff(c) + conversionStuff(d) + conversionStuff(e);
. Granted, it may perhaps (sloppily) be making a distinction between "variable" and "value", but the programmer cannot "see" them separate such that the distinction helps nothing. If it's happening under the hood, then it's not observable and/or replaceable with alternative models, probably better ones, that don't need the distinction.

Programmers do normally see variables and values as separate. Given a statement like 'v = 2 + 3', we know 'v' is a variable, but what do we call the thing that results from evaluating the expression 2 + 3?

Another way of looking at it: What do we put in a variable?

Or perhaps there is a multilevel type capability of some sort, but that's not described, such as how many levels are the limit, what combos are allowed in the tree, etc.

I hope my explanation above -- see the paragraph starting with "Parsing has a fundamental role in programming language TypeSystems" -- makes that clear.

I don't dispute that, but "parse-based" typing approaches are different enough from tag-based typing approaches that our model should make such distinction clear. (If we can call parsing part of "types".)

{The only significant difference (for our discussion) between what you are calling "parse-based" (e.g. cfargument) and the others is that we have to explicitly program when the type checks occur.}

No! One looks at the tag, the other only looks at the value (or both in some cases). We can do experiments to determine what is affecting the output. It has nothing to do with explicit coding. (If the lang summarily has no tags, then its model is obviously simpler: any "type" determination is parse-only, i.e. value-only sampling. Itsa no-brainer; no tags to confuse. Hug simplicity, it's your friend, unless you like obfuscation as puzzles.)

{What tag? You've yet to define what you mean by "a tag", and your "experiments" apparently can't differentiate between "tags" and "flawed CSRs". Since I can't even tell if there's a "tag", it's certainly of no use to me. And why should I learn two terms for these things when one term happens to describe both things better than your two terms combined?}

CSRs have only been shown flawed in C, not in the languages of our comparisons, and the flaw wouldn't ruin the big picture even if it were.

func internalValidation(param, validateType) {  // type name comes from programmer's XML
   useRegex = symbolLookup(validateType);       // get the corresponding regex expression
   // Note: no error-handling for lookup failure here, because the XML parser already checked it
   return isMatch(useRegex, param);             // return pass (True) or fail (False) based on the regex parse
}


And "Types" poorly describes the two kinds of type encoding, at least the way you present it. It's either missing from your description, or gummed up inside with non-observable terms like "value".

Sorry, I don't follow you. What do you mean by "two kinds of type encoding" and "non-observable terms like 'value'"? Values are certainly observable: What does a literal represent? What is the result of evaluating an expression? What does a variable contain?

Then how do we directly observe values and know we are looking at values and only values? The only way to see the "result of evaluating an expression" is via output.

We don't have to "directly observe values" if we understand conventional imperative programming language semantics. However, we can also see that given an expression like "3 + 4", we can't use it to replace the "int" in "int x;", so it's probably not a type. We can't use it to replace the "p" in "p := 5;", so it's probably not a variable. It appears to be something other than a type or a variable. In fact, wherever we can use it, it seems to be equivalent to "7". Indeed, we can replace "3 + 4" with "7" everywhere "3 + 4" appears, and the program behaves exactly the same way. So what's a "7"? Now, do many more experiments like this. That's how we empirically arrive at the notion of "value". Fortunately, we don't have to empirically arrive at the notion of "value" because it's already well-understood in -- at least -- language implementation terms. I've little doubt that most language users understand it too.

That's why I say they "act like" read-only variables in most dynamic langs we consider. (It's not a contradictory term, because they are alterable (variable) at coding time, similar to static languages.) And languages that allow declarations such as "int x;" generally allow things like "(int) 3 + 4", which, for all observable purposes, acts identically to a variable, other than mutability. Yes, there may be exceptions for some languages or edge cases, but I don't think we should get bogged down in such minutia just yet.

Expression "(int) 3 + 4" is performing a typecast on integer value 3 (and, in this case, therefore likely redundant). It's casting, or converting, the value 3 to an integer value 3. How is that like a variable declaration, other than in "int x;" it's declaring that the variable is of type integer?

How do we objectively verify that this "casting" is different than "declaring"? In most dynamic languages they act the same to external observers. Being SystemsSoftware experts, I suspect you've had your "heads inside the guts" for so long that you no longer view languages like scientists, but instead like an engineer or mechanic.

In the majority of imperative programming languages, a variable declaration assigns a name to a variable and it's added to a lookup table of variables that identify their properties -- e.g., scope, memory location, type (if a Category S language), etc. -- for use when the variable is referenced in expressions. Expressions, on the other hand, have no name and no scope, and don't necessarily have a memory location because they're typically constructed dynamically on a stack. Of course, I know this because I view languages like a computer scientist, not a natural scientist. A natural scientist might observe that variables have names and can be assigned to by name and referenced by name. Expressions have none of these.
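A sketch of such a lookup table of variables, in Python (the structure and field names are hypothetical, not any particular implementation's):

 # Sketch (hypothetical structure): a declaration such as "int x;" adds an
 # entry to a lookup table of variables and their properties.
 symbol_table = {}

 def declare(name, scope, type_name=None):   # type_name used only in Category S languages
     symbol_table[name] = {"scope": scope, "type": type_name, "location": len(symbol_table)}

 declare("x", "local", "int")
 print(symbol_table["x"])   # {'scope': 'local', 'type': 'int', 'location': 0}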

Okay, but that seems largely an efficiency-geared decision. An "accurate" interpreter can also be created by treating the result of expressions as local anonymous read-only (at run-time) variables. Whether doing such runs fast or not is not my concern here. But anyhow, let's focus on variables for now and come back to expressions later. The topic is getting too fat.

A variable is a more complex entity than a value, so it doesn't make sense to use a more complex entity when a simpler one will do. Furthermore, a variable is a container that can hold one item at a time. If a value is a variable that can't change, i.e. it's a container, what does it contain?

But then you are adding more parts to the model. If we can piggyback on variables, we don't have to define an entirely new thing: re-use. I agree it's a tricky balancing act -- variations on a theme versus a different thing -- but I vote to piggyback on variables because they summarily make the model simpler, per my judgement. Regarding containing, it contains the same kind of thing(s) variables do, per observations about their output, as already described. When you toss out words like "containing", make sure it's an aspect or feature we can measure and observe, or at least be able to test for the presence or absence of contain-ness. "They contain because I say they contain" is insufficient.

Perhaps "contain" was a poor choice of word. It's one I like to use because I often use the analogy of a bucket to represent a variable, but the usual term is "store". A variable -- at least in imperative programming languages -- is said to store a value. If a value is a variable that doesn't change, what does it store?

Even constants have to be "stored". Constants are like variables except they have a lock on the door, and the system knows the combo but not the programmer.

A variable stores something, and can change.

A constant stores something, and cannot change.

What is the "something" that they store?

The value, or at least something resembling a value.

Yes. Now, what's the result of evaluating an expression?

Let me rephrase it. The result of evaluating an expression and the "result" of a constant is a variable-like "result" (for lack of a better name). It has features of variables such as values and types (if the language has them).

  Chart Var-Like-01

  Object     | Type | Value | Name | Write
  -----------+------+-------+------+------
  Variable   |  Y   |   Y   |  Y   |   Y
  Constant   |  Y   |   Y   |  Y   |   N
  Expression |  Y   |   Y   |  N   |   N

(Addendum: for "Type" and "value", the long form would be "produces a type" and "produces a value", and "Write" would be "run-time write".)

You mean to say an expression, variable or constant produces a type? That doesn't make sense -- "produces" implies to me that it generates a type. Do you mean an expression, variable or constant has a "type" property or attribute?

We can usually model those as variables with "features switched off". That provides conceptual re-use and a way to clearly compare.

Granted, we can do the same with variables and say certain features are switched off on tag-free languages, and the result would look somewhat like your model. However, that tends to cover up the fact that some languages outright don't have some of the features throughout such that things are simpler to compare if we outright omit those from their model.

No, this is at best a confusing conflation and complication of familiar concepts, and at worst outright wrong (constants don't store variables, for example.) Rather than awkwardly (and potentially incorrectly) conflate variables and values, simply keep them separate. It simplifies all models, reflects actual language understanding -- programmers are taught that expressions evaluate to values, not that expressions evaluate to variables or constants -- and accurately describes how imperative programming languages are implemented and actually work. I.e.:

I never said "constants store variables". I'm curious about which text of mine provided your mind with that interpretation. Perhaps my statement "they produce variable-like results" is ambiguous. Perhaps this is clearer: "they [constants] produce results that are very much like the results that variables produce". (It's odd how such phrases have multiple interpretations that we don't necessarily notice when we write.)

Sorry, I wrote it backwards! Above, I asked, "If a value is a variable that doesn't change, what does it store?" You replied, "Even constants have to be 'stored'", which implies that variables store constants. Above, I meant to write "variables don't store constants, for example".

And again, the "familiar concepts" are fuzzy. You seemed to admit that with the "country music" analogy.

Not at all -- read that section again. What I wrote -- and it seems relevant here -- was that "[p]eople choose to listen to songs and radio stations, watch television programmes and/or YouTube videos, and purchase and/or download music on the basis of definitions of "Country Music", so it appears to be well-understood. However, what we're talking about here -- types in programming languages -- is far more clear-cut. Indeed, in the majority of programming languages it's trivial to identify explicit type definitions and explicit type references from the language grammar alone. Most implicit type references can be identified from a trivial understanding of conventional imperative programming language semantics. Border cases, exceptions, and peculiarities can be identified in reference manuals and by talking to developers who use these languages."

It still stands that the examinable output of constants and expressions shares the same elements and features that variables' output does in most dynamic languages. I'm focusing on observable traits, and the observations are that the "output head" is indistinguishable from a variable's "output head". Remember, I'm approaching this like a scientist would: what's observable, what models can mirror (predict) such observations, and which models are simpler.

It's generally understood that the examinable output of a constant, variable or expression is a value. Values are also the examinable input to variables and constants. Describing values as variable-like or constant-like or whatever adds nothing but confusion, especially given that a beginner's explanation of a programming language typically starts with values. I suspect your model might actually be strengthened by dispensing with any notion of variables or constants, and dealing strictly with values. Or, perhaps even better, start with a minimal model that describes behaviour strictly in terms of values, and then build a model on top of it that deals with variables.

Studying programming languages like a naturalist is an odd approach, unnecessary at best and likely to produce error (like not recognising values) at worst. Natural scientists are forced to study the world via observation, because we don't have "insider knowledge" about how the natural world works. We don't have to study programming languages by observation, because as computer scientists we have "insider knowledge" -- we know how they work. If our explanations of how they work are inadequate, then it's our failing to write clearly about what we know actually happens. We don't need to create fiction like a "tag model" -- along with new terminology -- when all we really need, perhaps, is better writing about how programming languages actually work.

Parse-based "types" do NOT create actual tags the way an explicit type declaration (or quote-ness) does under the hood. I'm pretty sure if we inspected the interpreter and machine code we would confirm this for most dynamic languages. We may argue about what is called a "type", but the design is different between both regardless.

I'm not clear what this has to do with my point. Where did I suggest that "parse-based 'types' create actual tags"? What's a "quote-ness"? In Category D2 languages, values don't have specific types (at least, not observably, and aside from being strings.)

Arrrg, we keep coming back to this. An assignment like 'a=123;' creates a byte(s) ID for "number" closely associated with "a" or a's value (I won't get into the distinction here) in D1 langs. But something like 'a="123"; x=isNumeric(a);' does not create the same kind of "byte ID" in RAM inside the interpreter. There is no need for such whatsoever. "x" may receive a Boolean tag, but it's not a "numeric" tag. No explicit byte(s) representing "numeric" needs to exist in/for the second snippet, yet it would in the first. And if "isNumeric" is used in a conditional instead of the assignment shown here, then it won't even generate a (measurable) Boolean tag (or whatever you call them). Inside the interpreter, they are different animals, unless you are wasting processing to kiss up to tradition.
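A toy Python sketch of that claim (hypothetical names; not any real interpreter's internals): assignment stores a type ID next to the representation, whereas isNumeric on a string need only parse and return a boolean, attaching no "numeric" ID to the variable:

 # Toy sketch (hypothetical names): D1-style assignment stores a type ID
 # alongside the representation; isNumeric on a string merely parses.
 TYPE_NUMBER, TYPE_STRING = 1, 2

 def assign_literal(env, name, token):
     if token.startswith('"'):
         env[name] = (token.strip('"'), TYPE_STRING)   # the "tag byte": string
     else:
         env[name] = (float(token), TYPE_NUMBER)       # the "tag byte": number

 def is_numeric(value):
     rep, tag = value
     if tag == TYPE_NUMBER:
         return True
     try:
         float(rep)        # parse-only check; the result is not stored anywhere
         return True
     except ValueError:
         return False

 env = {}
 assign_literal(env, "a", '"123"')
 print(is_numeric(env["a"]))   # True, yet env["a"] still carries its string tag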

Again, I'm not clear what this has to do with my point. What is it, in the paragraph I wrote to which you're (apparently) responding, that you take issue with?


I will agree that a model can be created using all the "parts" you talk about: variables, a variable's type, values, a value's type, etc., and use it to model the languages we've been talking about. However, it has unnecessary parts for many of the languages; the parts would either sit unused or used to keep redundant state info. -t

How so? The descriptions at the top of the page use all the "parts" -- values, variables, types, and the relationships between them, plus relevant items related to operator invocation. What parts "sit unused" or are "used to keep redundant state info"? Note, again, that the descriptions are based on simplified descriptions of how language implementations -- compilers and interpreters -- are actually built, sans extraneous detail related to optimisation and the like. If there are parts that "sit unused" or are "used to keep redundant state info", then the same applies to language implementations. As a language implementer, I believe that isn't the case. Could you explain and illustrate?

I've given plenty of examples and descriptions of unnecessary parts. You just play word games and point at something rather arbitrarily and call it a "type reference".

Your "unnecessary parts" wouldn't be those that would explain why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category, would they?

For "type reference", read "integer, float, boolean, double, date, etc."

I challenge you to model your variables as XML-like structures similar to those in TypeTagDifferenceDiscussion and show and explain step-by-step how they are read and changed by a hypothetical interpreter that follows your model during the observations.

I have amended the descriptions at the top of the page.

Okay, but you didn't "run" them through the observation scenarios.

See the "Actions" subsections. This is really simple stuff, so no need to be verbose. If you're unclear on anything, please ask.

No, man, you just fuzzed it up further by using words like "appropriate" and "compatible", and "The type of a value may be inferred or explicitly specified". If you believe that to be clear writing, then we are worlds apart in terms of English interpretation and what "good" technical writing is.

I've changed "appropriate" to "compatible", defined "compatible" under Category S, and provided an example of "inferred or explicitly specified". The description at the top of the page is intended for readers familiar with programming languages -- i.e., the typical WardsWiki participant. It's not intended for rank beginners.

I didn't get involved in defining/modelling how operators work in general in TypeTagDifferenceDiscussion. Rather, I focused on specific observations and asked the scientific question: "can we model such behavior/observation without a type tag" or the related: "Can we determine if a tag is being used?" You are complicating things by dragging polymorphism into it. But the existence of polymorphism still doesn't answer specific questions about type-like behavior since it's generally up to the operator builder to decide how to interpret/process the value and/or the tag; and I'm not assuming uniformity of treatment unless empirically demonstrated (such as a lang that has no detectable tags anywhere). In short, "compatible" is in the head of a given operator implementer.

The absence of operators in the "tag model" is a limitation, and your "scientific question" is trivially answered by examining language implementations.

Polymorphism, in particular its use with canonical operators like "=" and "+", is fundamental to understanding precisely the language behaviour your "tag model" appears intended to address. Note that it's only Category D2 languages where it's up to individual operators to "interpret/process the value". In Category S and D1 languages, dispatch is (for the most part) done by the language implementation; which operator gets invoked is dependent on value types. "Compatible" is only "in the head" of the operator implementer in Category D2. In Category S and D1, "compatible" is explicitly defined by the TypeSystem.

Re: "In Category...D1...which operator gets invoked is dependent on value types" -- Do you mean the tag? Some D1 languages will interpret or "convert" as needed for some operations. For example, "write('1' + '2');" can be handled different ways. A hypothetical D1 language could parse both operands to see if they are interpretable as numeric, and if so, go ahead and process "+" as addition instead of string concatenation. If one or both can't be parsed as numeric, then concatenation is selected.

I don't know what a "tag" is, so it's not what I mean. What you describe is a distinguishing characteristic between D1 and D2 languages. Only the latter "parse both operands to see if they are interpretable as numeric" as per the description at the top of the page.

What if a language did parse-only analysis for operation X but did tag analysis for operation Y? How would you classify it?

As I wrote at the top of the page, "individual languages may belong to more than one category depending on particular language features".

Perhaps it's better to make the classification on an operator-by-operator basis rather than per language. However, I generally classify langs as "tag-based" if ANY operation displays taggish behavior (at least per realm, such as the scalar realm). This is because the programmer can "sample" the tag in such langs even if it's not being used for any given operator. (Something like a typeName() function is usually the easiest way to sample).

What is "taggish behavior"? Classification on an operator-by-operator basis would make no difference.

PageAnchor Heisenberg01 :

It's also goofy model-wise to change the variable's structure after the fact based on what a particular operator does or examines (such as looking at the tag or only parsing). That's a Heisenberg-like model of vars. If I am interpreting you correctly, then a given operator is a "D1" operator if it examines the tag, but is a "D2" operator if it only parses. That would mean you use one data structure (your XML representation) for D1 operators and another for D2, implying the representation of a variable changes throughout program execution between a D1 variable data structure and D2 variable data structure: a part would only "exist" if examined. -t

No, the only observable state change -- once all definitions are present -- in all three language categories is variable assignment. Given a <variable ...><value ...>x</value></variable> as shown at the top of the page, the only thing that ever changes is the <value ...>x</value>. Hence, in a language with mixed D1 and D2 operators, the D1 operators would examine the value's type attribute whilst the D2 operators would not. The variable structure (and the value structure) remains constant.

Your descriptions at the top don't say that. They make it appear that the data structure of the variable changes per operator "type". Please review. You are loosey goosey with the scope and duration.

Yes, they do say that, just below the end of Language Category D2.

You mean the data structure changes per assignment? That's an odd way to do such models. What rule changes it to what?

No, I mean the <value>...</value> changes inside the <variable>...</variable>, as described at the top of the page and elsewhere on this page. The "rule" that changes it is called "assignment to a variable", in which the old <value>...</value> is discarded and replaced with a new <value>...</value>.
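To illustrate in the notation used at the top of the page (attribute names illustrative):

 Before the assignment 'a = "foo";' (a previously held 123):
   <variable name="a"><value type="number">123</value></variable>

 After the assignment 'a = "foo";':
   <variable name="a"><value type="string">foo</value></variable>

The variable structure itself is unchanged; only the old <value>...</value> was discarded and replaced.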

So the very existence of the "type='...'" attribute comes and goes depending on whether a D1 or D2 operator "processes" the variable? I'm considering a language which uses both tag-based and parse-based "typing" (which at least Php is). Your description seems to imply the structure changes depending on which "kind" of operator is involved, which would be really odd. As originally written as language-scope classifications, it made sense (or at least was consistent). But now that you agreed to re-interpret your descriptions as being operator-centric, the structures given don't make sense, unless they magically change per operator.

No, the "type='...'" attribute is constantly present in those languages that have one. The description at the top of the page is about language categories, and the structure of a variable does not change depending on which "kind" of operator is involved. Dispatch of some operators (like "+") may depend on the type of the operand value(s); other operators (like "isNumeric()") might not reference the operand type, though I imagine "isNumeric" would be more efficient if it was implemented as a polymorphic function that unconditionally returns "true" for values of numeric type, parses the value for values of string type, and returns "false" for values of any other type.

I agree that something like "isNumeric" can be more efficient machine-wise if it checks the type tag first, and then only parses if the tag is not "numeric" (for example, to see if it's a string that can be interpreted as a numeric). But it's simpler to model it by saying it always parses (or act like it always parses). Being that I'm looking for the simpler model as a priority over mirroring actual implementation, for this discussion I will generally assume parsing if that assumption accurately predicts behavior (I/O).

But your statement, "individual languages may belong to more than one category depending on particular language features", still puzzles me and seems to contradict "The description at the top of the page is about language categories". How a language works with your description and can be in multiple categories "at the same time" is still puzzling to me. How does one know which to apply to a given language? Php, for example, uses (or can be modeled as using) both parse-based typing and tag-based typing, depending on the operator being used for a given statement/operation. Thus, how do you classify it? (I classify mixed typing as "tag based" because it's a simpler model to say the tag is "always there" even if a given operator doesn't happen to use it because it's using parsing instead.)

{The example (given above) is C#. In C#, all values are associated with a type. This puts it into either the S or D1 categories. However, variables may or may not be associated with a type. In particular, variables declared as 'dynamic' do not have a type associated with them, while all other variables do. Hence, C# exhibits traits of both S and D1 languages. One knows which to apply by looking at the language definition. As far as I know (I'm not a PHP expert), PHP is entirely D1. The isNumeric function is defined as returning true if the value is associated with a numeric type or represents a numeric value. There also appears to be another whiff of hypocrisy: you've complained about "unnecessary parts" in our description (though you never seem to get around to pointing out what those parts are), yet here you are advocating that the "tag" is present even when it's unnecessary.}

I'd like to avoid analyzing C# here because it's at least partly static, and I'm limiting my model to dynamic languages for the time being. Php's 'is_numeric' function uses parsing in my test. For example, 'is_numeric("123");' returns True.

 a = "1.234567";
 b = "1.23";
 print(a); // This prints 1.23
 print(b); // This also prints 1.23
 if (a != b)
  print("Tag detected"); // this branch is taken.

.

As far as "hypocrisy", by "fewer parts" I also mean "fewer rules", rules are "parts" (and I have stated both in some places). If tag attribute pops in and out of existence in the model during run-time, then we need to give rules for the in-pop and out-pop, which is obviously goofy and more complicated. (Your model appears to have the same issue.)

In the descriptions at the top of the page, no "tag attribute pops in and out of existence in the model". All constructs described have a static structure. The only thing that ever changes is the <value>...</value> inside a <variable>...</variable>. One could imagine, however, an "isNumeric" operator that only looks at the contents of a <value type=...>...</value> construct, and ignores the "type=..." attribute. That doesn't mean the "type=..." attribute "pops in and out of existence in the model during run-time", because it doesn't. If I don't look at my chair, does that mean it pops "out of existence"? Likewise, if isNumeric() doesn't look at the value's type, that doesn't mean it pops "out of existence". (Though, in practice and as described above, isNumeric() is probably dependent on the value's type.)

In that case, our models appear to be growing fairly similar. By isNumeric() being "dependent" on the "value's type", do you mean implementation-wise, or results-wise? My tests show one cannot detect it using the "tag"; thus I'll model it as being parse-based if it simplifies the model(s) for Php. (I still believe your description and/or classification system needs to solidify the scope.)

I mean implementation-wise, which is why I wrote "in practice" and mentioned it parenthetically.

As for your perception that "our models" appear "to be growing fairly similar", I can only interpret that to either mean your model has changed or your understanding of actual language behaviour has changed, because our description of actual language behaviour has not changed. It's only grown explanatory text.

I made no changes in description of actual behavior. The experiments I did before still produce the same results. (Although, I didn't know Php's is_bool() was screwy, but this is specific to an operation.)

Then your understanding of actual language behaviour has changed?

And your writeup still has confusion in the per-language versus per-operation department described above. I haven't seen that fixed.

I'm not sure what confusion you mean. The descriptions at the top of the page refer to language categories. That's why they're headed, "Language Category S", etc.

I ask again, what category would Php fall under, given the clarifications above? (As a reminder, some operations/functions use parse-based "typing" and some don't (AKA "tag-based")).

Category D1. The operations/functions you refer to parse strings to see if the sequence of characters represents another type.

Why do you describe a fair amount about parsing under D2, but make no mention of it under D1?

I describe what predominates and characterises the languages' TypeSystems: Type references predominate in Category S and Category D1. Parsing predominates in Category D2.

I suggest you don't base your descriptions on the relative frequency of those two.

Suggest what you like. That's what they are.

I'm just trying to make it clearer. I'd suggest something like, "Here's how D1 languages handle parsed-based typing..." and "Here's how D2 languages handle parsed-based typing...". If they are the same, then factor it like a sub-routine.


PageAnchor 123123

Re: Your "unnecessary parts" wouldn't be those that would explain why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category, would they?

There are different ways to process "+", and they vary per language. One needs to experiment to see which combo of values, tags, and parsing best explains them on a per-language basis. I can think of several different rule sets (alternatives) for how to process "+" around values, tags, and operand sides. (Some languages look at the right side first and some at the left side first when making "typing" analysis.) The tag model is just the kind of clean and crisp model to test against, because it doesn't rely on hazy vocabulary. The combinatorial mess of alternatives may also be an object lesson on why both tags and "type" overloading suck.
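For instance, here is a Python sketch of three alternative rule sets for 123 + "123", matching the three behaviours quoted above (names are hypothetical; "Perl-like", "JS-like", and "Python-like" are loose characterisations, not claims about those implementations):

 # Sketch of three alternative '+' rule sets for 123 + "123".
 # Values are modelled as (representation, tag) pairs.
 def plus_coercing(a, b):        # Perl-like '+': coerce both operands to numbers
     return float(a[0]) + float(b[0])            # -> 246.0

 def plus_concatenating(a, b):   # JS-like '+': any string operand forces concatenation
     if a[1] == "string" or b[1] == "string":
         return str(a[0]) + str(b[0])            # -> "123123"
     return a[0] + b[0]

 def plus_strict(a, b):          # Python-like '+': mixed types are an error
     if a[1] != b[1]:
         raise TypeError("type mismatch")
     return a[0] + b[0]

 x, y = (123, "number"), ("123", "string")
 print(plus_coercing(x, y))       # 246.0
 print(plus_concatenating(x, y))  # 123123
 plus_strict(x, y)                # raises TypeError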

The various ways of handling "+" (for example) is entirely accounted for by the descriptions at the top of the page.

That may or may not be true. I cannot process your word salad in a "mechanically clean" enough way to really know.

Are you sure your vested interest in your "tag model" isn't influencing your appreciation of the descriptions at the top of the page?

It's honestly vague to me. You defined data structures, but don't describe clearly when and where these data structures and their parts are examined or changed. I tried to do that in TypeTagDifferenceDiscussion so it's clear WHAT part of the data structure is being examined, WHEN it's being examined, and what info is "kept" and what is not. If there is state change associated with the variable, I clearly show that. There is no other (undefined) "thing" on the outside that stores that state. I don't see that from you. To avoid the pitfalls of English, I believe we probably have to model things at almost a machine-code level and go step by step very carefully. We should be very clear about what is happening, about any state that's coming or going, and about what rule makes that state come and go, and make sure our data structures show that state and/or are cleared when the state disappears. Thus, everything's "on the table", and we know what changes, when it changes, and what rule changed it. None of this ghosty "associated with" stuff. If there is "associated with", then make that association clear in your XML. If the association disappears for whatever reason, make the rule AND TIMING for that disappearance clear, and show the XML after that change.

Maybe we can establish some modeling rules:

1. Any "association" is shown as an XML structure.

2. Any changes to any part of the XML structure, including attribute value changes, are clearly documented and shown as new XML (state).

3. It's made clear WHEN such changes happen, i.e., the statements that triggered them during execution.

4. It's made clear WHY such changes happen, i.e., a reference to a rule number or ID is given with the change.
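To illustrate (a hypothetical trace of my own; the rule ID "R12" and the "+" re-tagging rule are invented for the example):

 Statement: x = x + 1;  (WHEN: the statement just executed)
 Rule:      R12 (WHY: "+" on a numerically-parsable operand re-tags the result as number)
 Before:    <var name="x" tag="string" value="123"/>
 After:     <var name="x" tag="number" value="124"/>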

The only observable state change -- once all definitions are present -- in any of the language categories is due to variable assignment. Given a <variable ...><value ...>x</value></variable> as shown at the top of the page, the only thing that ever changes is the <value ...>x</value>, and its structure remains the same even though its contents and attributes do not.

That seems to contradict what you said above about languages being a combo of D1 and D2. And you've said that cfArgument creates a "type reference". Where is that reference explicitly modeled in XML? When, where, and by which rule?

And, remember, all "associated with" means is that given a statement like "<x> is associated with <y>", if we're given <x> we can answer questions about <y>. There's nothing "ghosty" (?) about it.

What is "E.I."?

I fixed it. Had the letters backward and wrong case.

E.I. phone home!


PageAnchor Izzy-02

I dissected one of the paragraphs above, showing the problem areas: -t

In all three language categories, operators may interpret their operands as they see fit, including recognizing values of various types that may be encoded within their operand values.

For example, in PHP, the "is_numeric()" operator may be used to test whether or not its operand is numeric,

which can include both operands of numeric type and numeric strings.

(E.g., 123 is of numeric type, "123" is a numeric string.)

In ColdFusionLanguage, <cfargument type= ...> can be used inside a function definition to reject invocation if the argument (which is always a character string) does not match the type named in the 'type' attribute.

Like a textbook, the descriptions are text augmented by diagrams, rather than diagrams augmented by text. XML is used only to illustrate values and variables. The internal workings of operators, which may do whatever they like (that's what "as they see fit" means) with values, are not shown diagrammatically. That's because what operators do with values isn't central, or even important, to the explanations. It's obvious that individual operators can do whatever they like with values, including recognising that the string "12122013" is numeric, or the string "12122013" is a phone number, or the string "12122013" is a date. That's what is meant by "encoding". I know beginning programmers have no difficulty with it.
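For instance, a PHP sketch of an operator recognising a value encoded in its operand:

 <?php
 var_dump(is_numeric(123));    // bool(true)  -- operand of numeric type
 var_dump(is_numeric("123"));  // bool(true)  -- numeric string
 var_dump(is_numeric("12a"));  // bool(false) -- no numeric value encoded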

Re: " because what operators do with values [and the parts of variables] isn't central, or even important, to the explanations." -- I cannot believe you made such a claim. It's largely what we argue about. And "encoding" can mean to or from binary, HTML "&" or "%" references, EBCIDIC, etc. Such may mean something specific to YOU when you state it, but the reader cannot read your mind and has to guess which specific kind of encoding is being talked about. This should be dirt-obvious to anybody with a college education.

It may be largely what you argue about and I respond to, but that doesn't mean it's central to popular imperative programming language TypeSystems. Indeed, operators recognising values encoded in values is barely a footnote to the core operation of TypeSystems. In a language like ColdFusion, the TypeSystem is nothing more than this: variables are untyped, all values are strings, and all operators do whatever they like with their string-typed operands.

And yes, "encoding" can mean "to or from binary, HTML "&" or "%" references, EBCIDIC [sic], etc." That's it. You've got it, and it's not limited to those, obviously.

Re: "...and all operators do whatever they like with their [...] operands" -- That's true of just about any dynamic language: some operators look at (parse) the value, other operators only look at the tag.

I don't know what "look at the tag" means, but "all operators do whatever they like with their [...] operands" is true not only of dynamically typed languages; it's true of any statically typed language too. Of course, static typing obviously limits the kinds of operands that can be passed to operators, so there are inherent limits on what values can be encoded in values of certain operand types. (E.g., not a lot can be encoded in a boolean "true" value.) What it means overall is that encoding values of type x in values of type y is not a distinguishing characteristic of imperative language TypeSystems, and so it's worth mentioning only as a footnote. For the purposes of distinguishing language categories, it is sufficient to observe that statically typed languages associate types with variables and values, whilst dynamically typed languages associate types only with values.

Sorry, I copied the wrong text and have since adjusted the quote. Explaining/modelling the encoding is important to modeling the behavior of interest properly.

The explanation is simply that strings (which are mainly what we're talking about here) can be used to encode values of any other type. E.g., the string "123" may be encoding an integer, a house number, a boolean value, an extension number, a quantity of litres, a sum of money, etc. The string "true" may be encoding a boolean value, an answer to a question, a name, etc. The string "110110" may be encoding a Morse code message, an ASCII character '6', the integer 54, a portion of a monochrome image, etc. What the string encodes depends entirely on how the function making use of the string is designed to decode it.
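For instance, in PHP, using the standard bindec() and chr() functions:

 <?php
 $s = "110110";
 echo bindec($s);       // 54     -- decoded as a binary numeral
 echo chr(bindec($s));  // 6      -- decoded as an ASCII character code
 echo (int)$s;          // 110110 -- decoded as a decimal numeral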

That may be true, but we still have to model/explain/predict the specifics of all that if our goal is to model/explain/predict the specifics.

Those are the specifics. Further explanation either involves in-depth detail about data representations, or reference to specific operators and functions in specific languages.

A stand-in function such as parsableAsNumber() is often sufficient. StepwiseRefinement can be used if issues/questions still remain.

How does "a stand-in function such as parsableAsNumber()" address your concern that "explaining/modelling the encoding is important to modeling the behavior of interest properly"?

That's good enough for me, but if you want to flesh out "parsableAsNumber()" for your model, experiments, or personal curiosity, that's fine. Our level of StepwiseRefinement depends on what we are interested in modelling/studying. Note that for modeling simplicity, I find it easier to keep all values as strings in the XML representation even if the language may internally compress such into binary etc. I agree such may result in rounding differences, but if rounding issues are not of concern to us, then we can skip binary modeling of floating point for simplicity's sake. Type issues are the concern, not floating-point rounding/truncation issues. It's intended as a mental model anyhow (a thumbnail-type model), and we may want to toss out some of the "weight of reality" to keep our mental boat light. If one does become interested in the intersection of types and the rounding of floating point, then the "binary" issue may need to be fleshed out in more depth, at the expense of a more complicated model. -t

That's fine, and internal representations are not what I'm talking about here. I'm referring specifically and solely to how a string may be interpreted by the operators that receive it as an operand, and not whether or not the string is compressed or represented in ASCII or Unicode or whatever. "Mental boat"???

 <var name="a" tag="number" value="123"/>
 ..................A^.............B^.....

Some ops will "look at" the tag (A above) and some will look at the value (B). We may want to make a pseudo-code reference such as getTag(varName) and getValue(varName) to "extract" parts of the XML model. If something appears to parse the value, then our pseudo-code may resemble:

  // snippet 4792
  if parsableAsNumber(getValue(thisVarName)) {
    result = parseAsNumber(getValue(thisVarName));
  } else {
    raiseError("Cannot parse % as a number", thisVarName);
  }
  etc()

That seems like an awkward way of saying a variable has a type, if that's what you intend. Wouldn't it be simpler to describe a variable as "variable = {value, type}"? However, in dynamically typed languages, variables don't have types. Only values do.

"Has a" is awkward. And "value" is only an abstraction, perhaps a UsefulLie, and there are different ways to model "values". If you can find objectively observable (to programmer) evidence that "values" exist in the way you claim they do, please present it.

A value is a UsefulLie? Huh? What do you call the result of 2 + 4, or the result of x + 7? Here's evidence that values exist and have types:

 writeln(4 / 3)
 writeln(4.0 / 3.0)
 1
 1.3333333333333
The types of the operands are used to determine which operator is invoked from two (or more) operator definitions: one for floating point operands, and one for integer operands.

I call what you showed "output". Assuming a language is not defined by its implementation, we have no direct way to observe the contents of variables; we can only observe the resulting bytes of I/O operators acting upon such variables/constants. Any speculation as to the nature of "values" is through indirect effects of such theoretical entities. (Note that different languages will give different answers to equivalent snippets. I believe most dynamic langs will give the same answer to both your snippets.)

Yes, writeln() is an operator that produces output. That's not the point. The point is that the choice of operator is determined by the values' type, and/or the format of the output is determined by the expression result value's type. There are no variables involved.

How does one empirically verify your claim? And those are not values, they are constants. (Or at least I prefer to call them "constants" [correction below]. If constants and "values" are the same thing, please provide evidence.)

How does one empirically verify that variables aren't involved? Look at the source code. I don't see any constants used either, by the usual definition of constant, i.e., a named value. PI and E and EPSILON would be constants in most popular imperative programming languages. Using familiar terminology, what you call constants are normally called literal values, or literals for short.

Here's another example:

 writeln(foo())
 writeln(bar())
 writeln(fizz())
 writeln(buzz())
 writeln(foo() / bar())
 writeln(fizz() / buzz())
 1
 1.333333333333
I see no variables, literals or constants; only function calls. What does a function return if not a value? How do you account for the last and second-last writeln() invocations producing different output?

"Literal" is name I'm looking for perhaps, not "constant". My mistake. The above can be modeled as "anonymous variables" in most dynamic languages, as I already described somewhere else that I forgot the location of. I didn't want to get into this discussion yet until we solved the "variable" modeling issue, but you seem to want to press it. I think this all distracts from the two-chamber model of the internal of variables: the type tag and the value "representation" (as you call it).

"Anonymous variables", as familiar terminology, refer to a concept quite distinct from a value. Fundamentally, a variable has or contains something that can be changed; what is that something? If you describe a variable as having a (anonymous) variable, you're caught in an unresolvable circular definition. It is resolved by simply modelling a variable as a container for a value, where a value has a representation and a type, a representation is a sequence of binary bits, and a type is a set of values and associated operators. This model is simple; requires no unfamiliar concepts, definitions or PrivateLanguage; has no unresolvable circular definitions; and fully and accurately can account for all dynamically-typed language behaviour. Statically-typed languages merely add a type to the variable -- it is modelled as containing value and having a type.

 // Version A
 writeln(fizz() / buzz()); 
 // Version B, equivalent in most dynamic langs
 var x, y;
 x = fizz();
 y = buzz();
 writeln(x / y);
What is being assigned to x, and to y?

Note that the difference between this:

<var name="foo">
<value type_indicator="number" representation="123.45"/>
</var>
And this:

<var name="foo" type_indicator="number" representation="123.45"/>
is not much, naming choices aside. It's mostly a matter of preference, for they both "work" in simple models. The second just has fewer parts and that's why I like it. I agree the first is sometimes more extendable to more complex modeling, such as arrays etc., but it may be overkill for our purposes, and each language will dictate the specifics of the more complex language aspects anyhow. For example, many arrays don't allow each "value" element to have its own type indicator; rather it's "shared", and thus your "value" tag is not re-use-able as-is for such. In other words, the first XML snippet is PrematureComplexity.

Your model doesn't even work for trivial illustrations like writeln(foo() / bar()) when there are no variables involved. If you're going to get around that by claiming values are variables without names, you're going to force the beginning programmer -- to whom these explanations are presumably targeted -- to embrace a rather abstract notion of variables without names whilst strangely avoiding mention of values, which anyone with a modicum of familiarity with programming, mathematics, or even owning a pocket calculator, will expect to see. Where are these "variables without names" declared? If anything, for beginning programmers you'd be better off dropping variables from the model and dealing only with values.

And you still have not shown that "value" is something objectively measurable. (It's a UsefulLie in my model, but I don't pretend any of the parts are "real". They are primarily intended to predict I/O, not mirror anything "real" about interpreters, if there even is such a thing.)

It's measurable the same way your "anonymous variables" are measured.

They are not. They are merely a UsefulLie in the model. They could be the epicycles of programming; it doesn't matter, other than being a part of the model. "Variables" and "values" are in the head. There are potentially different models with different names for the parts that all predict I/O properly. I don't pretend there is a God of Programming that standardizes all that. Your head is not the center of the universe.

There is no central body that standardises terminology, like L'Académie française for programming, but there are certainly established conventions. If you're going to defy convention, that's fine, but in order for new terminology to be recognised you have to make an extraordinary effort to clearly define it and explain how it relates to existing terminology. Using your own terminology as if it were established will result in a PrivateLanguage, and therefore it will be ignored or misunderstood.

And "value" is a vague/overloaded word also. "Literal" and "value" are often used interchangeably in the office, for example.

"Value" isn't vague or overloaded and it has a precise meaning -- it's the combination of a representation and an (explicit or implicit) type. Do not conflate casual (mis)use of language around the water cooler with some general vagueness. "Literal" and "value" are used interchangeably because a literal is a character representation of a value.

Also note we have things (tokens) in the language code we call "variables" and things we call "literals". But there is nothing (no tokens) in most such languages called "values" (although language designers can call them whatever the hell they want.)

Whilst we don't have tokens for values, they are essential to the operation of languages. What do we store in a variable? What does a named constant make reference to? What does a function return? What does an expression evaluate to? What does a literal represent? Despite not having a token for a value -- though a literal is a character representation of a value -- we are inevitably forced to consider values in language semantics.

Re: "Your model doesn't even work for trivial illustrations like writeln(foo() / bar()) when there are no variables involved." -- How do you recon that exactly?

There are no variables in writeln(foo() / bar()), but your model is defined in terms of variables. How does your model account for writeln(foo() / bar())? Again, if your goal is a simple model, wouldn't the simplest model of dynamically-typed languages be to associate a type (or a "tag", if you must) with each value -- which can account for all behaviour -- and simply (perhaps even glancingly) note that a variable is a named container for a value? That way, you wouldn't have to rely on invented constructs like "anonymous variables" to explain language behaviour.

If one "executes" such on paper, there will likely be a reference to the "result" from each of the functions for convenience. That "thing" can be called different names in different models. My model doesn't lack such a thing, it merely names and packages it differently from yours. Thus, your statement that it "doesn't work" if flat out wrong.

I can't see where your model names it and packages it at all. Where have you provided a new name for that which is stored in variables and constants, returned from functions, specified by literals, and obtained from evaluating expressions?

I'm not recommending one build an entire interpreter. The issues at hand are about types, not parsing function calls. For testing, it's best to put the "values" (for lack of a better name) in named variables so we can examine them later using multiple techniques. "writeln(foo() / bar())" is a poor form for doing that. (Granted, it's possible to make a language where "embedded" expressions return a different result than those moved to variables first, but those are few and far between in dynamic-land, such that it may not make sense to complicate all languages' models "just in case".)

I don't see how that answers my question. If you put the result of function calls in named variables, you've still got to deal with what that "result" is and what it is that you're putting in a variable. That sounds like adding complexity to your model, rather than reducing it.

I guess I am not understanding what you are asking for.

You appeared to claim that the "result" from each of the functions can be "called different names in different models" and your "model doesn't lack such a thing, it merely names and packages it differently". If so, where is it?

It still strikes me that your model can be as simple as you like and cover all the cases if you simply associate types (or "tags", if you must) with values instead of variables, with essentially no other significant changes. Why are you reluctant to do so?

You haven't shown where it's a problem for the stated purposes of the model. Thus, I skip the nested structure.

It appears to be a problem for explaining things like "writeln(foo() / bar())", which is precisely the sort of example that beginners ask about.

Not any more than your approach. If your "value" needs an identity for a model, then you have to add an identity attribute or wrapper tag. If you don't need one, then it's not really different than an anonymous variable (a variable tag withOUT a name attribute). Your outer tag only provides one name. We can stick that same name in your "value" tag as an attribute and have the same info, as sketched below. Your outer tag is unnecessary. Factor out extraneous levels to keep things simple.
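For instance (my sketch of the flattening; the attribute names are invented):

 <value name="foo" type="number" repr="123"/>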

Sorry, what are all these tags you're referring to? Are you referring to XML? The only time an "extra level" is added is when it's necessary to represent a variable -- which is done from the beginning, because the top of this page is specifically about illustrating the relationship between types, values, and variables. If you have a language without variables (entirely reasonable, by the way, for simple illustrative "calculator" languages) there's no need to mention variables, but all else -- including user-defined operators -- can be illustrated with just values that have types.

You have not shown how it's "necessary". It may be necessary in YOUR model, but not in all possible models that forecast properly.

How do you explain "writeln(foo() / bar())" without values?

Again, anonymous variables can function exactly like your "value" tag.

If they're exactly like a value, then "anonymous variables" are being used in a PrivateLanguage sense -- just call them "values". I'm not clear why you apparently wish to avoid associating types (or "tags", if you must) with values. It would make everything much simpler (except variables, which you could otherwise almost ignore, and would merely be <variable name="blah"><value .../></variable> anyway) and you wouldn't need any odd terminology.

What exactly would it "make simpler"?

You wouldn't need to use PrivateLanguage which (presumably) you'd have to explain, and you wouldn't need to introduce variables (or whatever) to explain constructs like "writeln(foo() / bar())".

Your language is a PrivateLanguage, only known to a certain group of interpreter builders who use certain reference implementations. The average developer isn't going to give a flying shit about that small group. And variables are already "introduced", I'm just re-using them.

What part of my language is a PrivateLanguage? As opposed to the language found in every language reference manual?

Your own words, with my highlights: "It's fundamental, though it's frequently skipped in language references -- but invariably covered in texts about language implementation"

Ah, you mean "representation"! As I mentioned before, I could have left it out without harm, but I think it adds much to the usefulness of the descriptions in terms of answering inevitable questions, and I explained it. It's certainly not an unusual or "alternative" use of the term "representation", and it's consistent with every use of "value representation" that I've seen in similar contexts.

It's still a PrivateLanguage, although I'll grant that "private" is perhaps a matter of degree. But I'm considering the environment of a typical developer, and to them it is, or is equivalent to, a PrivateLanguage.

Shall I take out the reference to "representation"? It changes little, but it does raise a question: In illustrations like "Value is [ Representation | Type ]" -- which is how values are represented in many dynamically-typed languages and all statically-typed languages -- what shall I put in place of Representation?

That's up to you; I don't have a good suggestion so far when using the two-layer approach you prefer. (One of the side-effects of having too many parts is that you have to name the extra parts.) I don't need the equivalent of "value is..." in my model so I can use it instead for the so called "representation".

But that is precisely the flaw in your model that makes it awkward to explain common constructs like "writeln(foo() / bar())".

No, it does not, for reasons already given.

Don't you have to introduce variables to explain it? Or claim that values and variables are the same thing, i.e., that values are "anonymous variables"?

I don't know what you mean by "two-layer approach". Layer???

Two XML tags, one nested in the other.

You mean in variables? Why is that a problem?

Extra layers.

Why is an extra "layer" in one thing a problem?

I prefer the single XML tag because:

1. It's less text, objectively.

2. Subjectively it groks better in my WetWare, and without a real survey, I'll take my subjective preference over yours.

3. The two-layer approach makes one wonder if this is possible:

 <var name="foo">
<value type="number" repr="123">
<value type="date" repr="12/12/2003">
 </var>

Excellent! You've demonstrated an array.

By not having the two-layer approach, I avoid that potential distraction/question from the reader. You'd have to explicitly state such, but I don't, because I'm surfing on existing XML rules.

Should the quirks of XML dictate how a model works?

Every data structure has quirks. We work around them when they are an issue. So far, I don't see any here.

How about representing a variable's value as an attribute, instead?

Please explain.

Instead of representing a variable as <var name="foo"><value .../></var>, how about representing it as <var name="foo" value="<value type='int' representation='3423'/>"/>? No "nesting".

That's Lisp, not XML. And we don't need either.

You mean an S-expression? No, it's still (essentially) XML. However, wouldn't it be much easier to read if you simply used a tuple notation like Variable = (Value) and Value = (Representation, Type)?

Why have two levels? We don't need two levels.

You mean a distinction between variables and values? I think we do need it, in order to explain how variables relate to the result of expression evaluation, what functions return, and what literals represent. It allows us to easily explain what happens in a statement like "a = b + 3;" -- e.g., "the value represented by the literal 3 is added to the value in variable 'b', and the resulting value is stored in variable 'a'" -- without requiring awkward concepts like "anonymous variables", or artifice like defining values as variables with the "name" attribute left blank. It also most closely resembles a common beginner's analogy that likens a variable to a bucket that can hold a value.

Concept? It's no different than your "value" when there's no name, so it's the same concept, only a different name. Maybe I should use the word "object" instead of variable, but the way I suggest the tests be run, the model user will be dealing mostly with variables anyhow. It's optimized based on usage patterns.

Why would you use the name "object"? How is your approach "optimized based on usage patterns"? In short, I'm not following your response at all.

Most of my testing suggestions involve using variables because that makes them easier to examine through multiple transformations and output techniques. Embedded expressions are not easy to examine in small test snippets. One can say, "let's examine variable foo after op1, op2, AND op3 (etc.) transform it." One can "do more things" to a given variable than to an embedded expression, unless you rely on copy and paste, which is ugly and error-prone in such tests. Does that make sense? Thus, embedded expression issues are only a side-note in my model because they are rarely used in actual tests. Plus, it's easier to reference things by name in descriptions and samples, and variables have a name already, so I piggy-back on that fact.

To me, it doesn't make sense in terms of presenting a model -- designed to help beginning or weak programmers -- of certain aspects of language behaviour. However, although I teach beginning and weak programmers on a daily basis, I accept that I am neither a beginner nor a weak programmer (I hope!) so my views may inherently be skewed. Have you yet had a chance to expose your model to your colleagues, or others, to gauge their reactions to it?

Briefly. I drew a big box with the "representation" (value) in it and a small box (tag) with the type indicator in it, and it seemed to click for that particular issue. It wasn't a detailed interview, however.

I target "typical" developers of the kind I typically encounter. Whether they are "stupid" or "bad" or "dumb" I am not going to put a value judgement on it. Humanity is what humanity is. Fight it all you want, but unless you resurrect the Nazis' and their breading programs, we are "stuck" with that. I provide "tools for the masses" as-is. (GodwinsLaw trigged yet again?)

Nazi breading programs? 6 million buns baked in Nazi ovens?

Tools for the masses are fine, but I think if this thread is to become worth continuing, you need to show evidence that your tools are working. A single case, which could be argued is a description of the conventional model ({representation, type indicator} sounds like a conventional description of a value!), is insufficient to tell.

You haven't shown evidence that the existing explanation techniques are clear and useful to the majority of programmers.

I don't have to. The sheer volume of code written on a daily basis in companies across the globe, the number of working mobile apps released, games of all kinds, OpenSource code on GitHub and SourceForge; all of these point to a vast community of programming success.

I would like to quote from somebody else in TypesAreTypes:

Honest, I didn't write that. If you personally want to see an OfficialCertifiedDoubleBlindPeerReviewedPublishedStudy before considering alternative explanations/models, that's fine, but don't make it a PersonalChoiceElevatedToMoralImperative. Vote with your feet and leave if you want, but don't bury suggestions for everybody else.

Your suggestion, in the form of your tag model, is rife with flaws. This page and others are testimony to that. Go back and work on it, and keep working on it until everyone agrees that it's good. Don't be so arrogant as to believe your first, glancing stab at an idea is perfect.

You have not shown any real "flaws" in terms of the stated goals. Tradeoffs are not necessarily "flaws", or at least it's misleading to call each node of a tradeoff a "flaw". "Everyone"? Everyone doesn't have to agree, only those looking for something that fits the way they think. Different models work better for different WetWare. Perhaps you are making the false assumption that OneSizeFitsAll or should fit all.

Your conflation of variables and values is a flaw. I see no tradeoff there.

"Values" are not objective things. They are an invention of your model. We've been over this already. (Variables and literals are usually objectively defined per language syntax, but there is no equivalent for "value".)

Values are most definitely objective things, even though they don't have language tokens associated with them (other than literals, which are character representations of values), but they are what functions return, what expressions evaluate to, and what literals represent. If you use a debugger, you can see values being pushed/popped onto/from stacks, stored in and retrieved from registers, and copied to and from memory.

You are talking about specific implementations. I do NOT define languages by implementation, as we've been over many times. Anyhow, prove objectively that "what functions return" are NOT "anonymous variables". Prove those things you see in your implementation-specific debugger are "values" and not "anonymous variables" (or hidden variables or something similar). As far as I'm concerned, the debugger is showing me some kind of "output". Whether that output is composed of values, thubnikks, or gronklemeisters is a secondary issue, and probably relative. Incidentally, we don't need stacks to implement many parts of languages. That's an implementation choice only, and the language could use a different approach. In fact, I once wrote an experimental interpreter that processed expressions similar to the way one usually does it by hand in math class.

Stacks are irrelevant; I only mentioned them as an example of the kind of activity with values that you can trivially observe. It appears that you're merely using the term "anonymous variable" in place of "value", which is peculiar terminology and disconnected from the term "value", which is one of the first terms that a beginning programmer encounters, along with expressions, literals, and operators.

It's vague and/or overloaded. It's only a naming issue, such that I'd change it in the model if I found a better alternative. "Representation" is also "peculiar", and long. It may be worth it to live with peculiar model vocabulary rather than create a nested structure. The structure is more important than the words used, in my opinion, but that may be because I'm a visual thinker and not a linguistic thinker. A linguistic thinker may be bothered more by such. I'm weighing the trade-off, and vote for the simpler-structure path. If you vote differently, so be it. I make assumptions about the WetWare profile of the target audience, and so do you. Neither of us has formal surveys/studies to back our assumptions, only personal experience and anecdotes. (Nor do I necessarily target every developer. Different models may be a better fit for different people.)

I suspect your use of PrivateLanguage will significantly limit the uptake of your model. I bet comments like "what's he talking about?" will be commonplace.

You mean like I do with the existing shit?

Probably.

I'm de-emphasizing English anyhow, because the model would still work if we called the parts zots, flozzos, and muukis. It may even help, by reducing the risk of accidental mental overloading with fuzzy or overlapping existing terms. For example, when one uses the word "value", some programmers are thinking of literals, such that "result of the expression" may be more fitting for them.

It will be interesting to observe how well that works.


Regarding the one-level versus two-level XML models:

Let me see if I can put this in terms of proportion: About 95% of the tests will or should be based on variables for the reasons given. Your two-level approach may (arguably) be a better fit for the 5% or so of the tests. We should optimize our "structure" design for the most common use cases as long as it still works satisfactorily for the less common cases, which is the case with the single-level structure. I don't see how it is "economically" (in a mental sense) worth it to optimize the structure for the 5% at the expense of the 95% use case. If it made the 5% case VERY difficult or complicated, then I can see a potential justification for such. But that's not the case here.

As an analogy, if 95% of your driving is on-road and 5% off-road, then it would probably make sense to buy an on-road vehicle because they are generally cheaper, ride smoother, and more fuel-efficient than an off-road vehicle. The down-side is that you have to drive slower on the 5% off-road. However, if you fairly often got stuck in that 5% off-road terrain (not just have to drive slower), then it would probably be worth it to buy an off-road vehicle even if it's a bit wasteful (overkill) for the on-road trips.

Why do you think 95% of the tests will or should be based on variables? Furthermore, interaction with variables is simple. Interaction with function invocations is where the subtleties lie, and those clearly need explanation of values.

I explained that earlier. In short, one should make it easy to do multiple tests on the same "object" (for lack of an agreed-upon term). And I don't know what you mean by the second sentence. Perhaps an illustration is in order. Typically tests will resemble:

  x = [some expression]
  write(x);
  write(f1(x));
  write(f2(x));
  write(f3(x));
  write(f4(x, x));
  write(f5(x, y));
  write(f6(y, x));
  etc();
If you "embed" the "value" directly, one cannot do such without repeating the "value", which is error-prone, and a violation of OnceAndOnlyOnce.

You mean "repeating the 'literal'"? How do you account for what's returned by f1() .. f6() to be printed by write(), and what is it that [some expression] assigns to x?

It could be an expression and so I didn't want to use "literal". In my model, "some expression" creates variable "x" if it doesn't already exist (depending on lang), and then populates the attributes with the appropriate type tag (fills the tag attribute) and updates the "value" attribute of that variable (the XML tag for that variable). I don't have a shorter name for that process at this time: it is what is in the model. And I'm not sure what you mean by "account for".

"Account for" is a synonym for "explain". So what does [some expression] produce that is used to update the "value" attribute of 'x'?

I see no need to model "produce" so far. I won't model stuff not necessary for prediction to keep life simpler. I don't care what the epicycles are made of as long as they produce proper planet positions as output from the model.

All we need to know is that assignments can result in one or more of the following state changes:

 1. Creates a new variable (structure) if it doesn't already exist (depending on lang)
 2. Can update the "tag" attribute.
 3. Can update the "value" attribute.

Which of these three actually happen depends on the language.
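For example, a hypothetical PHP-flavoured trace (my illustration; the attribute names follow the model above):

 $x = "123";     // 1, 2 and 3: <var name="x" tag="string" value="123"/>
 $x = "abc";     // only 3 (tag already "string"): <var name="x" tag="string" value="abc"/>
 $x = $x . "!";  // only 3: <var name="x" tag="string" value="abc!"/>
 $x = 5;         // 2 and 3 (re-tagged): <var name="x" tag="number" value="5"/>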

It will be interesting to see if there's any adoption of your model. I suspect your use of PrivateLanguage will seriously limit it, as your audience will have difficulty understanding how your terminology relates to familiar terminology.

That's your assessment, not mine. I would note you are also using a private language, because most developers are not interpreter writers by trade and don't care about the lingo actual interpreter writers use.


See DefinitionOfTypeTag, TypeTagDifferenceDiscussion, TypeSystemCategoriesInImperativeLanguagesTwo, ThirtyFourThirtyFour, SignaturesAndSoftPolymorphism


AugustThirteen

