This essentially resembles the "tag" model (with some unnecessary parts added), except the word "type" is used. I avoid the word "type" to avoid the kind of word confusion found in Example CF002. We've been over that already. -t
Don't you mean the "tag model" resembles programming languages? There is no "model", above -- I've merely explained certain programming language semantics. Note that "type" is undefined in the above, but used in precisely the ways that any programmer must know in order to use a programming language. Hence, "type" is familiar and recognised whilst "tag" is not. Thus "tag" is unnecessary.
Well, I'm TRYING to turn it into a model to avoid getting tied up in psychology, or at least reduce dependence on psychology. "Semantics" is in the head. Again, I want to predict language output, not heads. My head is far better at processing images than language, and I'm sure there are others like me. Visual thinkers have been among the most productive people in history, so it doesn't make sense to discriminate against them.
And you still haven't solved the linguistic confusion introduced in Example CF002. In your...notation?...the "type" stays "string" even though cfArgument makes it look like it's being filtered as or coerced into type "number".
You're conflating "programming language semantics" with "semantics" in general. There is nothing above that relies on psychology. The explicit "type" facilities in a programming language are inherently unambiguous, and completely independent of whatever philosophical debates there may be over what the word "type" means in English.
Le Arrrrg. How is it NOT about psychology? Semantics is in the head! The computer doesn't give a fuck what humans think, it just follows orders.
Programming language semantics are about what the language does. Semantics can be understood "in the head", but they're ultimately about the machine.
Roughly about. If you cleared it up sufficiently, it WOULD be a model.
{The word 'model' in programming also has a meaning. E.g. ActorsModel, cellular automaton, discrete event simulations. It has a higher level meaning than 'semantics' (but is much more precise than 'paradigm'). Semantics has a relatively low-level meaning in programming languages.}
Sorry, you've lost me here. My goal is not to create a model, but simply to explain a particular aspect of popular programming languages in terms of language elements.
Well I find the language ambiguous or unnecessarily complex and therefore want a model instead.
I think what you want is an analogy. I use a number of analogies when I teach about types, values and variables to beginning programming students. I have not mentioned them here, because this is an explanation of actual behaviour, rather than analogies that might help in understanding it. For example, I sometimes illustrate values using toy blocks of various shapes and colours, and a variable as a small bucket that can only hold one block at a time. Types are represented by the shapes of the blocks, and TypeChecking by a cardboard cut-out template that sits over the top of the bucket and only allows through blocks that match its shape. Numerous other analogies are possible; an occasional beer-time post-work amusement amongst me and my colleagues is to come up with the goofiest (or most vulgar) system of analogies.
Again, as a general rough "notion" such is fine. But it doesn't sufficiently explain the more subtle aspects of dynamic languages. It's probably possible to refine the block model, but it will either end up mirroring the tag model or becoming unnecessarily complicated. I'd bet money on it.
The "block model" isn't a model. It's a rough analogy to aid in understanding reality. Hence the non-analogy description here, which is sufficient to explain the more subtle aspects.
Bull.
Where is it lacking?
I don't know what "linguistic confusion" you're referring to. The above deals with the subject of Example CF002 without any confusion or contradiction.
You claimed CF002 "associated a type" with the variable p. That contradicts your model-like thingy above, where you said of such langs "Every value has the same type, typically a string of characters." This rule would forbid "associating a type" of "number".
Variable p is determined to be of (or not) the type that matches the type name given in the 'type=' attribute. The type reference isn't retained; it's only used to determine whether to throw an error or not.
Then your description above for category D2 is flawed. You should perhaps say, "Every value has the same type, typically a string of characters, except at cfArgument where variables can temporarily be associated with another type besides "string", but only in ways the programmer cannot see and cannot use."
No, it's correct. I build interpreters and compilers; I know how it works.
The interpreter actually creates a temporary "type" marker in RAM?
I don't know what a "temporary 'type' marker in RAM" is, but it certainly has a reference to a type because it has to invoke an operator defined by the type named by the 'type=' attribute to determine that the sequence of characters in the argument represents a literal in the set of values defined by that type.
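To illustrate, here's a minimal sketch of that behaviour in Python -- the names (PARSERS, check_argument) are hypothetical stand-ins, not ColdFusion internals:

  # Hypothetical sketch of <cfargument type=...> style checking: the type name
  # selects a parse test, and no type is attached to the value afterwards.

  PARSERS = {
      "numeric": lambda s: s.lstrip("+-").replace(".", "", 1).isdigit(),
      "boolean": lambda s: s.lower() in ("true", "false", "yes", "no"),
  }

  def check_argument(value, type_name):
      # Throw if the string value doesn't parse as the named type.
      if not PARSERS[type_name](value):
          raise TypeError("argument is not of type " + type_name)
      return value   # returned unchanged -- no type reference is retained

The parse tests are deliberately rough; the point is only that the type reference is used transiently and then discarded.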
Like I said before, one could call some of the parts of parsing "types" (for reasons unknown to me), but it's a different mechanism than what is usually used for "types" in other languages. The model, and explanation, is simplified if we simply say that category D1 lang variables have absolutely no tags ever ever ever.
There are no tags in the above. D1 variables do not have types.
You also don't clearly answer what aspects of parsing and all potential validation are "types" and which are not.
There's no need to do so, other than what's sufficiently covered in the D2 explanation.
And you still haven't put this parse exception into your D2 explanation.
I thought it was well covered. See the bit about <cfargument ...>
I don't see a solution. Above you said D2 has only strings, but you now admit there's an exception for cfArgument (numbers CAN "exist", at least temporarily). Why are you not building that exception into your description of D2?
See where it says, "... operators perform parsing as needed to determine the value's type, i.e., whether it is a string representing an integer, number, date, etc." That's what it does, at least from the user-programmer's point of view. (I didn't mention the part about operators internally converting the string representation to values of other types in order to perform operations on numbers, dates, etc., because this is generally invisible to the user-programmer.)
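For instance, a Category D2 "+" might look like this sketch (Python standing in for the interpreter; the helper names are mine, not from any real implementation):

  # Sketch of a D2-style '+': every value is a string, and the operator
  # parses its operands on the fly to decide how to treat them.

  def looks_numeric(s):
      try:
          float(s)
          return True
      except ValueError:
          return False

  def d2_add(a, b):
      if looks_numeric(a) and looks_numeric(b):
          return str(float(a) + float(b))   # treat as numbers; result stored as a string again
      return a + b                          # otherwise, concatenate

  print(d2_add("123", "123"))   # "246.0" -- both parse as numbers
  print(d2_add("12x", "123"))   # "12x123" -- falls back to concatenation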
So now you have multiple simultaneous types in your model-like thing: a "most explicit type" that is different from a "base[?] type". It's more complicated and confusing than the tag model. Occam will slap you.
Not at all. The language checks to see if the sequence of characters in a string represents a literal in the set of values defined by the named type. That's precisely what <cfargument ...> does.
And once again, there's no model here. I'm explaining what real languages do.
I know that, but your way of explaining it is round-about, and overloads "types".
How is "types" overloaded, and how is it "round-about" given that it's precisely what happens? How would you describe what happens?
I told you: it parses, and we don't have to call anything "types" (so that there's no confusion between explicit types in other langs, validation, etc.)
Yes, but it parses to check that the argument is a... what? Why are we doing this parsing? Why do we use <cfargument ...>? What is its purpose?
Either way, I am looking for a model that explains/predicts the differences in dynamic languages and I want that model to be as simple and/or as least as confusing and least as ambiguous as possible. If YOU don't want such, tough titties. Ignore it until you have a competing model.
I don't need a model. I'm explaining what actually happens, and that turns out to be sufficient to categorise languages and explain their behaviour without ambiguity and without imaginary constructs like "tags".
No, you are playing word games; calling something in the interpreter a "type" arbitrarily based on some vague head notion.
I'm not calling anything a "type" that the language designer, language implementer, and language user wouldn't also call a "type".
As already explained, historical habit sometimes interferes with clarity.
Be that as it may, in introducing "tags" and avoiding types you're attempting to defy common understanding of programming languages and ignore facilities which are embodied in their syntax. How will you explain imaginary "tags" in languages that clearly have explicit syntactic constructs for defining and referencing types?
I realize there is a trade-off, but by replacing a fuzzy notion with 2 clearer and simpler notions, a better model is produced. Or at least a model that may appeal to certain WetWare more than another, given that some models fit some minds better than others.
Analogies may be helpful in gaining an initial grasp. They should not be a permanent substitute for a genuine understanding of reality.
"Types" are not reality; they are only in the head.
Type definitions and references are not only in the head; they are explicit parts of programming languages that programmers cannot avoid. Programmers must understand them in order to be effective programmers.
That depends on the definition of "type"; but that's a calendar-killing topic, we've learned. The existing attempts at definitions are not quite good enough to tune one's understanding, making many programmers rely too much on trial-and-error. I want a better model.
What it appears you want is a better understanding. You believe a better model will help achieve it. It will only help achieve it if there is a clear and explicit relationship between every element of your model and every element of the reality it models. As it stands, there is apparently no clear and explicit relationship between the "tag" element of your model and the elements of the reality it models, i.e., the collection of statements in various languages and the types, values and variables that they manipulate. I say "apparently", because although I claim that "tag" is a synonym for "type" or "type reference" and therefore redundant, you claim there is a difference.
Again, there is no reality to types. Languages and programming are abstractions in the head. The closest thing to "reality" would be the machine language, and as already stated that doesn't fit your explanation either unless one is loose with language.
{Similarly, there is no reality to cars. Vehicles and automobiles are abstractions in the head. The closest thing to "reality" would be a bunch of particles and forces; it's our soft brain WetWare that vaguely categorizes these as cars... (or perhaps we should step back and recognize that abstractions can be 'real' if we can observe them?)}
That's true, cars are a UsefulLie. As long as the parties involved agree to what's part of the car and what's not, the UsefulLie remains useful. But there are boundary cases, such as an insurance lawsuit over whether rust is "part of" the car or not. Good UsefulLies generally result in fewer disputes. But cars are at least physical things that we can touch and point to and put frames around. "Types" not so much, unless we use a physical abstraction to represent them. "Tags" is one such approach: Variables in certain languages have a "tag" attached to them that can be represented in a physical way similar to a mattress tag, while in other languages we can model successfully withOUT the "tag": a variable has only a "value" inside its "box", no tags inside nor attached. This can successfully be used to explain differences in language behavior and available operations. Perfect? No. Better than the alternatives? So far, yes. --top
Philosophical ruminations aside, "types" in programming languages are as physical and real as cars. What is the "long" in "long x;" in C, C++, C# or Java? Why does 123.0 == 123 return true in one language, false in another, and throw an error in a third? Inevitably, a correct answer must either mention "type" or be forced to use some PrivateLanguage.
It's fine to talk about types, but there are different sub-models to "typish" behavior. The tag approach doesn't forbid talking about "types", it just doesn't make it the primary focus, or force-fit it into every aspect of the prediction engine like the insane ex-girlfriend who won't go away. [dating story removed. -t]
What does your tawdry, misogynistic, sexist anecdote have to do with anything? Do you really think it adds something to this discussion, or is it an attempt to divert attention from your unsupportable assertion that there is "no reality to types"?
I was trying to highlight a concept to help it stick in your memory. Unless we agree on tests for testing whether something is "real" versus "in the head", this discussion probably won't go anywhere. I would note I can throw rocks at cars but not at types. I've never seen a photo of a damaged integer.
So you thought a sick little story would help it stick in my memory??? Baffling. As for "never seen a photo of a damaged integer", do you mean you've never fired up a debugger to see how data is represented and how variables change as a program runs? Have you never implemented a simple interpreter, or looked at a simple interpreter or compiler implementation to see how values, variables and types work? I can provide examples, if you like. I have simple compiler and interpreter examples, written in Java and implementing simple Javascript-like languages, that you can examine. By the way, there are no tags; TypeSystem behaviour is based on variables, values, types, and the relationships between them.
If you can examine a byte or segment that explicitly indicates "integer" etc. in a debugger THEN you are looking at a "tag". However, no tag-free language will show such in a debugger, yet you claim they still have "types". Thus, if you play the reality card, then calling the parse-centric type-like activities equivalent to that byte/tag/type-indicator/wtf-flag one sees in the debugger (your "reality") is inconsistent. You flip-flop between explicit byte tags and notion-based in-the-head typing, and it frustrates me to no end that you don't see this inconsistency.
Wherever "type" is used at the top of this page -- see the diagrams -- in a debugger there will certainly be a byte, segment, pointer, or some other reference to the code or data structure that represents a type. You might have to go looking for it, though. In RAM, it won't necessarily be immediately adjacent to the value representation, though if you're lucky it might be. It's just as likely to be the topmost item of one stack whilst the value is topmost on another stack, but only at a certain point during run-time or compile-time. It might an attribute of a node in a tree. Or it might be something else entirely. In certain Category S languages, it might only exist at compile time. In other Category S languages, it might exist at both compile time and run-time. In short, there is no universal way of representing type references internally; the best you can say is that a variable and/or value is associated with, or has, a type. And it's definitely and always a type, it's never something else that isn't a reference to a type, so there's no reason to call it a "tag". It's a type. It's always a type. There is no "flip-flop" between explicit "byte tags" (huh?) and "notion-based" typing. What is given at the top of this page is simply a description of what happens in language implementations, using familiar terminology in a conventional manner.
Your "associated with" is again vague. Many parts can affect the output, so in theory a good portion of the run-time RAM image could be called "associated with". The ButterflyEffect: the butterfly affects everything so is everything a type? I am the Walrus, You are the Walrus, Coo Coo Ka Joo, Suck my Yoko! In tag-based languages, there should be a fairly clear-cut byte associated with variables and variable-like things. In non-tag languages, there will not. But you call intermediate parsing steps or temporary hidden work-values "type", such as with the CFargument example, in a seemingly arbitrary way. "I say this is a 'type' because I say so." In short, type-ish behavior may be a direct RAM "part" associated with variables, or very indirect parts, and in between. This gives type-thumpers too much wiggle room. It's right back to a verbal labeling game.
{You claim "associated with" is vague, yet you use "associated with" to describe your tags. Do I detect some hypocrisy? You also appear to have blown a few neurons in the middle of this. Maybe you should step back, take a deep breath, and try to come up with something coherent.}
But you don't have it do anything observable in some cases: it just sits there in association suspension. The coherent advice applies to both sides. God I hate English.
Whilst I must apologise if I've not been entirely clear, I think we're considerably more coherent -- and certainly less pointlessly vulgar -- than your "I am the Walrus, You are the Walrus, Coo Coo Ka Joo, Suck my Yoko!" Is that meant to be funny?
What do you mean by "you don't have it do anything observable in some cases" and "it just sits there in association suspension"? Where is this the case?
If you hate English, I'd encourage you to present your "tag model" using a FormalLanguage.
Part of the goal is to make a model presentable to "ordinary programmers". I'm hoping something like TypeTagDifferenceDiscussion strikes a sufficient balance between formality and approachability. -t
Re: "This gives type-thumpers too much wiggle room." What does that mean? Are you claiming that when <cfargument type="float"> checks the argument by verifying whether or not the sequence of characters in the argument can be converted to an IEEE 754 float, that isn't TypeChecking, and doesn't have anything to do with types? Or do you think IEEE 754 does not define a data type?
I thought you claimed that cfArgument was doing "type checking" and that your model had a type "associated with" the information being passed. That was a big point of contention. Are you now saying that cfArgument has NOTHING to do with "types"? "KIND OF associated with"? You are driving me Yoko.
No, I'm saying the opposite. cfArgument is very much about types. I thought you were claiming it isn't.
I thought I made it clear that I use "tag" instead of "type" to avoid that vocab battle. You claimed tags and types were the same thing; I dispute that, but am escaping that issue by inventing my own terms. (Perhaps I should call it the "Yoko" instead of "tag" because it irritates you more.)
If you dispute that "tag" is equivalent to "type reference", then please show how it differs. You've been asked before, and in response you've claimed that you've shown a difference. I can't find it. Again, there's nothing wrong with defining your own vocabulary, but if you're going to do so, you need to provide a compelling argument for using your own vocabulary, and provide clear and comprehensive definitions for it -- preferably showing how it differs from conventional vocabulary. Without that, you will consistently be ignored, opposed and misunderstood.
I've been trying to explain both and am still in the process of trying more and more. I'm having trouble communicating it to you for reasons that are fucking baffling; it's like trying to explain cars to aliens who live on a gas-only planet where everything is made of gas.
Once again, you refer to efforts to explain the difference between "tag" and "type reference", but I can't find them. Either they're lost in the morass, or you think you've written them but didn't, or I find them so un-compelling that I dismiss them instantly and forget them, or I think I've quashed them in counter-argument and forget them. Seems unlikely I'd forget them, though. Quick summary or PageAnchor, please?
My primary goal is NOT to define "type". I don't know exactly what a "type reference" is so cannot compare one to tags (because the def of type ref is fuzzy). In the tag model, one does not need or have tags for parse-based type-like validation. You claim such parsing indicates a "type reference", but it's not a "tag reference" because in tag-free languages there are NO TAGS in the model for those langs to reference. YOU CANNOT HAVE A REFERENCE TO SOMETHING THAT DOES NOT EXIST (in the model). I don't know how to make that more clear. The model runs fine (predicts properly) without tags for TFL's, yet you claim that parsing activates (?) "type references" in TFL's. That seems like a blatant contradiction to me, UNLESS tags and type references are different things. I've explained this roughly a half dozen times already but it's not sinking in and repetition is not working either. I'm at a loss to explain it to you.
You don't have to define "type". I didn't define "type", either. "Type reference", or just "type", is used here in its simplest, most familiar, and most intuitive interpretation, i.e., shorthand for "integer, float, boolean, double, date, etc." Please read again what I wrote at the top of the page regarding parsing in Category D2: "Operators perform parsing as needed to determine whether each argument value (which is a string of characters) represents an integer, number, date, etc. Sometimes, the parsing mechanisms are explicitly available to the programmer, such as <cfargument type= ...> in ColdFusionLanguage which can be used to reject operator invocations if arguments do not match their corresponding parameter's specified type." The only reference to "type" (outside of <cfargument ...> itself) is "... parameter's specified type", which is precisely what is stated in the "type= ..." attribute of <cfargument ...>. Is it not true that "type=" can refer to "integer, float, boolean, double, date, etc."? Therefore, when it says "type=...", would you not call the stuff in "..." a reference to a type?
Perhaps, but I don't NEED tags for that, so why have it in the model? Why put things into the model when the model does NOT NEED THOSE PARTS TO DO ITS JOB? Rationality 101.
I'm not putting "tags" in any model, so I'm not clear why you're asking me. Indeed, I'm not clear what your response has to do with what I wrote above at all, but particularly what it has to do with "tags". I didn't even mention "tags".
Arrrrrg. Hitler! Hitler! Hitler!
Yes, self-aware invocation of GodwinsLaw is always a good way to declare, "I lose, you win." :-)
Go ahead and think that.
Let me clarify my equivalence between "tag" and "type reference": It appears every time you use "tag", it's equivalent to a type reference. You can test this. Everywhere you use the word "tag" in your model, try replacing it with "integer, float, boolean, double, date, etc." and see if anything changes, conceptually. Does it? However -- and perhaps lack of clarity here has been the source of contention -- sometimes when I use "type reference" or "type" it's not equivalent to "tag". A good example is my description of <cfargument ...>, which certainly (and appropriately) makes reference to "type" but doesn't appear to create a "tag" (if I understand what you mean by "tag"). However, I'd describe that as not permanently associating a type with the argument value being tested.
I don't need "tags" to model CF. But you saying I need "types". If they are equivalent, then we have a contradiction. In fact, under parsing, something can "have" 2 more "types" at the same time. A model that requires 1-to-many associations is generally more complicated than one that only needs 1-to-1, or in tag-free languages' case 1-to-0, which is even simpler than 1-to-1. (See Observation 4 under TypeTagDifferenceDiscussion for an example.)
You appear to use "tag" to mean "type reference" but I don't use "type reference" to mean "tag", so there's no contradiction. It's a one-way equivalence, just like "all brothers are men" isn't the same as "all men are brothers."
Sorry, you lost me.
When you use "tag", it appears to mean "type reference". In other words, "all tags are type references". When I use "type reference", it doesn't necessarily mean "tag". In other words, "all type references are tags" is incorrect. Hence, "all tags are type references" isn't the same as "all type references are tags."
Sorry, not sure what your "1-to-many" and "1-to-1" refer to or what "tag-free languages" are. Whether you need "types" or not, surely you have to be able to explain what <cfargument ...> does and what its "type=..." attribute means, no?

Back to the core point: It appears every time you use "tag", it's equivalent to a type reference. You can test this. Everywhere you use the word "tag" in your model, try replacing it with "integer, float, boolean, double, date, etc." and see if anything changes, conceptually. Does it?
I don't care what it "means" anymore. I'm just modelling results. We don't have to conjure up a "type reference" to simply say "it's parsed and it passes or fails the parse". Why create a thingy in the state, a "type reference" when we can simply say it's a processing step INSTEAD OF a state change? You just create more messy questions by inventing hard-to-test state objects, such as how long does the "type reference" remain in existence? How do we test for its existence? What does it affect and not affect? And when does it perform these affections? (Effectivities?)
You appear to be evading my question. Again, everywhere you use the word "tag" in your model, try replacing it with "integer, float, boolean, double, date, etc." and see if anything changes, conceptually. Does it?
If you're going to ask questions like "how long does the 'type reference' remain in existence?" or "how do we test for its existence?", then you have to ask exactly the same questions about your "tag", and we've already established that testing for the existence of a "tag" is fundamentally flawed -- it can't distinguish between a characteristic of a canonical string representation vs a "tag".
Further, programmers seem to know what parsing is such that we don't have to expand on that in a model. But exactly how your "type reference" works and how to examine it is unknown. Plus, it appears they STILL have to know about parsing even if "type reference" is used in your model, but I'm not fully sure because it's messy.
For "type reference", simply read it as "integer, float, boolean, double, date, etc." That's all it means, and it's so ubiquitous in programming language explanations that we usually just say "type" and everyone understands what is meant.
Okay, but it's not the same thing as tag-based typing ("explicit" typing). The idea that any given variable can hold a "value" having 2 or more types is odd and counter-intuitive. We need a more powerful model to explain such. Example:
a = "123";
write(typeName(a)); result in TL: String
write(isNumeric(a)); result: True
write(typeName(a)); result in TL: String
I don't know what "tag-based typing" or "'explicit' typing" means, but I certainly distinguish between Category S, Category D1 and Category D2 languages. However, the idea that a given variable can hold a value having two or more types is commonplace and trivial. A value of "12" can simultaneously represent values in types string, long, integer, float, month, day, year, size, quantity, cost, etc. Which one it is -- at a given point in the code's execution -- depends on the code and the language.Your code is easily explained: 1. Variable 'a' is assigned a value of type string. 2. We print out the type of the value in 'a'. 3. We look at each character in the value in 'a' to see if they're all numeric digits and they are, so we print 'true'. 4. We print out the type of the value in 'a'.
That's almost the tag model, it's just that you over-complicate it (for some languages) and haven't refined the specific parts. It also appears to contradict your description of the cfArgument example where you imply a "type reference" is created, but don't explain how it relates to your (excess) parts such as "variable", "value", "type of variable", "type of value", etc.
That's only "almost the tag model" to the extent that your "tag model" resembles real language behaviour. The above is simply what certain dynamic languages do. There are no "excess parts", because I've described precisely what all the relevant "parts" do in real programming languages. As for something "to contradict [my] description of the cfArgument example", I think you must have mistakenly assumed that discussion of type references in one of my later explanations -- either regarding internal real language optimisations or language semantics -- is part of my description at the top of the page. It isn't, or at least not the way you appear to think it is. Here's the actual bit related to cfArgument from the top of the page, again: "Operators perform parsing as needed to determine whether each argument value (which is a string of characters) represents an integer, number, date, etc. Sometimes, the parsing mechanisms are explicitly available to the programmer, such as <cfargument type= ...> in ColdFusionLanguage which can be used to reject operator invocations if arguments do not match their corresponding parameter's specified type." Where do you infer that "a 'type reference' is created"? (Of course, semantically and in typical implementations, there is a temporary relationship between the type specified in the "type=..." attribute -- which is, by the way, a type reference by any definition -- and the value being tested by cfargument, but it has no bearing here.)
Define/clarify "do". The tag model works and is simpler, as shown in TypeTagDifferenceDiscussion. If you want more specifics (StepwiseRefinement), just point out what to refine, and I will.
By "do", I mean it's not a "model" of language behaviour, it's a description of actual language behaviour. Your model isn't any simpler -- it has to deal with precisely the same elements whether you accept them or not (e.g., "variable-like" things are values, so you have to deal with values; "tags" are your personal alias for "types", etc.) so you're essentially trending toward describing actual language behaviour, but adding complexity or at least a learning burden for the user of your "model" by employing a PrivateLanguage.
Please clarify "description of actual language behavior". We can only observe input and output. You are NOT describing only I/O. The parts you talk of can only be inferred via a model of some sort. I believe you are mistaking your head notions for some objective reality. And like I've said repeatedly multiple times, a clear(er) PrivateLanguage is better than a fuzzy public one. If you fuss about "PrivateLanguage" again without addressing this trade-off, I'll kick your cat.
Threatening my cat(s -- which one?) is punishable by having to watch a video of them. Now, don't you feel guilty?
I told Calvin you were threatening to kick his ass. He's all, like, "Dafuq?"

Personally, I wouldn't mess with anyone who looks like Victor Wong, whether he's a cat or not.
Regarding "the parts [I] talk of can only be inferred via a model of some sort", that's not true. We have language manuals, programming manuals, ComputerScience papers, textbooks, source code (including compiler and interpreter internals), and the verbal and written descriptions from people who create compilers and interpreters. These all allow us to do far more than infer "via a model of some sort"; they allow us to understand how languages actually behave -- based, ultimately, on how they're actually built -- and explain more than just I/O. I do understand that you want to create "The Top Field Guide to Programming Languages" based on a naturalist's observation of programming language behaviour, but as has been pointed out elsewhere on this page, that is both unnecessary and prone to error unless you (at least) take a scientist's approach by objectively using all sources of information available -- i.e., doing your research -- then rigorously justifying it and clearly, comprehensively, and completely defining and describing every aspect of it.
Re: "tags" are your personal alias for "types" - You keep flip-flopping between them being the same and not being the same. Your parse/validation arguments seem to contradict sameness. We don't need an explicit tag to emulate an interpreter for parse-based "type" validation. That is a FACT.
{Since I don't have a cat, I'll respond to save the other's felines some grief. A clear(er) PrivateLanguage is only clear(er) to those who know the PrivateLanguage. This will obviously exclude most people as it would no longer be a PrivateLanguage if most people understood it. Therefore, for most people, a PrivateLanguage, regardless of how clear it is to those in the know, is less clear than a PublicLanguage. Since you appear to be unable to communicate what the words and phrases in your PrivateLanguage mean, it's likely that the only one your PrivateLanguage is clear(er) for is you.}
I'm making a model, NOT a language. If you want to call the tags "types" or "snorketts", that's fine by me. Software and devices make their own PrivateLanguage all the time. For example, browser "cookies". Doing so hasn't killed any cats (although better words perhaps could have been chosen).
{Yes, we know you are making a model. Somewhere. The problem is that your description of it is in a PrivateLanguage, and, unlike when it's done for software and devices, you won't tell anyone else what that PrivateLanguage is.}
I thought I did. If it's not clear enough, ask questions.
One example of lack of clarity is that you sometimes appear to be describing an abstract model -- in which a "tag" is a "tag", otherwise undefined -- and sometimes it appears to be concrete and you call it a "type-ID byte" or something similar. Another is that sometimes you claim your "tag" isn't the same as a "type reference", but when you call it a "type-ID byte" it sounds exactly like a "type reference".
Sometimes the context is what an actual interpreter does "under the hood" (or an interpreter-like model), and sometimes it's just an abstract model. The context of the replies would hopefully make that clear, but I haven't reviewed such for clarity. Our discussion here tends to intermix attempts at "reality" modeling (such as the "what they DO" controversy), which leans toward implementations, with abstract modeling. As far as "type reference", I have no clear definition of TR to use such that it's hard to pin down. Your side's model of parse-based typing seems to "have" so-called "type references", which have no tag-model counterpart (and need none).
As pointed out elsewhere on this page, for "type reference" simply read "integer, float, boolean, double, date, etc." Like the "tag" in your "tag model", "type reference" is undefined (as are "integer", "float", etc.) but "integer, float, boolean, double, date, etc." is undoubtedly familiar to the programmers to whom this matters. As I've suggested before, trying replacing "tag" with "integer, float, boolean, double, date, etc." in your model. Does anything change?
Several times, I've asked you to try replacing "tag" with "integer, float, boolean, double, date, etc." in your model and let us know if anything changes. You've not responded. Is there a reason why you've not responded to this simple question?
As for "reality" modelling, I suggest not using "type-ID byte" or reference to bytes at all, as it is inaccurate and somewhat misleading. There are certainly language implementations that do use "type tags" -- a byte or two that serves as a type reference -- but that's merely one implementation mechanism among a myriad. If you're going to insist on using "tag", please be consistent about it, as there's less inaccuracy in always using "tag" to represent "type" than bouncing between "tag" and "type-ID byte". It's a bit like using the term "carburettor" to refer to petrol/gasoline engine intake systems in general, but instead of consistently using the term "carburettor" you sometimes write "Rochester 4MV Quadrajet" -- which is a particular make and model of carburettor -- when, really, you should be using the term "intake system" because "carburettor" doesn't include fuel injection.
It's a model to provide predictions, and it's an accurate model for its intended purpose (or at least you've shown NO objective flaws, despite your false claim above). I am not going to get caught up in vocabulary here. The model would still work if I called the tag a "type" or a "flugjok". As far as different ways to implement languages, YES, that is indeed true. But it's not really intended to illustrate actual implementations, and since you admit there are many ways to do that same thing, it's mostly moot. The goals in this order are:
1. Simplicity - Easy to describe and digest
2. Prediction accuracy - Transforms input (source code + data) to output (bytes) that matches an actual implementation of the language.
3. Fits existing "type" vocabulary
4. Matches actual implementation(s) of the language
5. Expandability to other kinds of language categories (added 9/5)
You guys put #3 at top, but when I look at the results, it appears to conflict with #1, which I give a higher priority such that I reject a portion of existing vocab. And "type" is overloaded in practice. -t
You claim we've shown "NO objective flaws", yet just a few paragraphs below this point -- in response to noting that one of the various flaws (which we've pointed out) of your model is its failure to account for why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category -- you state, "I'm aware of the problems." Aren't "objective flaws" and "problems" essentially the same? Your claim of "prediction accuracy" is not sustainable when you can't "predict" why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category. That means your claim of "Simplicity" is suspect, because your model is incomplete. And what does "'type' is overloaded in practice" mean?
I think there was a misunderstanding on that. See PageAnchor 123123.
{Anyway, "description of actual language behavior" means exactly what one would expect it to mean. It means that it's an accurate depiction of the actions of existing languages. We can also observe a lot more than input and output. We can observe the language definition, the source code for the program, compiler, and/or interpreter, the internal state of the program, compiler, and/or interpreter. Why you chose to ignore those sources of information is a mystery to me.}
You haven't demonstrated it's an "accurate depiction of the language" in a clear way. By "input", I also mean the source code, by the way. Input is source code and data, and output is data (bytes). And as I've mentioned elsewhere, the traditional language for types used by language manual writers is not very good for the purposes of explaining "type subtlety" and creating the simplest possible model that still forecasts correctly.
{Huh? Go read a language specification, all of them will explain themselves in the described manner.}
They suck. We've been over this multiple times.
Correction: They suck from your point of view. That doesn't mean "they suck" in general. Do not make the mistake of conflating your personal opinion with general opinion.
{Finally, if there's any flip-flopping, it's because you haven't defined what you mean by tag. This means we have to guess what you mean. Since you don't appear to use it consistently, our guesses have to change to keep up with your latest pronouncement.}
Again, I'm making a model, not a definition. If you are waiting here for a definition to appear, then go home because it may never come.
{Again, we know that. And again, the problem is you are using a PrivateLanguage to describe it in, leaving us to guess what you mean by it.}
The purpose of the model is to predict, not give meaning. An interpreter probably has a lot of PrivateLanguage parts in it, but we usually don't care as long as it transforms the input (source code and data) into the expected output. We judge it by I/O, not the "meaning" of the parts the interpreter is made of. The model is (hopefully) transparent and approachable enough to see the innards doing their work to transform input to output. If you want to assign "meaning" to the gear-work, I don't know what to say, other than I have no obligation to (although I have tried hard). You can study the gear-work all you want; there is no mystery. But "meaning" is relative; I can't help you with your head in a rigorous way.
The problem with a PrivateLanguage, in this case, is not its lack of meaning. It's the lack of clear mapping between the familiar PublicLanguage parts of programming languages and your PrivateLanguage model. In the absence of such a clear (and hopefully unambiguous) mapping, we're left guessing at what parts of languages are equivalent to parts of your model. In particular, it's unclear how "tag" maps to "type", and it's unclear how your use of "variable" -- which from discussions here, appears to encompass more than variables -- maps to variables and expressions. I mention this because it's unclear how your model accounts for why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category.
I'm aware of the problems. I've weighed them already, as already described, and found too many issues with the traditional ones. It doesn't mean I'm dismissing your concerns, but rather OTHER concerns overshadow them. Trade-offs trade-offs. All else being equal, yes, fitting/using existing vocab would be the way to go. But the "all else" is not equal.
If you don't wish to constantly have to defend your choice to reject "the traditional [terms]", you need to strongly and compellingly defend your choice to use new terms and clearly map your terms to traditional terms. Otherwise, there will be perpetual disagreement and/or confusion from those who understand the traditional terminology.
In particular, I believe it's a human-interface-design mistake (i.e., "confusing") to use nearly identical vocabulary for parse-based typing and explicit typing. I bifurcate them by hacking off a big part of the model for non-tagged langs. You can't mistakenly use a limb that's been hacked off. Gone is gone. The model cannot get more clear than that: one team has no heads. "Look mom, no type tag!"
Your buddy in the picture has a head, it's just not associated with his neck. It's an excellent analogy for parse-based typing -- the type is there, but it's not associated with, i.e., it's not attached to, the value. There is, however, a relationship -- just like the arms are holding the head -- defined by performing a test to see whether a value meets the criterion of belonging to the set of values defined by the type. In short, how is it confusing to state that Category D1 languages associate types with values, and Category D2 languages don't associate types with values but can use parsing to check the type of a value if needed? That's what's described at the top of this page. You can't chop off a head if there isn't one in the first place, and you shouldn't chop off a head that's only borrowed briefly and returned before becoming headless again.
You seem to be agreeing they are different "kinds" of "type things" (for lack of a better word). That's a start. You just need to clarify in your model when and where and how long these head(s) appear. So far you have Ghost Typing. And because a given "value" (for lack of a better name) can pass multiple parse tests, you can have multiple heads floating around in your hazy ghost model.
There's only one kind of "type thing" being discussed here. There are different places where identifying sequences of characters as a type's value occurs, and different things done with the result of that identification.
So you say.
It's true. That's how languages are implemented.
You are mistaken (or being fast and loose with words).
I implement languages in all three categories, teach language development, have studied ComputerScience and SoftwareEngineering, and have studied existing language implementations, so I think I know something about how languages are implemented. Where am I mistaken?
Maybe you've gotten too cozy with your own framework or tool set such that you cannot envision other frameworks. If you've always driven a car, then all your plane cockpit designs may mirror car conventions even though that may not be the simplest way to build a plane cockpit.
That's certainly possible, though if I "cannot envision other frameworks", it looks like the rest of ComputerScience can't either. I did ask you to point out where I am mistaken. Have you done so?
Until you produce a runnable model, I cannot fully evaluate it. And GroupThink does happen.
If you'd like a runnable model, I have several simple Javascript-like languages implemented in Java that I use to show ComputerScience students how compilers and interpreters work. They're based on the same approach to TypeSystems, described at the top of the page, that most popular imperative programming languages use.
In particular, one has to keep "type state" in tag languages somewhere in RAM. That's a fact. a=123 creates a different state than a="123" because "quoteness" (for lack of a better term) affects FUTURE operations. But one does NOT have to keep "type state" for parsing on a pass/fail basis. That's a fact. An isNumeric() operation does NOT have to keep a byte(s) around that indicates whether something is "numeric". We don't need a "type byte" to implement parsing. You seem to be calling the result of isNumeric() a "type reference", but its nature is different than the nature of a=123 versus a="123". Perhaps we can force them to use the same mechanism, but it's not required. I suspect you are force-fitting them out of habit from your framework. isTypeX() operations only have to return a Boolean flag, and this Boolean flag does not have to be associated with the original variable (argument) in RAM. That is a fact. (By "associated with" I mean using either an address pointer or a positional convention (next to argument var).)
Is there something in my description at the top of the page that contradicts this? For example, where do I state that an isNumeric operation has "to keep a byte(s) around that indicate whether something is 'numeric'"? (Other than for internal performance optimisations, where in some languages this does occur!) You appear to be slaying a StrawMan here.
Did you not claim that the cfArgument example creates a "type reference"?
No, it has a type reference which is specified by the "type=" attribute in the <cfargument ...> tag. However, as has been pointed out several times, that doesn't mean a type reference is subsequently associated with a value.
How does one objectively measure whether association-ness happens or doesn't?
Do you mean in general? Or in a specific context? Are you looking for an objective measure of whether "type=" is a type reference, or whether values in general have type references?
What's a "type byte"? Do you mean a type reference? By the way, "associated with" means that given a statement like "<x> is associated with <y>", it means that given <x>, we know <y>, i.e., given <x> we can answer questions about <y>. This has already been explained elsewhere on this page. Use of "an address pointer" or "a positional convention" is certainly used in some implementations, but not all. As already pointed out, "associated with" could mean (for example) the value is on the top of one stack and the type reference is on top of another stack at the same point in traversing the abstract syntax tree (this is common with static type checking in compilers), but there is no other connection between them.
Re: "the "type=..." attribute -- which is, by the way, a type reference by any definition" -- Which definition? Do you mean because the CF designers called it "type"? Would it change things if they instead called it "zorpmiff="? -t
It would make no difference what it's called. It's the semantics of what it does. Given a value passed as an argument to a function, it checks to see whether that value meets certain criteria, which means it identifies -- or at least refers to -- a set of possible values, which is -- by definition -- a type.
ALL validation does that. Thus, is all validation "types"? I keep asking that and never get a real answer. (Validation does not inherently create "tags" in my model such that if validation "creates types", then tags and types are not equivalent.)
{Validation is an act. Types are not acts. Therefore, types are never validation. However, validation uses types to determine what is or isn't valid.}
It's not "generating" (observable) types either.
{So?}
So ALL validation is filtering "types" because validation "identifies -- or at least refers to -- a set of possible values" (your side's words).
No, I was referring to <cfargument ...> in particular. In <cfargument ...>, the "type=???" attribute references a type which is both a set of values -- e.g., int, which is a set of integers -- and operations on those values such as +, -, etc. Not all validation meets those criteria. In a hypothetical <cfargument ...> with a "regex=???" attribute, it would only be referencing a type if the specified regular expression identified a set of values that had a set of operations associated specifically with it.
How exactly is this "set of operations" tied to it? There is no observable "hard" link to any operations. Any association to operations is by convention only (which is a common feature of dynamic/scripty languages). This convention is "in the head". The language and interpreter do not care whether you follow conventions or not. (You just have more ready-made tools if you do.) Further, CF does not support a "native" Integer type (lumping them all under numbers instead). If cfArgument had a regex filter built in and we supplied a filter to test for integer-ness, would it qualify as a "type reference"? In other words, does following native "type" conventions make a difference, and what is the formal type-world rule that applies this native-ness test?
A set of operations is tied to a definition of a set of values by a dependency. In some programming languages, it's explicit. E.g., a function declaration like "int fn(int a)" explicitly says that fn is dependent on the set of values defined by "int" and 'fn' can't exist without 'int'. In other languages, it's implicit. For example, in most dynamic languages "/" is dependent on numeric values, such that "/"s operands must be numeric or it throws an exception. (There's a further restriction that "/"s second operand must be a non-zero numeric value.) It would not be unreasonable for there to be a language where a type's set of values could be defined by a regular expression, and a set of operations could be defined where operands must match that regular expression, thus creating a dependency.
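A small sketch of the implicit case in Python -- "/" depending on the set of numeric values (assume D2-style string operands):

  # '/' is implicitly dependent on the numeric types: remove the "is this
  # numeric?" parse and the operator can no longer be defined at all.

  def divide(a, b):
      x, y = float(a), float(b)   # raises ValueError if an operand isn't numeric
      if y == 0:
          raise ZeroDivisionError("second operand must be non-zero")
      return x / y

  print(divide("10", "4"))   # 2.5
  divide("10", "abc")        # ValueError -- operand outside the numeric value set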
Yes yes, I understand the general notion of such. The issue is applying it to specific languages or model. Like I said, in dynamic languages there is no clear-cut measurable dependency: it's generally convention-based. I want a clear-cut model, not one based on conventions of human thought and human activity. Specifically, the cfArgument statement does not "care" what one does with the parameters. It does not add any new and observable restrictions to the value/variables passed, other than allow or not allow, which is a general property of validation and not specific to "types". Thus, it's best to leave all that OUT of the model. Again, I am modeling I/O, not human notions, traditions, and conventions. By eliminating such, I simplify the model and avoid LaynesLaw messes. If you can work all that into your model withOUT making a mess, be my guest. So far I only see a mess tied to messy English and historical feelings/notions. -t
The dependency in all languages is quite clear: If you remove "int" from a language with ManifestTyping (mainly Category S), you'll have to remove all operators that make reference to "int", because their definitions depend on "int". If you remove numeric types from a language with dynamic typing (e.g., Category D1 and D2), you'll have to remove all operators that expect numeric values or they'll throw errors.
What do you mean exactly by "remove"? And how would one empirically test this conjecture? Note that I am not concerning myself with Category S for now. We have enough on our plate already.
You can observe this in a language by creating a user-defined type definition, then create some operators dependent on the user-defined type, then see what happens when you remove the type definition.
What exactly is a "type definition" in a dynamic language?
In most popular dynamic languages, new types are defined -- depending on the language -- either explicitly using OO classes or implicitly via prototypes.
I'm not sure whether your Dx classification is per language or per operator, as discussed below. How about a demonstration of this "removing"? I'm not following you.
The classification is per language. That's why the categories are labelled "Language Category S", and so on.
As for "removing", create an arbitrary class x {...} with operators referencing x. Now remove, i.e. delete, "class x {...}". What happens?
And you go through the whole thing stating there is "no type reference" (as I interpret it), and then at the end you say there is.
There is a type reference, as per my point above and in terms of the semantics of <cfargument ...>. That does not mean, however, that <cfargument ...> associates a type with a value in any way that's visible to the user-programmer, other than returning true or false for checking for a value matching a given type.
So it's "in the head" such that we don't need it in our prediction model to get predictions. Good! Settled.
{That's not what was said.}
Then I don't fucking know what the hell was said! I liked the part in BatMan where The Riddler was caught and jailed because I hated fucking riddlers. The Joker was more visual.
{What he said is very straightforward. There is a type reference. That type reference is not associated with a value. Its only visible effect is to determine which values are valid and which are invalid. Doesn't get much simpler. No mention of heads at all.}
Yes, it's so simple that just about EVERY conditional falls under it. Lovely!
Sorry, I don't know what you mean by that. The reference was to <cfargument ...>. It determines which values are valid or not according to the specified type definition, presumably so that subsequent type-specific operations like +, -, etc., will occur without failure or undesired behaviour. No type reference is associated with the tested value (at least, not that the user-programmer can see; again, I prefer to avoid discussion of common optimisations in such languages) after the <cfargument ...> invocation completes.
"Presumably"? By who or what? I want to model languages, not head plans. See above with regard to "related operations".
By the programmer using <cfargument ...>. I presume the programmer is using <cfargument ...> to ensure that the argument meets a given type so that it can't happen that subsequent operations dependent on that type fail because they don't receive a value of that type. But that was, admittedly, extraneous commentary. I probably should have written: "Sorry, I don't know what you mean by that. The reference was to <cfargument ...>. It determines which values are valid or not according to the specified type. No type reference is associated with the tested value after the <cfargument ...> invocation completes."
How is this process different from any other "kind" of validation? When does such validation involve a "type reference" and when does it not, and how do we tell? I am not getting a clear answer on this. Again, you guys appear to mistake your head models for objective/observable reality. Like I pointed out elsewhere, dynamic languages generally rely on convention for consistency of type usage, not language-enforced connections to a "catalog of types". (If it was language enforced, the process would resemble the usage of a tag, NOT parsing, putting you back to square one. The language would need to have values/variables that carry around an explicit pointer or ID of the type it belongs to. Granted, this may not be the only implementation/model of such behavior, but it's the most straightforward way for normal humans.)
Informally, validation involves a "type reference" when it identifies a set of values upon which operators are dependent, i.e., said operators will fail if they do not receive operands consisting strictly of a member of that set of values.
Let's skip "informally". I already know the "notion" version of "types".
Would you prefer a formal definition? I'm not clear what you're objecting to, here?
Arrrrrrg.
It strikes me that you're presupposing that your "tag model" must be reality, and are trying to force-fit the data -- your observations about languages -- to that model. Claims that "there should be a fairly clear-cut byte associated with variables and variable-like things" simply don't match language implementation reality, unless by "clear-cut byte" you actually mean "type reference". BTW, what's a "variable-like thing"? Aren't there more familiar terms you can use instead of something as vague as a "variable-like thing"? Do you mean a class instance in typical OO languages? A mutable instance can perhaps be considered -- as a whole -- a variable-like thing, though it's really (typically) a composition of 1 or more variables.
I don't claim the tag model "is" reality, I only claim it predicts reality (output). As explained below, matching actual implementation may add artifacts that make a model complex or confusing due to a focus on "machine" issues. A roughly comparable analogy would be using a Model T to explain to students how gasoline car engines IN GENERAL work. The Model T is neither efficient nor fast, but it's a simpler model of a "gasoline car engine" than a 2013 Mercedes. Now if we wanted a model of a fast car...
What's at the top of this page predicts reality using reality (sans optimisations, as I've often mentioned.) If you believe the description at the top of this page adds "artifacts" that make it "complex or confusing due to a focus on 'machine' issues", please point out where. Regarding your Model T analogy, I'm all for using a simplified reality to illustrate behaviour. What's presented at the top of this page is a description of just such a simplified reality. However, your "tag model" is a distorted reality, like using a Model T engine to illustrate how gasoline car engines work, but replacing the carburettor with a fuel-soaked sponge that you call a "tag" and then claiming that engines use "tags".
Your description is vague. Exactly how is the tag model "wrong" in terms of predicting output from input? Remember, I'm not giving a shit about spoken language anymore. I don't care about labels or categories other than getting the model to run. I'm tired of Jabberwocky word play.
Your "tag model" might predict output from input, though I don't know -- the "tag model" has not been clearly articulated, nor has "tag" been clearly defined -- so it's impossible to tell. However, what's fundamentally wrong is that you're introducing an unnecessary, non-existent and novel construct -- a "tag" -- that requires extensive definition, explanation and illustration, when it appears that something familiar to every programmer can trivially replace it. It seems you can replace "tag" in your model with "integer, float, boolean, double, date, etc.", and nothing changes, except now we're using familiar concepts and language. So why use "tag"?
You are projecting faults of YOUR model onto the tag model. And I don't have to define "tag"; it's a model, not a definition. I'm outta the fucking definition business this year.
Is your rudeness necessary? And what faults of "my model" am I projecting onto your "tag model"?
Vagueness.
If you were to clearly articulate your "tag model" and clearly define "tag", then I would be unable to accuse your "tag model" of being vague, right?
The model is fairly clearly illustrated in TypeTagDifferenceDiscussion. A variable "data structure" is defined, and examples are stepped through and explained in terms of how they change or don't change the data structure, almost like running a debugger where we can see the guts of a variable as we click through each step one at a time. If you find vagueness there, please point it out.
I find it unclear how values are handled, independently of variables. E.g., how does the "tag model" account for 123 + "123" = 246 in one language, "123123" in another, and throw an error in a third? I note values are frequently used in the examples, but appear to not take part in the model. I find it unclear what the intended relationship is between "tag" and "type". It appears to me that you could replace "tag" with "type" and instantly gain familiarity and the model would work exactly the same way, but attributes like "tag='number'" that leave me scratching my head -- at least, as a potential "user" of this model in terms of explaining it to, say, students -- could become "type='number'", which makes sense. Etc.
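For concreteness, here is a sketch in JavaScript of the divergence in question; the comments describe the (assumed typical) behaviour of other languages for the same pair of operand values:

  // JavaScript: '+' between a number and a string concatenates.
  var x = 123 + "123";  // x is the string "123123"
  // In PHP, 123 + "123" coerces the string to a number: the result is 246.
  // In Python, 123 + "123" raises a type error instead.

Any model of these languages has to predict all three outcomes from the same two operands; it's unclear how the "tag model" does this for values, as opposed to variables.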
As far as definitions, I'm giving up on definitions for now as LaynesLaw keeps kicking such attempts in the gonads. Nobody has produced a clear and usable definition of "types" either. I'm just focusing on prediction modeling at this point.
You don't have to define anything, but it's going to be an awkward model to use if it has entirely undefined components. How will the newbie programmers -- whom it's designed to help, I presume -- react to the fact that a "tag" is, like, just this thing, like, that you can't see? But it must be there, because it affects things. And so on.
If there is "no reality to types", what does the "int" mean in "int x;" in C, C++, C# or Java? What's the difference between 123 and "zbq" in C#? Or between 123 and 123.0? What's the difference between 123 and "zbq" in Perl? Why does '123 + "zbq"' fail in some languages and succeed in others? Why does '123 + 123.0' fail in some languages and succeed in others? What does the TYPE keyword do in TutorialDee? In C#, with a method defined as 'myProc(int x)' and another as 'myProc(string x)', which one is invoked for myProc(2) and why? And so on. For something with no reality, it certainly seems to have an impact on the answers to these questions. To correctly answer these questions without undue circumlocution, any programmer will either have to mention "type" or be forced to use some PrivateLanguage.
JS and Php vars probably have an explicit tag byte or two while running in RAM, while CF and Perl don't (at least for scalar aspects). Parsing will not attach such tag bytes to variables in those languages. That can be objectively observed. (Granted, one could make an interpreter that temporarily added such a tag to the variable to indicate the results of the parse, but the programmer cannot sample that tag, such that it is not objectively observable to the programmer, and is thus swappable with other models that produce the same result.)
Now you're talking about implementation, rather than a model. If you're talking about implementation, you should talk about reality. In reality, your "explicit tag byte or two" might indeed be a value associated with a variable in some languages, but that's not how it works in Javascript and PHP. In PHP and Javascript, variables have no type references. Type references are associated with values. In some dynamically-typed languages, a value is a tuple consisting of a reference to a region of memory that represents the value and a reference to a region of memory that represents its type. The type reference could -- depending on the language -- be an integer 'n' that refers to the 'n'th type definition in an array, a string naming the type, or a pointer to code that represents the type definition. The value = {value_representation, type_reference} tuple could -- depending on the language -- be implemented as a contiguous region of memory, or the value_representation and type_reference could be found as the topmost items on two unrelated stacks, or it could be something else entirely. There are myriad ways to implement TypeSystems, and myriad ways to implement the same type system. Therefore, the only accurate thing you can say about (for example) PHP is that variables don't have types but values do, i.e., Variable -/-> Type, Value ---> Type.
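As a purely illustrative sketch -- no claim is made that any real implementation looks exactly like this -- such a value tuple might be modelled in JavaScript as follows:

  // Hypothetical: a value as a {representation, typeReference} pair.
  function makeValue(representation, typeReference) {
    return {representation: representation, typeReference: typeReference};
  }
  var v = makeValue(123, "Integer");   // the value 123, with a type reference
  var w = makeValue("123", "String");  // the value "123", with a different one
  // A D1 variable is just a named slot holding a value; the variable itself
  // carries no type reference, so rebinding across types is fine:
  var variables = {x: v};
  variables.x = w;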
I've explained why it's not synonymous multiple times, particularly your treatment of "parsed-out-types". I don't understand why it's not sinking in.
Parsing has a fundamental role in programming language TypeSystems. A literal value appears in source code, or user input at run-time, as a sequence of ASCII, EBCDIC, Unicode, or other characters. Therefore, the following process must occur in every language:
Given a literal value represented as a sequence of characters, for each type defined by the language, pass the sequence of characters to an operator whose purpose it is to return 'true' if the sequence of characters represents a literal of that type. For example, imagine a hypothetical language with three types: Integer, Float, and String. Assume each type defines its own boolean operator called isValidLiteral(c), where c is a sequence of characters. Given the literal character sequence "123", we invoke Integer's isValidLiteral(c). It will parse the sequence into '1', '2' and '3' and return true, because all three characters are digits and therefore it is an integer. However, if the sequence is "Dave", Integer's isValidLiteral(c) will stop parsing the sequence at 'D' and return false, because if the first character isn't a digit or '+' or '-', it clearly isn't an integer. So we move on to Float and its isValidLiteral(c) also returns false because "Dave" doesn't represent a floating-point value. Finally, the String type's isValidLiteral(c) will return true because the sequence is a valid String. (In fact, its isValidLiteral(c) is almost certainly defined as "return true", so we don't even need to invoke it when we get to String.)
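Here is that process sketched in JavaScript; the three-type language and its isValidLiteral(c) operators are hypothetical, as described above:

  // Hypothetical isValidLiteral operators for a three-type language.
  var types = [
    {name: "Integer", isValidLiteral: function(c) {return /^[+-]?[0-9]+$/.test(c);}},
    {name: "Float",   isValidLiteral: function(c) {return /^[+-]?[0-9]*\.[0-9]+$/.test(c);}},
    {name: "String",  isValidLiteral: function(c) {return true;}}  // always valid
  ];
  // Try each type in order; the first whose operator accepts the sequence wins.
  function typeOfLiteral(c) {
    for (var i = 0; i < types.length; i++) {
      if (types[i].isValidLiteral(c)) {return types[i].name;}
    }
  }
  typeOfLiteral("123");   // "Integer"
  typeOfLiteral("1.5");   // "Float"
  typeOfLiteral("Dave");  // "String"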
A notable aspect is when this process occurs for literals found in the source code:
In Category S and D1 languages, it occurs prior to executing any code, as part of LexicalAnalysis. For types like Integer, when an integer literal is found the sequence of characters will usually be converted into an equivalent -- typically more compact and more efficient to manipulate -- binary representation. At run-time, only the binary representation is kept and used. In Category S languages, type references -- determined by which type a literal value belongs to -- are associated with variables and values, and these references are often used heavily prior to executing code. In some of these languages, such as C, no type references are kept at run-time. In Category D1 languages, type references are associated only with values -- variables don't have them -- and they are used both prior to and at run-time.
D1 languages may also parse at run-time and/or change their "tag" at run-time. They tend to use both an explicit type tag AND parsing, depending on circumstances. (Different languages use them at different times such that the pattern is not necessarily consistent across D1 languages. Empirical experiments would need to be done to tease out which is used when.)
7 and "foo" are parsed and associated with types during LexicalAnalysis. This only confirms that in D1 languages, variables do not have a type; every value has a type; and variables may be assigned any value of any type at any time. There's no evidence of parsing at run-time in this example. If there is a language that allows x=readline();print(typeName(x)); and it can print something other than String -- such as Integer, Boolean or Float -- then readline() is doing run-time parsing and type determination. However, if there are D1 languages that support this, the description of D1 languages (see top of this page) is still entirely accurate.
It's an example of changing tags at run-time, not of parsing.
It was never in dispute that D1 languages' variables may be assigned values of any type at any time. You claimed that "D1 languages may also parse at run-time ..." Can you give an example of this?
I don't have one, for reasons that I entered here but which got wiped away for some odd reason. Somebody's editing in funny ways; it keeps happening.
I've seen the whole history of page edits on this page. Nothing got wiped away, unless you wiped it yourself. Why can't you give an example of "D1 languages [that] may also parse at run-time", given that you made the claim?
I hit the Save button and it acted like it saved it, and displayed Ward's signature. Anyhow, I don't remember the language name. I cannot answer that question at this time. I think it was a proprietary scripting language purchased for the shop's Prime minicomputer.
Your claim that "D1 languages may also parse at run-time" appeared to imply that it was a common characteristic. Now it appears to be very rare. If it's very rare, then perhaps your claim (and this subsequent threadlet) can be deleted, as it's covered under the statement -- right at the top of this page -- that "individual languages may belong to more than one category depending on particular language features"?
Php's is_numeric() appears to parse strings to test for number-ness. I may be able to present more examples later.
That's fine, but I still think the "individual languages may belong to more than one category depending on particular language features" statement is sufficient to cover such cases.
Such flip-flopping creates a messy model. See PageAnchor Heisenberg01.
In Category D2 languages, it normally occurs at run-time inside certain invoked operators, but can also occur outside (see "[a] point of contention...", below). All values, literal or calculated, are represented as sequences of characters. Each operator invocation that expects arguments consisting of values of types like Integer is (effectively -- there are optimisations that are not relevant here) responsible for invoking the above process on the relevant arguments. If the operator returns values, these are converted back to sequences of characters from whatever internal representation might have been used. Normally, this is invisible to the user-programmer. From the user-programmer's point of view, values are always sequences of characters.
You'll note I've used the phrase "sequence(s) of characters" several times. There's a shorthand way of describing a "sequence of characters" -- we normally call it a string. Thus, where I wrote (above) that in Category D2 languages, "values ... are represented as sequences of characters", I could also write "values are represented as strings" or "values are always strings" or even "values are always of type 'string'". They all mean the same thing: Strings are used to represent values of every scalar type. So, you should now be able to see how there is no confusion or contradiction in D2 languages between a value being both a string and an integer, because the sequence of characters -- i.e., string -- is all digits and thus represents a literal integer.
A point of contention appears to be the role of operators like <cfargument type=...> in D2 languages like ColdFusion. It should now be clear that <cfargument type=...> simply allows the user-programmer to explicitly invoke the process described above. Whether the process is invoked explicitly by <cfargument type...>, or implicitly inside a numeric operator like '*' or '/', it is the same.
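To illustrate that sameness, here is a JavaScript sketch of a toy D2 implementation; checkType and numericMultiply are hypothetical names for illustration, not ColdFusion internals:

  // The parse-based check, shared by explicit and implicit invocations.
  function checkType(c, typeName) {
    if (typeName === "numeric") {return /^[+-]?[0-9]+(\.[0-9]+)?$/.test(c);}
    return true;  // every character sequence is a valid string
  }
  // Implicit: a numeric operator invokes the check on its (string) arguments.
  function numericMultiply(a, b) {
    if (!checkType(a, "numeric") || !checkType(b, "numeric")) {
      throw new Error("argument is not numeric");
    }
    return String(Number(a) * Number(b));  // result converted back to a string
  }
  // Explicit: the programmer invokes the same check, cfargument-style.
  function myFunc(arg) {
    if (!checkType(arg, "numeric")) {throw new Error("invalid argument");}
    // ... type-specific operations can now proceed safely ...
  }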
But it's still different in behavior/output than explicit typing. It's still parse-based "typing" or validation or whatever, and it's not the same animal as tag-based processing even if we call it "types", "sloogs", "maffs", or "goatbutts". Whether they both "should" be under the umbrella of the label "types" is secondary to correctly and distinctly modelling the proper outcome, which to me is the primary goal here. Duty to vocabulary habit may have to take a back seat.
Do you mean it's ManifestTyping as opposed to ImplicitTyping? I.e., "int x;" is different from parsing? Whether the type is inferred once during LexicalAnalysis (Category S or D1), inferred possibly more than once (Category D2) at run-time, or explicitly specified (Category S, mainly), the end result is the same.
No.
To which statement and/or question are you responding with "no"?
Both terms are vague. I don't want to get into that vocab issue at this time.
I believe you are making the mortal sin of mistaking your head notions for objective reality. You keep bringing up your type "reality" as if we can go walk on top of it in the back yard.
Everything above is objective reality, as it describes what actually happens when real code is executed in real programming languages.
If we go with interpreter modelling, as I've pointed out repeatedly, the internal type tag is not equivalent in the interpreter RAM patterns to a temporary variable holding a parsing Pass or Fail result. (We could force them to be equivalent, but it makes unnecessary steps.)
I'm afraid I don't understand this. What is "interpreter modelling"? What is an "internal type tag"?
Abstract Interpreter Instead of Verbal
It seems interpreter modeling may be the better way to go. Verbal descriptions are just not working; at least, they have been ambiguous and seemingly contradictory to me. Interpreter modeling hopefully will avoid the pitfalls of language-centric approaches. It doesn't have to be a real interpreter, only produce the right results (match the language's actual output). Thus, we don't have to consider efficiency. We can also perhaps avoid bit-level details in some cases, but something tells me that at least for the value portion, we may have to be explicit about that.
Again, what is "interpreter modeling", and how does it help? If I infer correctly, it sounds like you're intending to use source code to... Understand source code?
"Source code" is like C or Perl. An interpreter model would resemble the machine language (interpreter) that runs the source code. For example, we'd have an explicit representation of variables and their content. We don't have to use fuzzy notion-esque English because we are talking about bytes and following the system's examination and alteration of the bytes step by step like a CPU executing machine code. We may examine snippets for parameter passing and validation, for example. We may use some medium-level abstractions to avoid outright machine language for many parts to keep the example simple, unless those become contentious, in which case we use StepwiseRefinement, down the the bit level if necessary until every bit and every step of contention sections are fully defined to both party's satisfaction (at least in terms of clarity).
So, effectively you're saying we'd find out how a language is actually implemented (so that the emulator can "resemble the machine language ... that runs the source code") and then emulate it (to give "an explicit representation of variables and their content") and see what the emulator does? Couldn't we short-cut the process by (a) stopping after we understand how the language is implemented, since we apparently have to do that in order to build an accurate emulator, or (b) read the language's reference manual to find out how its TypeSystem works, or if we still have questions, email one of the developers? I think (b) might be the most productive for least effort. Both (a) and (b) seem more productive and less effort than your approach.
If you can find a good manual, be my guest. Usually they use the same kind of fuzzy notions you do, perhaps because they all reference/copy the same popular-but-fuzzy authors. The best manuals give examples, but examples are not a model.
Good manuals simply describe how their TypeChecking works, without artifice. If you find that confusing, either ask one of the language implementers or ask someone who knows the language better. I can't imagine that would be more difficult than (apparently) implementing emulators and whatnot. Anyway, how do you plan to implement a language emulator if you don't know how the language works without having a language emulator?
They use a combo of tags and parsing, but use long-winded round-about language to describe it. They must have gone to the same College of Verbal Bloat that you did.
In other words, they describe it in terms of types and values and variables, and don't mention "tags and parsing", do they?
Nope, they don't. Again, I value model simplicity over fitting historical conventions for the purposes of use already described.
Why do you think they don't?
They just paste in slight variations of the same convoluted crap they heard from their professors or from another book. Sometimes it takes assholes like me to wake people up from the status-quo nap.
Do you genuinely believe they "just paste ... crap they heard", or do you think it's possible that they understand language behaviour in terms of types, values and variables?
They might understand it that way, I don't know. Whether they are just regurgitating or "think that way" is hard to test.
How well do you think your attempts to "wake people up from the status-quo nap" are working?
The OOP hype eventually died down, and I believe my pressure helped increase the speed of the fall. However, that's difficult to prove in an objective way. On many web forums that debated OOP, my site and usenet postings often showed up on their own, such that it appears my writings did bounce around in the heads of those curious about OOP's benefits. I was even quoted and cited in an O'Reilly book.
Do you think being an "asshole" is the best way to "wake people up"?
It has a hit-and-miss history. But "asshole" is relative. Those who are invested in the status-quo generally view attacks on their stability as hostile.
The following can often be intermingled with the concept of "type":
Explicit types (AKA "tags")
Parse-based type assessments or value validation
Validation
However, these three tend to act somewhat differently as far as program or language behavior goes, and if we want to model/predict program behavior accurately and clearly, it's best to make a clear distinction between them.
You try to squeeze them all under the same umbrella: "types", and that creates confusion and unnecessary "parts" of explanations. It's best to separate them in the model rather than try to make them resemble variations on "types". -t
As shown in Language Category D2, they are trivially covered under the same umbrella, accurately, without confusion, and without any "model" needed. The actual language semantics are sufficient.
No, you added an unnecessary type hierarchy.
I've "added" nothing that the language designer, language implementer, and language user wouldn't recognise as already there.
It's not needed.
I'm explaining what the language does; if you feel there's an aspect that's not needed then you need to speak to the language designer. Who owns ColdFusion these days? You need to complain to them.
The tag model explains it just fine, and without the need for a "type hierarchy".
Actually, on DefinitionOfTypeTag there was some discussion over whether your "tag model" adequately addressed <cfargument ...> if you included it; you could exclude it without much harm, though your model would be incomplete. By the way, what is this "hierarchy" you refer to? There is no type hierarchy, and Language Category D2 doesn't imply one. In the <cfargument ...> section, it only says that the string type may represent sequences of characters that represent literals in actual types. That's not a hierarchy.
"the value's most specific type" implies a hierarchy, otherwise you wouldn't need the phrase "most specific".
I've taken out the phrase "most specific", as it was not necessary.
An improvement, but there's still a problem. You state, "Every value has the same type, typically a string of characters" and THEN go on to describe how parsing can result in different types such as "integer, number, date, dandelion". Which is it, always the same, sometimes the same, the same on odd Tuesdays except during eclipses? (Okay, I made up the dandelion part.) -t
It's precisely correct. In D2, every value has the same type, typically a string, i.e., a sequence of characters. By parsing these characters, we can determine if a given sequence of characters actually represents a different type. For example, "123" is a sequence of characters that represents an integer. "12/12/13" is a sequence of characters that represents a date. And so on.
I'm sorry, but I find it contradictory and confusing. I've read it 4 times and it's still that way. See "Boldy" below. If you find it clear and settled, I don't know what to say. It appears to me that you are simply sloppy with language. Either that, or I'm missing something; and it's possible somebody else may be making the same alleged mistake. Thus, having an alternative available might help.
See above, starting with "Parsing has a fundamental role in programming language TypeSystems." Hopefully, that makes it clearer.
Both your approach and my approach bring up "parsing" such that neither can claim they've folded away that issue into another existing part. You just convert it into a middle-man type that I simply don't need.
Parsing is a fundamental feature of most TypeSystems, in order to identify character sequences that represent literal values belonging to various types. In many languages, parsing also forms part of type-based operators that turn character sequences into alternative value representations in order to facilitate performing operations on those values. For example, in most languages a character sequence 1236 in the source code which represents the integer literal 1236, will at some point be parsed and converted to a 32-bit binary integer because it's more efficient to perform arithmetic operations on binary integer values than on strings of ASCII or Unicode characters. Said parsing is obviously related to types.
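For example, in JavaScript (glossing over the fact that JavaScript represents all numbers as floating point rather than 32-bit binary integers):

  var n = parseInt("1236", 10);  // four characters parsed into one binary number
  n + 1;       // 1237 -- cheap binary arithmetic on the parsed value
  "1236" + 1;  // "12361" -- without parsing, '+' concatenates characters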
Yes, but its behavior is a bit different than tags (explicit typing). By calling one "tags" and the other "parsing", I avoid any cross-confusion. It doesn't matter if "parsing is obviously related to types", it's different enough that the model should clearly make it something different. Again, forecasting accuracy is valued above fitting historical vocabulary in the model. The model's purpose is to differentiate type-like behavior, not encourage umbrellatizing type-like things to mostly just satisfy historical vocabulary habits.
What is needed here is not a more complex model with imaginary elements, or some artificial distinction between parsed types and un-parsed types (I guess...???), but a greater attempt at understanding. The above is a clear and accurate description of actual implementations. For the sake of not complicating or obfuscating reality, it makes sense to endeavour to understand how it is.
Imaginary elements? Such as "associated with" ghosts? Again, there is no reality: ProgrammingIsInTheMind. Our head-model can be any fucking thing we like (or hate). I just want to find one that is not confusing and has unambiguous predictive capabilities, at least for certain aspects of languages.
Is your rudeness necessary? By "imaginary elements" I mean "tags", which are not real from a user-programmers point of view. Types, values, and variables have syntactic elements that relate to them, i.e., they are explicit in the language and they "do" things in the language and in the machine. There is nothing in the syntax (or the language reference manual, that I know of) of any popular imperative programming language that refers to "tags".
There's no universal law of the universe that says explicit types and "parsed" types should be forced to look like the exact same thing. There may be historical habit, but I have no qualms about kicking tradition in the ass if it gets in the way of progress or a specific analysis. Sacred Cows, be warned: I have a hot grill and ketchup.
I think my explanation above, regarding the role of parsing in TypeSystems, should address that.
We are going around in circles. Seems time to give this one a rest. I'll go enjoy a hamburger or something...
Boldy
I have copied D2 above and emphasized what I see as the problem phrases and contradiction (-t):
Variables do not have, or are not associated with, a type.
Every value has the SAME type, typically a string of characters.
Variables may be assigned any value at any time. Upon invocation, operators perform parsing as needed to determine the value's type, i.e., whether it is a string representing an integer, number, date, etc. Sometimes, the parsing mechanisms are explicitly available to the programmer, such as <cfargument type= ...> in ColdFusionLanguage which can be used to reject operator invocations if arguments do not match their corresponding parameter's specified type. ...
[end copy]
To me it's:
1. Every variable only has one type.
2. And every variable can be different types, such as integer, number, date, etc.
That's like:
1. Our zoo only has apes
2. But our apes can be lions, tigers, and bears!
It's more like:
1. Our zoo cages can hold any animal.
2. An animal can be a lion, tiger, bear, weasel, ocelot, olinguito, etc.
But that's not how it actually reads. Why have the word "same" if that's what you meant?
I have tweaked my D2 description to hopefully make it clearer.
Thank you. It's gradually improving. But what is an "argument value"?
An argument is what you pass to a parameter. Sloppily, argument and parameter are sometimes used interchangeably, but it's more accurate to say that a function defines parameters and arguments are what are passed to the parameters when the function is invoked. E.g., function zot(p, q) {return p + q} defines parameters p and q. zot(3, 4) calls zot with arguments 3 and 4.
That's perhaps too much detail, or too specific, for the early part of D2.
Given that it's where it's determined that a value represents an integer, float, etc., it seems somewhat important.
But a bigger issue is below, where you say variables do not have a type, "Variable -/-> Type, Value ---> Type", but then say that arguments have "types". So are arguments not variables? Why mention arguments? Again, I'm focusing on what is examinable. We (programmers) cannot examine that part of a program, only the results of parameter passing: variables. Saying "type stuff" happens during parameter processing is unnecessary to our prediction model. Another way to say this is that I can implement the same thing without "arguments" that "have types". In the tag model, whatever happens in between only has the value to look at, since there are no other parts to look at (and distract one); nothing else. We don't have to model parameter processing other than to state what it's not doing. You are complicating things unnecessarily.
Note that it may not be necessary to mention strings. We don't know how stuff is stored internally, and it may only cloud the issue to model it that way. However, I agree it makes it easier to create a machine-language-like model where the details of processing can be made more explicit in illustrations.
Arguments are values. Parameters are variables.
How does one observe them being values?
In the invocation zot(3, 4), what are the 3 and 4?
Constants. In most dynamic languages, they pretty much act like read-only variables.
And in the invocation zot(1 + 2, 3 + 1), what -- in general -- are the results of 1 + 2, and 3 + 1?
In most dyn langs, (results of) expressions act like read-only variables also.
You're using the term "read-only variable" (no oxymoron there, eh?) and "constants" in essentially a PrivateLanguage way. "Value" is the recognised term for what is passed to parameters, assigned to variables, results from evaluation of expressions, and is represented by literals. A "constant" usually refers to an identified -- typically by name -- value that cannot change, as opposed to a "variable" that can hold different values at different times.
The existing terms are not clear, so I'm forced to invent new ones.
{Using terms that already have a meaning without disclosure of what you mean by them. Yep, that'll clear things up.}
80% clear is better than 30% clear (your stuff). If you find a vague spot in "act like read-only variables", I'd be happy to clarify with further detail.
{Apparently, "clear" is another word in your PrivateLanguage that you haven't told us the meaning of. What makes "act like a read-only variable" unclear is that you've left us to guess what you mean by it.}
A very vague "public language" is not an improvement.
At least a "public language" is well-understood in general, even if you don't personally understand it. There's nothing wrong, in principle, with defining new terminology to address vagueness in popular terminology. Academic papers do this frequently. However, the defining characteristic of such work is that it makes an almost overwhelming effort to be clear about each new term. If you're going to introduce non-standard terms, or non-standard use of standard terms like "constant", or introduce seemingly-contradictory phrases like "read-only variable", then you must go to extensive effort to make sure your terms and phrases are unambiguous and clear. You can't simply assume we know what you mean.
Re: "At least a "public language" is well-understood in general" -- Bull! Being public does not imply "well-understood". Like I said before, "Country Music" is a "public term", but has fuzzy boundaries.
{And yet, the term "Country Music" is well-understood.}
Oh boy. That's a very odd statement. However, strangely fitting: most have a "notion" of what it is, like "types", but there are no clear-cut "rules".
Not odd at all. People choose to listen to songs and radio stations, watch television programmes and/or YouTube videos, and purchase and/or download music on the basis of definitions of "Country Music", so it appears to be well-understood. However, what we're talking about here -- types in programming languages -- is far more clear-cut. Indeed, in the majority of programming languages it's trivial to identify explicit type definitions and explicit type references from the language grammar alone. Most implicit type references can be identified from a trivial understanding of conventional imperative programming language semantics. Border cases, exceptions, and peculiarities can be identified in reference manuals and by talking to developers who use these languages.
I have disagreed with a similar point you or your partner have made before. Explicit declarations are the easy part, but there are various areas where what's "really" going on is not so clear and not so obvious. Programmers tend to use trial and error or "defensive" habits such as "when in doubt, explicitly convert". A better model would be helpful. Confusion and errors related to JavaScript's "+" operator are fairly common, for example. Most answers to such problems resemble, "do such and such to make sure they are strings (or numbers or whatever)". That's wall-papering over a deeper understanding, or at least a better prediction model. It may also generate an understanding of why such operator overloading and tag-based typing suck eggs, such that the next generation of dynamic languages does it right and clean.
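For instance, here's the sort of JavaScript "+" surprise I mean, along with the usual wallpaper:

  var a = "5";                // e.g. a value read from a form field
  var total = a + 1;          // "51" -- concatenation, not 6
  var fixed = Number(a) + 1;  // 6 -- the standard "convert to be sure" advice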
Ah, so your goal is not only explanation, you also have a political goal of promoting a particular TypeSystem approach and deprecating the others. <sarcasm>That sounds very unbiased and balanced.</sarcasm>. How can we trust your "tag model" to be accurate and not reflect your biases?
If "providing more user-friendly mental tools" is a "political goal", then I am indeed guilty. I want to help "average programmers", while you seem to want to punish them for being average (AynRandDesignPhilosophy). Maybe that makes me a progressive, I don't know.
That seems predicated on a rather bold assumption that a particular approach to TypeSystems is easy for "average programmers" and others are difficult. Do you have any evidence of it?
I don't pretend or assume all heads are alike. WetWare varies. It's nice to have a choice of models.
Eminently reasonable, as long as the models are accurate and do not distort reality.
Meaning they model I/O correctly? Yes.
Are you sure it's "wall-papering over a deeper understanding", and not simply defensive programming? A good example of defensive programming is always using parentheses in arithmetic expressions to force an explicit order of operations, rather than rely on the language's implicit order of operations. Sure, that can result from lack of understanding, but it's also good practice. Furthermore, are you sure your "tag model" better explains Javascript behaviour than the description of actual language behaviour at the top of this page, or a Javascript manual?
Re: "Sure, that can result from lack of understanding, but it's also good practice." -- That's what I used to do, but in some cases it created distracting, repetitive bloat under high quantity of use such that I decided to experiment with languages. As far as your explanations at the top, I still find that problematic for reasons explained in on-going sub-threads.
You find good practice... Distracting?
Defensive programming is not inherently "good practice". It can get bloaty. If we better understand the "type" rules, sometimes we can write clearer code. Contrast:
// "normal"
x = a + b + c + d + e;
// "defensive"
x = conversionStuff(a) + conversionStuff(b) + conversionStuff(c) + conversionStuff(d) + conversionStuff(e);
The above is a contrived, extreme example. It is true that defensive programming can result in greater verbosity, but the gains in reliability are often worth it. (I think this is wandering OffTopic...)
I encounter the pattern fairly often. But yes, it is OffTopic. Maybe another day.
However... This has deviated from the original point of this threadlet, which you have not yet addressed: Again, if you're going to introduce non-standard terms like "tag", or non-standard use of standard terms like "constant", or introduce seemingly-contradictory phrases like "read-only variable", then you must make extensive effort to make sure your terms and phrases are unambiguous, clear, and understood. You can't simply assume we know what you mean. On the other hand, we don't need to define terms like variable, value, and type because these are familiar to ComputerScience and SoftwareEngineering -- and are well-defined in those fields -- whether you agree with the definitions, or their clarity, or not. You can assume that if we're using such terms, we're using their conventional meanings. If you're using non-conventional terminology, then you must clearly and extensively define and show what you mean, because until you've done so the only person who will understand your words is you. For that reason, it's usually easier to use conventional words and their conventional meanings, unless there's a very strong reason to do otherwise. If there's a very strong reason to do otherwise, then you must clearly and compellingly show why, otherwise the only person who will appreciate your words is you.
The existing terms are either vague, are unnecessarily complex for modelling certain languages that don't "use" much of it, or require absorbing boatloads of material to make sense of. I'm looking for a tool that "ordinary programmers" can absorb relatively quickly, I'm not trying to write a Nobel thesis.
But don't "ordinary programmers" already have knowledge of types, values and variables? How are you going to explain "tags", given that they are invisible, not found anywhere else in the literature, and (according to you) differ from type references in some subtle -- and, as yet, unexplained -- way?
As I've stated a good many times, such knowledge is often a rough notion, insufficient for explaining the somewhat subtle difference between ID-based (tag) typing and parse-based typing. And most application programmers are not going to read an entire interpreter-building book just to be able to predict type-related behavior better. They want something quicker to digest. The ratio of compiler/interpreter writers to all programmers is probably at most 1 to 100. Whether they "should" become interpreter-building experts is another debate; in practice they won't. (It may be like expecting cab drivers to become mechanics because knowledge of car engines may help them be slightly better drivers.) --top
Who says programmers need to read an entire interpreter building book? What's described at the top of this page is both sufficient and accurate, relies on no PrivateLanguage or extra terminology, and (in Category D2) fully accounts for the "somewhat subtle difference between ID-based (tag) typing and parse-based typing."
Sorry, but I don't find it a clear description, per issues currently being argued over. It's almost like String Theory in that it has so many parts and vague words and dimensions that it can magically be shaped to fit just about anything observable.
Nobody explains EXACTLY how it applies to existing languages. Maybe it can be done in a clear way, but nobody's done it yet, and you certainly haven't. They've yet to convert it into a mechanical/visual model with clear boundaries to the boxes and clear rules for when the parts come and go and have an effect on each other. My model does it well with fewer parts. I'm sure yours can be cleaned up if a smart tech writer got a hold of it, but it still has more parts and odd rules than the tag model. For example, if parsing occurs to make a branch decision, the tag model does NOT have to create any intermediate/temporal "type" object. That's a fact.
Are you sure you're not striking out at an "intermediate/temporal 'type' object" StrawMan here? There's no mention of such a thing in Category S or D1, and Category D2 states: "Operators perform parsing as needed to determine whether each argument value (which is a string of characters) represents an integer, number, date, etc. Sometimes, the parsing mechanisms are explicitly available to the programmer, such as <cfargument type= ...> in ColdFusionLanguage which can be used to reject operator invocations if arguments do not match their corresponding parameter's specified type." I see no mention of an "intermediate/temporal 'type' object" (obviously, implementations' internal optimisations are not considered here!), so I'm not clear what you're referring to.
If you don't need them, then remove them from your model of D2. I just see what appears to be waffling.
Remove what?
{Since every programming language ever implemented did convert "values", "variables", and "types" into a mechanical form with clear boundaries, your claim that it hasn't happened is clearly false. As for "parsing to make branch decisions", you have indeed stated that no tags are involved. (Unfortunately, you haven't developed your model to the point where we can verify for ourselves that what you say is indeed true, so I'm having some trouble accepting that as a fact.) You also have to have something that tells the implementation which values branch one way, and which the other. In the long run, you'll end up with more parts, since you are introducing multiple parts where we would use one.}
But you arbitrarily point out some part as being a "type reference". The reader cannot see the rules for what is and what isn't a type reference to verify it's not arbitrary. There's no rigor or consistency in your labeling. Sure, ColdFusion called their attribute "type", but would it make a difference if they called it a "zoonsock"? It shouldn't change anything if they called it "zoonsock" instead.
{They can too see the rules for what is and isn't a type reference. They read the definition of the language, that will tell them exactly what is or isn't a type reference. (And no it wouldn't change anything if they called it "zoonsock" except for the confusion it would cause by not using the established term.)}
Such are usually vague for reasons I've already given: stale tradition. I get far more use out of actual code snippet examples than their descriptive writing (for lang docs that provide them.)
When "stale tradition" is the same as "almost universally understood", it's not a bad thing.
We are back to the "Country Music" argument. I won't repeat it here.
Are you sure you're not projecting your personal difficulty with conventional explanations onto programmers in general?
No, I'm not 100% certain of anything. I can Freud-up an explanation of your behavior also. Like I've said multiple times, most programmers don't seem to give a lot of thought to these issues in my observation: they are relatively subtle and can be papered over via "defensive programming", which you appeared to agree with (D.P.). Another WikiZen even stated "who cares?". I didn't write that post. --top
One can replace a variable with an expression and the expression displays examinable traits just like variables. For example, functions such as typeName() or isDate() return results on expressions just as they do on variables. We can do everything with them that we can do with variables, except change them; thus they ACT LIKE read-only variables. There may be per-language exceptions such that it's not a 100% perfect analogy, but so far nothing is. If you want perfection-or-nothing, then kill yourself now to quicken the inevitable result of following that rule.
The reason you can replace a variable with an expression -- in places where you are allowed to replace a variable with an expression -- is because they're both expressions that evaluate to a value. In a statement like print(p), p is an expression that dereferences variable p, i.e., it retrieves its value.
We can only observe "output". We don't know if we are seeing a "value" and only a "value" in the output, or at least you haven't shown how such is verified as being a "value" and only a value. "Value" is a construct in your head with fuzzy rules so far. (Granted, I use "value" in my model also, but I only claim it to be specific to the model, not a universal truth.)
Values, types and variables can be trivially distinguished by experimenting with replacing any of the three with the other in simple programming examples. For example, you can assign to a variable but not to an expression (except in PL/1 :-) Do this, and you'll quickly see that there are three distinct categories -- values, types and variables. (I provide a slightly more extensive example later in this page.) However, it's even more trivial to simply acquire an understanding of conventional imperative programming language semantics, i.e., recognise that values, types and variables -- plus flow control, I/O, and (maybe) operators -- are the fundamental building blocks of conventional imperative programming languages. They aren't part of the natural world, so we don't have to arduously gain understanding through observation, hypotheses, testing, and model formation. We can simply look at how they're built. The descriptions at the top of this page are based on how conventional imperative programming languages are built.
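For example, in JavaScript (the commented-out lines are the substitutions the language rejects):

  var v = 2 + 3;  // fine: a variable can be assigned the value of an expression
  v = 7;          // fine: a variable can be assigned a literal value
  // 2 + 3 = 7;   // error: a value/expression cannot be assigned to
  // var 7;       // error: a value cannot be declared as a variable

Three distinct categories of behaviour emerge from such substitutions, corresponding to values, variables, and (via similar experiments with type names) types.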
{If you limit yourself to only observing output, then you won't be able to tell the difference between most TuringComplete languages. (You can do some, because there are languages that have crippled or no IO. Those you can tell from the ones that have complete IO. But I didn't think that was a distinction we are interested in at the moment.) But even if you could only observe output, we are not so limited. We can also observe the source code, the language definition, and often the run-time state of the program.}
I should have said input and output. We give it code and sometimes data as input, and observe that output. TC has nothing to do with it. We are studying languages, not applications. TC/TE tests don't inspect source code; we do.
{"Value" is an abstract concept. However, the rules for what is or isn't a value for each language are very clear cut. The have to be, or the compiler/interpreter wouldn't be able to translate/execute the code. After all, the semantics of the language depend upon those rules.}
Ideally, they should be defined only by input and output, not implementation. However, defining by implementation can simplify things sometimes. For example, it's easier to say "the bit patterns and computations for the following set of floating-point operations follow hypothetical standard IEEEE997" or the like. We then have ready-made libraries and testing reference implementations. Note that we may still be able to have a different representation, such as strings of digits, as long as the operations produce the same output for the same input as the IEEEE997 standard. It may be a bloody annoying exercise to match it up, including rounding error patterns, but probably doable.
What would we gain by doing this?
Proof of equivalency.
Proof of equivalence of what?
I'll first ask for clarification on what "this" references.
The same thing as the "It" in the last sentence you wrote before I wrote "What would we gain by doing this?".
Never mind, let's table the issue of floating point until we solve the more important stuff.
Granted, it may perhaps (sloppily) be making a distinction between "variable" and "value", but the programmer cannot "see" them separately, such that the distinction helps nothing. If it's happening under the hood, then it's not observable, and/or is replaceable with alternative models, probably better ones, that don't need the distinction.
Programmers do normally see variables and values as separate. Given a statement like 'v = 2 + 3', we know 'v' is a variable, but what do we call the thing that results from evaluating the expression 2 + 3?
Another way of looking at it: What do we put in a variable?
But we cannot see any "extra" parts of a variable to see how it's the same as or different from a value or expression. It doesn't help the model. Plus, why are you not considering expressions as distinct from values as distinct from variables, if you want to be thorough? For the most part, expressions, values (constants), and variables have the same examinable features in the common dynamic languages under question, such that considering them different is not necessary. (A side note may help just to clarify that.)
{To this point we haven't considered expressions, but if you want to include them, I'm game. Expressions are almost always of the same type as the values they evaluate to. (I say almost always even though I'm not aware of any exception. I don't know every language ever devised.) Since we already have example languages on this page where values and variables don't have the same features (see any language in D1), that claim is patently false.}
Maybe C#-style languages that have two "level" of types, but it's disputable whether it's "dynamic" so I'm excluding it from this discussion. I'm only considering "fully" dynamic languages, not hybrids like C#.
{Fine by me. Doesn't change anything though.}
Or perhaps there is a multilevel type capability of some sort, but that's not described: how many levels are the limit, what combos are allowed in the tree, etc.
I hope my explanation above -- see the paragraph starting with "Parsing has a fundamental role in programming language TypeSystems" -- makes that clear.
I don't dispute that, but "parse-based" typing approaches are different enough from tag-based typing approaches that our model should make such distinction clear. (If we can call parsing part of "types".)
{The only significant difference (for our discussion) between what you are calling "parse-based" (e.g. cfargument) and the others is that we have to explicitly program when the type checks occur.}
No! One looks at the tag, the other only looks at the value (or both, in some cases). We can do experiments to determine what is affecting the output. It has nothing to do with explicit coding. (If the lang summarily has no tags, then its model is obviously simpler: any "type" determination is parse-only, i.e. value-only sampling. It's a no-brainer; no tags to confuse. Hug simplicity, it's your friend, unless you like obfuscation as puzzles.)
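Example experiments in JavaScript: typeof samples the tag, while a regex test samples only the value:

  var a = "123";
  var b = 123;
  // Tag sampling distinguishes them despite identical digits:
  typeof a;  // "string"
  typeof b;  // "number"
  // Value-only (parse-based) sampling cannot tell them apart:
  /^[0-9]+$/.test(a);  // true
  /^[0-9]+$/.test(b);  // true (b is stringified to "123" before the test)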
{What tag? You've yet to define what you mean by "a tag", and your "experiments" apparently can't differentiate between "tags" and "flawed CSRs". Since I can't even tell if there's a "tag", it's certainly of no use to me. And why should I learn two terms for these things when one term happens to describe both things better than your two terms combined?}
CSRs have only been shown flawed in C, not in the languages of our comparisons, and the flaw wouldn't ruin the big picture even if it were.
{They haven't been shown to be flawed in C. You just claim they are so you don't have to say that C has tags. You gave no reason or justification for that claim. In addition, you still haven't answered my question about how I can differentiate between "flawed CSRs" and "tags".}
C has no clear CSRs, so it's a non-issue. If we find flaws in the model for a specific language, we can always adjust it for special cases. Explaining how such flaws come about is going to complicate any model and likely force us to add bit-level issues to it. There's no known magic abstraction for such yet. Bits is bits.
{Yes, C really is a non-issue here. The problem exemplified by the C example is a general issue though. But, by your response, it appears that the only way to tell if a language has "tags" is to ask you so you can tell us if it's one of your exceptions or has a "flawed CSR". As far as I'm concerned, that makes it unusable.}
{To show it's not a C issue, take any language you consider a "no tag language" where every value is a string. Modify the CSR so that strings that match the regular expression "[0-9]*\.[0-9]*" print with exactly six digits to the right of the decimal point. According to your "experiments", that should indicate that the language uses tags. According to your response to the C example, it's a flawed CSR.}
One can make a stupid language that breaks any model or definition if they set out to. SoftwareGivesUsGodLikePowers. You are trying to make an issue out of a non-issue. Besides, I am focusing on getting a good model right now, NOT on candidate definition #3.
But actually the big picture is still the same: there are the readily examinable parts of variables (or expression results), and parts that are not readily examinable but create subtle or indirect changes in program behavior. Typically this "hidden" info is used for type indicators, but CAN be used or misused for other things. Good languages have good, simple tags and stupid languages have complicated, multi-part, or fscked-up tags. Any concept can be made unnecessarily complicated or flawed. That's not news.
{So, I need a purely dynamic language that isn't stupid and doesn't have a flawed CSR in order to use your "model". Since I have no way to determine if a language is either stupid or has a flawed CSR, I have no way to determine if a language is even eligible for your "model". I wonder how many other conditions we are going to encounter as we attempt to pin down what you mean?}
Like I said, a bad CSR will fuck up any model, because it makes it difficult to examine the "value" with any reliability, or at least in a straight-forward way. It complicates testability of ANY known model because there is no clean way to extract the "value" portion. Science is about examining.
Further, if a simple model works for language A but not for language B, that's not necessarily a significant loss. A one-size-fits-all model would likely be a mess anyhow.
It would appear that not only does your "tag model" fail in C, it has the potential to fail in any language with a CSR operator (and how would it be tested in a language without a CSR operator?) because there's no way to distinguish between detecting a "tag" vs detecting a characteristic of the CSR operator. That is a fatal flaw of the "tag model", at least in terms of using CSRs to identify "tags".
{Furthermore, the model you are trying to replace works for languages that aren't purely dynamic, that are "stupid", or with "flawed" CSRs. It also appears to successfully differentiate between the languages your model is "working" for (it's hard to tell for certain since your model is so poorly defined). So why should we replace a simple, working model with a complex model that doesn't work?}
Sure, if you add enough parts to your model, it can probably handle anything. There's no free lunch. The difference is that I only add parts when needed for a given lang, not just to fit historical habit. By throwing in enough parts and rules, both models can be made to work. I don't question that. It's a matter of parsimony of models. If a given language doesn't have the rounding flaw, there's no reason to have "fixes" for such in THAT language's model.
{There may be no free lunches, but there are certainly plenty of overpriced lunches. Your model falls in the category of an overpriced lunch. In order to use your model, we need to know about CSRs, variables, values, expressions, and pure functions. In order to know when your model applies, we need to know about dynamic languages, stupid languages, and flawed CSRs. In order to use our model, we need to know about types, values, variables, and expressions. In order to include things like cfargument, we need to add representations. You need to add both representations and parsing. In order to handle dynamic languages, stupid languages, and flawed CSRs, we need to add nothing. You need to add ??? (but there will have to be something added, since it won't handle them as is). Sure, once you've added enough stuff to your model, you might end up with an obfuscated version of our model, but why should we go there?}
You are confusing the definition attempts with the model. They are different things.
{An explanation of where the confusion lies would be nice.}
No mention of CSR's is necessary to use/run the model, for example.
I thought your model was dependent on CSRs in order to detect "tags". Isn't it?
No, it's not. Back to Tag School for you.
{As far as I can tell, once you remove everything dependent on the CSR, the only thing left is a couple of names. In fact, from what little I've been able to pick up, your model is equivalent to (for those languages covered by it) "a language uses tags if and only if its CSR isn't an injection."}
Prediction models are not about names, they are about results.
{And you have only names once you've removed the CSR.}
I don't know what you are talking about.
{We're talking about what's left of your model once you've removed everything dependent on the CSR from it. To date, all we know about your model is what we can guess from your attempts to define it. In all those attempts, the CSR has had such a prominent role that there isn't anything significant left without it. In addition, all your predictions so far have been of the form, "in tag languages, there will be at least two values with the same CSR but for which pure functions can give different results." Without the CSR, all your predictions (aka results) are gone. Without the CSR, all that's left of your model are the names.}
It appears you don't understand the tag model. I've tried, but so far failed to communicate it.
{Well, try again, then. As a bit of advice, what you will need is a complete list of the parts of your model, a complete explanation of how they can be put together, and a complete list of what properties they have. Any examples included in your explanation should be strictly optional reading.}
That advice applies to both sides. Hopefully the "abstract interpreter/machine" approaches in the works will help.
How will the "abstract interpreter/machine" help? I'm not clear what it is, let alone how it will help. My last comments and questions on "interpreter modeling" -- I assume that's related? -- went unanswered.
I thought I explained that already. Apparently I failed to communicate the value of such. English sucks eggs.
You explained a little of it, but then my subsequent comments and questions on "interpreter modeling" -- I assume that's related? -- went unanswered. I was left, therefore, without much understanding.
{If you're referring to the section titled "Abstract Interpreter Instead of Verbal", then no, that won't help and it's not the fault of English. The first problem is that it's also a language centric approach. The second problem is that every code fragment of the source language will have infinitely many possible translations to your interpreter. It's unlikely that you're going to be able to find some property of the translated code that will accurately reflect the distinctions you are trying to make.}
Yes, it is "language centric", but at least at a finer level. "Variables" will have more explicit representations with clear sub-compartments rather than the fuzzy blobs with semi-permeable walls they've been resembling. I'm not sure what you mean by the "infinitely many possible" part. Please elaborate. I don't know that such will help with communication, but it can't be worse than the English-centric approach tried so far. We need "boxes" and "slots" with clear rules about what goes in, when it goes in, when it goes out, etc. None of this "has a relationship with" shit.
See the diagrams at the top of the page, with precisely the "boxes" (to the extent that one line of ASCII can represent a box) you want and clear rules about what goes in. The rule is that where it says "Variable", you can only have a variable. Where it says "Value", you can only have a Value. And so on.
But the association seems to correspond with something only in your head. You haven't shown how it clearly maps to anything objectively observable. We cannot take variables apart to see if they are indeed composed of "values"; we can only observe certain output. This kind of reminds me of particle physics: we can only observe a limited set of behavior, but cannot directly "see" sub-atomic structures. We cannot crack a variable open and observe it under the microscope. (Except maybe by examining RAM, but that has other pitfalls, per above.)
It's not just in my head; it corresponds with how conventional imperative programming languages are implemented. We don't need to "take variables apart" because we can take languages apart (especially the OpenSource implementations) and read their manuals, and ask their developers how they work.
It's possible to implement cfArgument in an interpreter without an explicit set of bytes that directly represent a "type". However, you appear to be loose with language and call the parts used for validation "types". It's arbitrary labeling: "this is type stuff because I say it's type stuff". At the least, the under-the-hood mechanism used for "explicit" types probably differs from that used for parse-based "typing". Whether the second "is" "types" or not is a language (labeling) game. Explicit typing will probably be represented as a byte sequence kept with the data structure that represents a given variable. In RAM we'd probably see that this "type byte" set is usually the same or similar distance from the value bytes (for languages with tags, which CF is not) because the structure representing variables is fairly uniform. With the implementation of cfArgument and other parse-based "typing", where the "typeness" bytes are kept would probably be unpredictable (if they even exist, given that I question your labeling), or at least different from the explicit typing. They'd "look different" in RAM. The cfArgument statement doesn't add anything to the incoming parameter variable that resembles a tag, and thus we don't need anything lasting in RAM to represent that it "passed" the validation stage. If a tag-like set of bytes is used internally under the hood temporarily, we cannot observe it. I don't see how the existence of such is even required to mirror the language's behavior, but you call stuff "types" or "type-related" for reasons unknown to me. Your labeling does not appear to be based on objective criteria when examining all possible working implementations at the RAM level.
In short, the actual implementations of explicit-typing and parse-based typing probably have a different pattern in ACTUAL RAM (our "reality" for sake of this section). Whether one or the other or both "are types" is secondary to the fact that they are different in RAM and per results (I/O) such that our models should make a clear distinction for illustrative reasons at least. I don't really care what we call the distinction as long as the model works. Fitting historical vocabulary is of secondary importance.
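To make the distinction concrete, here is a sketch of what a "tagged" variable might look like if we could inspect the interpreter's RAM, using a PHP array as a stand-in for the internal data structure. This is illustrative only; the field names are invented, not a claim about any actual interpreter's layout:

  // Hypothetical internal structure for one variable in a tag-based language:
  $variableStruct = array(
      'name'  => 'a',
      'tag'   => 'number',  // explicit type-ID byte(s): the "tag"
      'value' => '123',     // the value bytes
  );

A parse-based validation step, by contrast, would leave no lasting 'tag' field behind; only the 'value' bytes persist.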
Speaking as a language implementer, the mechanism that parses a sequence of characters and identifies whether it belongs to a given type or not is precisely the same -- i.e., it's the same routine -- whether it's explicitly invoked in "cfArgument" or implicitly invoked by the language parser during LexicalAnalysis to identify the types of literals. I note in your paragraph above that you're speculating on how languages might work. I implement them; I know how they actually work. They work (essentially -- I've left out optimisations) how I've described at the top of the page.
Why would it NECESSARILY be the same? I can see how one could potentially use a tag to TEMPORARILY store the parsed "type" (it's not kept for the user/output to see), but that's not the only way to do it. Just because YOU implement it that way does not mean it's the best or only way to implement or model it. Plus, in tag-free languages we don't need a formal value/variable tag. It's extra clutter. We can live perfectly fine with a stand-alone Boolean flag. Pseudo-code:
  func internalValidation(param, validateType) {  // type name comes from programmer's XML
      useRegex = symbolLookup(validateType);      // get the corresponding regex expression
      // Note: no error handling for lookup failure here, because the XML parser already checked it
      return isMatch(useRegex, param);            // return pass (True) or fail (False) based on regex parse
  }
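For concreteness, the same idea as runnable PHP (the function name, the regex table, and its contents are hypothetical; nothing here is taken from ColdFusion's actual source):

  // Validate a parameter against a named "type" purely by parsing.
  function internalValidation($param, $validateType) {
      $regexTable = array(                         // stand-in for symbolLookup()
          'number' => '/^-?\d+(\.\d+)?$/',
          'date'   => '/^\d{4}-\d{2}-\d{2}$/',
      );
      $useRegex = $regexTable[$validateType];      // lookup failure pre-checked elsewhere
      return preg_match($useRegex, $param) === 1;  // pass (true) or fail (false)
  }

Note that nothing in this sketch attaches any lasting "type" state to $param; the result is a stand-alone Boolean.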
In a given implementation, it's not NECESSARILY the same routine, but doesn't it seem a rather ungainly violation of OnceAndOnlyOnce to have two routines that do precisely the same thing? However, that's an implementation issue. Conceptually, from a model point of view, parsing to identify the type of a character sequence during LexicalAnalysis is the same as parsing to identify the type of a character sequence at run-time. Note that the descriptions at the top of the page explicitly avoid mention of run-time vs compilation or LexicalAnalysis, which leaves us with parsing to identify the type of a character sequence, period.
Some type-ish operations parse and some don't because they only look at the tag (in tag-possessing dynamic langs). Thus, it's not a violation of OnceAndOnlyOnce, because variable/value tag inspection does not need regexes. (Plus, the interpreter is more efficient if it can look at a type-ID byte(s) rather than parse each time it has to "ask" about types. For example, an isNumber() function can check the tag (type ID), and if that ID encodes, say, "integer", then it doesn't have to parse to see if the value can be coerced into numeric for the expression, because it already is a number.)
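A sketch of that short-circuit in PHP (an assumed optimisation shown for illustration; the function name is invented, and this is not a claim about any particular interpreter):

  // Check the tag first; fall back to parsing only when the tag is inconclusive.
  function isNumberFast($tag, $value) {
      if ($tag === 'integer' || $tag === 'float') {
          return true;             // the tag already says "number"; skip the parse
      }
      return is_numeric($value);   // parse the value bytes as a fallback
  }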
I thought we were talking about parsing in particular, here. Why are you now bringing tags into it? A common optimisation (which I've studiously been avoiding mentioning, as optimisations only complicate things from a "model" point of view) in Category D2 languages is to associate a type with a value when the value is successfully parsed as a specified type. So, for example (though I don't know if ColdFusion in particular does this), if <cfargument type="float" ...> returns 'true', it immediately internally converts the value to a binary float under the presumption that subsequent operators will probably operate on the value as a float. That way, operators that expect the value to be a float don't need to perform parsing and conversion. However, these internal optimisations are invisible to the user-programmer. Is that what you're referring to? If so, it's an implementation issue. It has nothing to do with the fact that in Category D2 languages, it always appears that values are passed around as strings and converted to binary representations, as needed, inside operators. However, in the internal implementation reality, a given value represented as a sequence of characters may only be converted to a binary representation once -- which occurs either during LexicalAnalysis or explicitly when something like <cfargument ...> is invoked, but the end result is identical either way.
In some cases we cannot tell the difference and some cases we can. Those cases where we cannot tell the difference can be modeled either with or without one or the other, and this gives us freedom to simplify the model because the model can choose to "pretend" it uses one or the other for the cases where we cannot observe the difference. As a reminder, languages are or should be defined based on observable traits, not implementation.
That was my point, hence I wrote "the end result is identical either way."
Types, values, variables and the relationships between them are used to implement TypeSystem behaviour in popular imperative programming languages, so by definition our "model" can handle anything because it's not just a model -- it's how it's actually done. How will you determine whether a given language has a "flawed CSR" or not, and how will you distinguish that from "tag" detection?
The "Type system" under the hood for Perl is probably going to be much simpler than that of C-sharp's. Implementation is the same issue: a language with a complex type system is going to have a more complex interpreter than one without. Do you dispute this? Reality of interpreter implementation backs me on this also. (Or at least the variables' data structure will be more complex. In some cases tag-free languages will push some of the complexity into processing instead of the variable structure in a bit of WaterbedTheory.)
It depends on how you measure "simpler", but speaking as a language implementer, they're not appreciably different. One imperative, conventionally object-oriented language is pretty much like another. Static TypeChecking doesn't add much technical complexity over dynamic TypeChecking, and indeed it eliminates some run-time considerations that dynamic type systems need to retain. Notably more complex is something like Haskell's TypeSystem. By the way, what is the relevance of your point, and what does it have to do with my point above it?
Maybe you are biased because you've been doing it a particular way for so long that you are mentally hard-wired to always see things through the same model/framework. Anyhow, we can have an interpreter show-down at the Not-OK Corral to see which interpreter model is simpler.
What does that mean, exactly?
Back to the original question: How will you determine whether a given language has a "flawed CSR" or not, and how will you distinguish that from "tag" detection?
Let's ignore CSR's for now. They appear to be a distraction from modelling.
By the way, there are a number of questions to you throughout this page that you've not answered and comments intended for you to which you've not responded. It would be nice if you'd read the lot and address some of the clearly un-concluded threadlets. DeleteWhenCooked
Many of them appear to be wandering off topic, such that I leave them unfinished for a later time AFTER the key issues are settled.
And "Types" poorly describes the two kinds of type encoding, at least the way you present it. It's either missing from your description, or gummed up inside with non-observable terms like "value".
Sorry, I don't follow you. What do you mean by "two kinds of type encoding" and "non-observable terms like 'value'"? Values are certainly observable: What does a literal represent? What is the result of evaluating an expression? What does a variable contain?
Then how do we directly observe values and know we are looking at values and only values? The only way to see the "result of evaluating an expression" is via output.
We don't have to "directly observe values" if we understand conventional imperative programming language semantics. However, we can also see that given an expression like "3 + 4", we can't use it to replace the "int" in "int x;", so it's probably not a type. We can't use it to replace the "p" in "p := 5;", so it's probably not a variable. It appears to be something other than a type or a variable. In fact, wherever we can use it, it seems to be equivalent to "7". Indeed, we can replace "3 + 4" with "7" everywhere "3 + 4" appears, and the program behaves exactly the same way. So what's a "7"? Now, do many more experiments like this. That's how we empirically arrive at the notion of "value". Fortunately, we don't have to empirically arrive at the notion of "value" because it's already well-understood in -- at least -- language implementation terms. I've little doubt that most language users understand it too.
That's why I say they "act like" read-only variables in most dynamic langs we consider. (It's not a contradictory term because they are alterable (variable) at coding time, similar to static languages.) And languages that allow declarations such as "int x;" generally allow things like "(int) 3 + 4", which, for all observable purposes, act identically to a variable, other than mutability. Yes, there may be exceptions for some languages or edge cases, but I don't think we should get bogged down in such minutia just yet.
Expression "(int) 3 + 4" is performing a typecast on integer value 3 (and, in this case, therefore likely redundant). It's casting, or converting, the value 3 to an integer value 3. How is that like a variable declaration, other than in "int x;" it's declaring that the variable is of type integer?
How do we objectively verify that this "casting" is different than "declaring"? In most dynamic languages they act the same to external observers. Being SystemsSoftware experts, I suspect you've had your "heads inside the guts" for so long that you no longer view languages like scientists, but instead like an engineer or mechanic.
In the majority of imperative programming languages, a variable declaration assigns a name to a variable and it's added to a lookup table of variables that identify their properties -- e.g., scope, memory location, type (if a Category S language), etc. -- for use when the variable is referenced in expressions. Expressions, on the other hand, have no name and no scope, and don't necessarily have a memory location because they're typically constructed dynamically on a stack. Of course, I know this because I view languages like a computer scientist, not a natural scientist. A natural scientist might observe that variables have names and can be assigned to by name and referenced by name. Expressions have none of these.
Okay, but that seems largely an efficiency-geared decision. An "accurate" interpreter can also be created by treating the result of expressions as local anonymous read-only (at run-time) variables. Whether doing such runs fast or not is not my concern here. But anyhow, let's focus on variables for now and come back to expressions later. The topic is getting too fat.
A variable is a more complex entity than a value, so it doesn't make sense to use a more complex entity when a simpler one will do. Furthermore, a variable is a container that can hold one item at a time. If a value is a variable that can't change, i.e. it's a container, what does it contain?
But then you are adding more parts to the model. If we can piggy-back on variables, we don't have to define an entirely new thing: re-use. I agree it's a tricky balancing act -- variations on a theme versus a different thing -- but I vote to piggyback on variables because they summarily make the model simpler, per my judgement. Regarding containing, it contains the same kind of thing(s) variables do, per observations about their output, as already described. When you toss out words like "containing", make sure it's an aspect or feature we can measure and observe, or at least be able to test for the presence or absence of contain-ness. "They contain because I say they contain" is insufficient.
Perhaps "contain" was a poor choice of word. It's one I like to use because I often use the analogy of a bucket to represent a variable, but the usual term is "store". A variable -- at least in imperative programming languages -- is said to store a value. If a value is a variable that doesn't change, what does it store?
Even constants have to be "stored". Constants are like variables except they have a lock on the door, and the system knows the combo but not the programmer.
A variable stores something, and can change.
A constant stores something, and cannot change.
What is the "something" that they store?
The value, or at least something resembling a value.
Yes. Now, what's the result of evaluating an expression?
Let me rephrase it. The result of evaluating an expression and the "result" of a constant is a variable-like "result" (for lack of a better name). It has features of variables such as values and types (if the language has them).
(Addendum: for "Type" and "value", the long form would be "produces a type" and "produces a value", and "Write" would be "run-time write".)
You mean to say an expression, variable or constant produces a type? That doesn't make sense -- "produces" implies to me that it generates a type. Do you mean an expression, variable or constant has a "type" property or attribute?
Perhaps "produces" was a poor choice of words, but "has" is also problematic because it implies state we cannot observe until the var "produces" it. (It's possible it's calculated under the hood). English just sucks for this kind of thing.
{I seriously doubt that English is the problem. After all, it's been used successfully for this kind of thing since 1927. And why in the world do you think "has" implies a state we cannot observe?}
We can usually model those as variables with "features switched off". That provides conceptual re-use and a way to clearly compare.
Granted, we can do the same with variables and say certain features are switched off in tag-free languages, and the result would look somewhat like your model. However, that tends to cover up the fact that some languages outright don't have some of the features throughout, such that things are simpler to compare if we outright omit those from their model.
No, this is at best a confusing conflation and complication of familiar concepts, and at worst outright wrong (constants don't store variables, for example.) Rather than awkwardly (and potentially incorrectly) conflate variables and values, simply keep them separate. It simplifies all models, reflects actual language understanding -- programmers are taught that expressions evaluate to values, not that expressions evaluate to variables or constants -- and accurately describes how imperative programming languages are implemented and actually work. I.e.:
A variable stores a value, and can change.
A constant stores a value, and cannot change.
An expression evaluates to a value.
A type defines a set of possible values.
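For instance, in PHP terms (purely an illustration of the four statements above, not an addition to them):

  $x = 7;               // a variable stores a value, and can change
  define('LIMIT', 10);  // a constant stores a value, and cannot change
  $y = 3 + 4;           // an expression evaluates to a value (here, 7)
  // A type, e.g. integer, defines the set of values $x and $y may hold.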
That's not saying anything clear or measurable to me. If it means something to you, great. But I am not you.
It's what imperative programming languages do. It's how they work. It's how they're built. If that's not clear or measurable, I'd be interested to know how you think imperative programming languages work; not how you'd model them, but what you understand happens internally.
You mean how the interpreter is built? I doubt that, per below. But that may not matter: if we are modeling results (I/O), then as long as the model produces the right answer (matches actual output), it doesn't matter if the virtual model actually recreates the interpreter or not. As we already agreed, often actual interpreters have to consider efficiency and borrow existing libraries to avoid re-inventing the wheel. Thus, they have "artifacts" that may not be well-suited for a mental model.
What I described is precisely the conceptual basis for constructing real imperative programming language interpreters and compilers. This is confirmed by both source code and descriptions in texts on the subject. I have, as usual, left out optimisations -- what I presume you mean by "consider efficiency and borrow existing libraries" -- but the essence, as I've described it, is correct.
I don't dispute that your model can make an interpreter. My claim is that it doesn't NEED an explicit RAM-inspectable "type" for certain operations or behavior. You haven't proved it's necessary (or even helpful in some cases).
I never said "constants store variables". I'm curious about which text of mine provided your mind with that interpretation. Perhaps my statement "they produce variable-like results" is ambiguous. Perhaps this is clearer: "they [constants] produce results that are very much like the results that variables produce". (It's odd how such phrases have multiple interpretations that we don't necessary notice when we write.)
Sorry, I wrote it backwards! Above, I asked, "If a value is a variable that doesn't change, what does it store?" You replied, "Even constants have to be 'stored'", which implies that variables store constants. Above, I meant to write "variables don't store constants, for example".
And again, the "familiar concepts" are fuzzy. You seemed to admit that with the "country music" analogy.
Not at all -- read that section again. What I wrote -- and it seems relevant here -- was that "[p]eople choose to listen to songs and radio stations, watch television programmes and/or YouTube videos, and purchase and/or download music on the basis of definitions of "Country Music", so it appears to be well-understood. However, what we're talking about here -- types in programming languages -- is far more clear-cut. Indeed, in the majority of programming languages it's trivial to identify explicit type definitions and explicit type references from the language grammar alone. Most implicit type references can be identified from a trivial understanding of conventional imperative programming language semantics. Border cases, exceptions, and peculiarities can be identified in reference manuals and by talking to developers who use these languages."
It still stands that the examinable output of constants and expressions shares the same elements and features that variables' output does in most dynamic languages. I'm focusing on observable traits, and the observations are that the "output head" is indistinguishable from variables' output heads. Remember, I'm approaching this like a scientist would: what's observable, what models can mirror (predict) such observations, and which models are simpler.
It's generally understood that the examinable output of a constant, variable or expression is a value. Values are also the examinable input to variables and constants. Describing values as variable-like or constant-like or whatever adds nothing but confusion, especially given that a beginner's explanation of a programming language typically starts with values. I suspect your model might actually be strengthened by dispensing with any notion of variables or constants, and dealing strictly with values. Or, perhaps even better, start with a minimal model that describes behaviour strictly in terms of values, and then build a model on top of it that deals with variables.
Re: "It's generally understood that the examinable output of a constant, variable or expression is a value" - We also have "typeName()" or "typeOf()" like operators that indicate a primary "type". I wouldn't be so quick to call this part of the "value".
I would. In some languages (Category S and D1), values have a type. In other languages (Category D2), the type is determined (i.e., parsed) from the value on a per-operator basis.
That's fairly close to the tag model. Note that D1 are usually mixed in terms of using the "tag" versus parsing. But that depends on how "determined" is defined. Different operators or techniques may use one or the other or both. It gets back to "are types in the mind" or something "real"? If we are modeling results, it matters much less. I wish to model output, not so much heads.
There is no "types in the mind" aspect to this. Throughout this page, I have only described what real languages do, and how they are implemented.
You appear to be gravely mistaken about the implementation. There are NO observable "type bytes" for parsing in the interpreter RAM unless one is fast and loose with language. And your "do" so far is only in your head, not objective reality. We only get bytes as output, not "variables", "values", and "types". Only fricken bytes: THAT IS THE REALITY. Any meaning or classification about these bytes are only in the human head. Objective reality doesn't give a shit about "meaning". ("Bytes" is also a human abstraction, but hopefully at least a UsefulLie we can both agree on....which is rare around here.)
{Bytes are no more reality than the more abstract views we usually use to discuss programming. It's unclear why you think one is acceptable and the other isn't. Nevertheless, even if we do treat the input and output as only bytes, the point that "There is no 'types in the mind' aspect to this." still stands. The types are in the language being used to tell the computer which input bytes to map to which output bytes.}
I already agreed in the prior sentence that bytes are an abstraction. They are sufficient for our purpose as long as both parties agree to the same abstraction. It's kind of like currency: as long as both trading parties agree to accept the currency in trade, it's a useful abstraction (a tool). You have not demonstrated clearly how "the types are in the language being used to tell the computer which...", especially since type is a vague concept. You need to model the specifics of your vision of "type", otherwise clear communication will not take place. I've given enough info to approach a machine-language-like model of tagging; you have not attempted the same. (I believe I could force something like your approach to work, but it would be more parts than the tag model.)
{But you haven't mentioned why the one abstraction is okay, and the other isn't. As far as types being clear, it even meets your gold standard for clearness. Every programming language ever implemented has implemented a type system using a machine. That's what you usually claim it takes for something to be clear. I wonder why you reject it now when it's inconvenient for you.}
Abstractions being "okay" is often relative to need. A map is an abstraction of territory and is usually quite useful. However, being an abstraction it leaves out a lot of information which may be needed for purposes the map was not designed for. And "a type system" implies a uniformity that may not exist. There are different ways to do "type things" and they may not be connected in any clear or objectively provable way, or only by convention. For example, a language may use an explicit type byte (tag) in some places, and parse-based "type analysis" in another. We could lump them under the verbal umbrella of "type system", but that doesn't mean there is an objective "wire" that links them in the interpreter's source code. The "connection" is in the head of a human, not the interpreter.
What it an "explicit type byte"? Aren't "tags" meant to be a model, rather than a description of implementation? Anyway, there are certainly different ways to do "type things", but in abstraction they -- or at least thing things we've discussed here -- do the same thing. And, internally, there is very often a strong connection -- call it an "objective 'wire'" if you like -- that links them in the interpreter's source code. Almost invariably, the routine that determines whether a sequence of characters matches a literal value of a given type -- integer, for instance, or float -- can be invoked both by the parser during LexicalAnalysis, and by a user-programmer at run-time via a user-programmer invocable isInteger() or isFloat() function.
This is in response to the above "I have only described what real languages do", which led toward focusing on actual implementations instead of abstract models. ("Do" being vague, but it seems to be tied to RAM models of interpreters on your side.) Explain what this "strong connection" is and how to objectively measure it, or even observe it in RAM, since we are playing the "reality" card in this sub-thread. If I were writing a reference interpreter for, say, Php, I would have an explicit "type byte" associated with each variable (each variable's data structure). Operations that use the (apparent) type byte, such as "gettype", would of course use the type byte. However, functions that don't use the (apparent) type byte, such as "is_numeric", would only parse the "value" bytes and wouldn't have to sample the type byte at all to mirror actual Php behavior. The ACTUAL implementation would treat these two "kinds" of "type-ness" differently (I use quotes because we don't have an agreed-on name for the concepts). Granted, that's not the only way to implement such, but it shows an observable difference in a model that "works". Thus, the model objectively works (predicts properly), is objectively observable as an implementation, and objectively implements both kinds of "typing" differently. What more do you want?
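A minimal sketch of that reference-interpreter idea in PHP (the structure and function names are invented for illustration; this is the model, not PHP's actual internals):

  // Each variable carries an explicit tag alongside its value bytes.
  $var = array('tag' => 'string', 'value' => '123');

  function refGetType($var)   { return $var['tag']; }                // reads only the tag
  function refIsNumeric($var) { return is_numeric($var['value']); }  // parses only the value

Run against the $var above, refGetType() reports "string" while refIsNumeric() reports true -- two different "kinds" of type-ness from one structure, mirroring actual Php output.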
I was talking specifically about parsing, not general implementation. However, what you're describing is quite simple: PHP associates types with values. Operations like "gettype" print the value's type's name. Operations like "is_numeric" return true if the value's type is numeric, but if the value's type is string it parses the value to see if the characters match PHP's definition of numeric. (Peculiarly, the PHP manual states that the parameter to gettype() and is_numeric() is a variable, despite showing examples where the parameter is an expression. Such muddles are typical of the PHP manual; one hopes (without much hope) that the PHP internals are better constructed.)
You are strongly hinting that there are indeed two "kinds" of "type" mechanisms, and that they can even contradict each other (such that one suggests a variable "contains" a String and the other a Number, at the same time). I want a model that illustrates clearly why (in a mirroring sense) this is the case, or at least makes it easy to trace the mapping of input to output through the model and apply the difference in a clear way. Your "associates with" is fuzzy in 1) time, 2) space, 3) scope, and 4) quantity; and doesn't clearly distinguish and highlight the difference. Again, I could fix it (or something close to it) and add more "mechanical rigor", but the result would have more parts and more complicated rules than the tag model. I want a model that shows on the X-ray machine how air goes down one pipe and food down another pipe in the body, figuratively speaking. I want the difference clear clear clear clear in the model. Do you understand? Not a fuzzy difference, but a clear difference. Blatantly obvious.
There is no contradiction. Imagine a variable contains the string "123". Would you consider that to be a numeric value?
Depends on the language. And "numeric" can be ethereal in some languages or models. One could say in some contexts, "it possesses properties of a number".
I think "it possesses properties of a number" would be interpreted as "numeric" by most readers. Can you give an example of a language where "numeric" is ethereal?
By readers, yes, not necessarily by the language unless asked explicitly to make that decision via a function such as Php's is_numeric() function, which appears to be parse-based, not tag-based. It would thus not make sense to model such an attribute as state. Remember that it's possible for "0" to be a string, number, and Boolean at the same time if we ask "can be interpreted as". It would complicate a model to constantly "track" all these "can be interpreted as" for every operation.
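To illustrate how one representation answers several different "can be interpreted as" questions in Php (actual Php behavior, shown for illustration):

  $v = "0";
  echo gettype($v);          // string      -- the "tag", in my terms
  var_dump(is_numeric($v));  // bool(true)  -- parses as a number
  var_dump((bool)$v);        // bool(false) -- coerces to Boolean false

Nothing needs to "track" these three answers as state; two of them are computed on demand by parsing or coercing the value.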
Where did I write that the language will "constantly 'track' all these 'can be interpreted as' for every operation"? Again, you appear to be slaying a StrawMan here. (There are, however, performance optimisations in some languages that do keep track of "can be interpreted as"s, in order to reduce the number of future parse operations. That is not, however, relevant here. Forget I wrote it. Please.)
I'm not sure what you are getting at. As far as "Would you consider that to be a numeric value?", my answer is that it's relative. And this is consistent with many dynamic languages, in that the specific operator gets to decide, so it's relative to the "user" (operator) of that object/thing/value.
"Numeric" is rigorously defined by any reasonable language (i.e., the definition of numeric does not vary over time) whether it's well-documented or not, though it may vary by operator -- e.g., floating-point numeric may be recognised by one function, but only integer numeric by another. Is that what you mean by "it's relative"?
You asked if I (a human) would consider it "a number", not the "language definition".
I don't think any human of reasonable technical experience would have a problem identifying "a number", given appropriate context. However, I thought we were talking about computer languages, not human interpretations.
It could be a password, a license plate character set that just happens to have all digits, etc. That's domain semantics. Machines and interpreters don't "care" about that.
Sure. If a password is all digits, or a license plate is all digits, it's a numeric value.
My point was that some operators use only the info in the "type indicator" (what typeName() shows) and some use only the info in the value "representation" (for lack of a better word), that resembles/mirrors parsing.
I don't think that was ever in dispute, though I don't know what "resembles/mirrors parsing" means. It's either parsing the representation, or it isn't.
My use of "associated with" is exactly the same as your use of "associated with" in (your words, here) "[a]n assignment like 'a=123;' creates a byte(s) ID for 'number' closely associated with 'a' or a's value" and "I would have an explicit "type byte" associated with each variable." If it helps, where I write that "a type is associated with a value" read "for each value, the language keeps track of its type, so that given a value 'v' we can answer the question 'what is the type of v?'" and where I write "a type is associated with a variable" read "for each variable, the language keeps track of its type, so that given a variable 'v', we can answer the question 'what is the type of v?'". If you see the phrase "... are not associated with a type", read "the language does not keep track of the type of ..." Etc.
But your description of the cfArgument examples appears to contradict this; you're saying there "is" a "type associated with" or similar.
What phrase(s) in the description at the top of the page are you taking issue with?
It's incomplete. It's like, "Here's your parts, now YOU put them together". I want to see real rules.
What's incomplete? There are no "rules" outside of what has been stated. Remember, the only cause of observable state change -- once all definitions are present -- in a language is variable assignment. Given a <variable ...><value ...>x</value></variable> as shown at the top of the page, the only thing that ever changes is the <value ...>x</value>.
Studying programming languages like a naturalist is an odd approach, unnecessary at best and likely to produce error (like not recognising values) at worst. Natural scientists are forced to study the world via observation, because we don't have "insider knowledge" about how the natural world works. We don't have to study programming languages by observation, because as computer scientists we have "insider knowledge" -- we know how they work. If our explanations of how they work are inadequate, then it's our failing to write clearly about what we know actually happens. We don't need to create fiction like a "tag model" -- along with new terminology -- when all we really need, perhaps, is better writing about how programming languages actually work.
Parse-based "types" do NOT create actual tags the way an explicit type declaration (or quote-ness) does under the hood. I'm pretty sure if we inspected the interpreter and machine code we would confirm this for most dynamic languages. We may argue about what is called a "type", but the design is different between both regardless.
I'm not clear what this has to do with my point. Where did I suggest that "parse-based 'types' create actual tags"? What's a "quote-ness"? In Category D2 languages, values don't have specific types (at least, not observably, and aside from being strings.)
Arrrg, we keep coming back to this. An assignment like 'a=123;' creates a byte(s) ID for "number" closely associated with "a" or a's value (I won't get into the distinction here) for D1 langs. But something like 'a="123";x=isNumeric(a);' does not create the same kind of "byte ID" in RAM inside the interpreter. There is no need for such whatsoever. "x" may receive a Boolean tag, but it's not a "numeric" tag. No explicit byte(s) representing "numeric" need exist in/for the second snippet, yet they would in the first. And if "isNumeric" is used in a conditional instead of the assignment shown here, then it won't even generate a (measurable) Boolean tag (or whatever you call them). Inside the interpreter, they are different animals, unless you are wasting processing to kiss up to tradition.
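The two snippets in Php syntax, annotated with what the tag model says exists in RAM after each line (the tag names are the model's, not Php internals):

  $a = 123;             // $a -> tag 'number',  value '123'   (tag state created)
  $a = "123";           // $a -> tag 'string',  value '123'
  $x = is_numeric($a);  // $x -> tag 'boolean', value 'true'; no lasting
                        // 'numeric' tag is ever attached to $a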
Again, I'm not clear what this has to do with my point. What is it, in the paragraph I wrote to which you're (apparently) responding, that you take issue with?
I will agree that a model can be created using all the "parts" you talk about -- variables, a variable's type, values, a value's type, etc. -- and used to model the languages we've been talking about. However, it has unnecessary parts for many of the languages; the parts would either sit unused or be used to keep redundant state info. -t
How so? The descriptions at the top of the page use all the "parts" -- values, variables, types, and the relationships between them, plus relevant items related to operator invocation. What parts "sit unused" or are "used to keep redundant state info"? Note, again, that the descriptions are based on simplified descriptions of how language implementations -- compilers and interpreters -- are actually built, sans extraneous detail related to optimisation and the like. If there are parts that "sit unused" or are "used to keep redundant state info", then the same applies to language implementations. As a language implementer, I believe that isn't the case. Could you explain and illustrate?
I've given plenty of examples and descriptions of unnecessary parts. You just play word games and point at something rather arbitrarily and call it a "type reference".
Your "unnecessary parts" wouldn't be those that would explain why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category, would they?
For "type reference", read "integer, float, boolean, double, date, etc."
I challenge you to model your variables as XML-like structures similar to those in TypeTagDifferenceDiscussion and show and explain step-by-step how they are read and changed by a hypothetical interpreter that follows your model during the observations.
I have amended the descriptions at the top of the page.
Okay, but you didn't "run" them through the observation scenarios.
See the "Actions" subsections. This is really simple stuff, so no need to be verbose. If you're unclear on anything, please ask.
No, man, you just fuzzed it up further by using words like "appropriate" and "compatible", and "The type of a value may be inferred or explicitly specified". If you believe that to be clear writing, then we are worlds apart in terms of English interpretation and what "good" technical writing is.
I've changed "appropriate" to "compatible", defined "compatible" under Category S, and provided an example of "inferred or explicitly specified". The description at the top of the page is intended for readers familiar with programming languages -- i.e., the typical WardsWiki participant. It's not intended for rank beginners.
I didn't get involved in defining/modelling how operators work in general in TypeTagDifferenceDiscussion. Rather, I focused on specific observations and asked the scientific question: "can we model such behavior/observation without a type tag" or the related: "Can we determine if a tag is being used?" You are complicating things by dragging polymorphism into it. But the existence of polymorphism still doesn't answer specific questions about type-like behavior since it's generally up to the operator builder to decide how to interpret/process the value and/or the tag; and I'm not assuming uniformity of treatment unless empirically demonstrated (such as a lang that has no detectable tags anywhere). In short, "compatible" is in the head of a given operator implementer.
The absence of operators in the "tag model" is a limitation, and your "scientific question" is trivially answered by examining language implementations.
It's not trivial to examine actual implementations. Plus, empirical testing is a good thing. Don't assume.
Actually, it is trivial to examine actual implementations. Empirical testing is always a poor alternative to internal examination. For example, imagine how well "biology" would work without any knowledge of anatomy, physiology and chemistry.
That's why biology makes for a shitty programming platform (for human interaction). Hopefully languages are built around cleaner models, and so far the tag model fits common dynamic languages well. If and when it fails, THEN I'll consider direct dissection.
A clear failing -- for a model apparently based purely on empirical observation -- is that the only objective, observation-based means for identifying the existence (or not) of "tags" is indistinguishable from behaviour of functions that return a string representation of values.
Please clarify "only". It can forecast output based on inputs (source code and data). We can also talk about dissecting languages and their interpreters if you want; I'm sure the languages in question have the equivalent of a tags. Tag-based type processing generally requires state in RAM while parse-based does not, at least not beyond the parsing operation. (It could perhaps be modeled with state, but would be damned messy.)
By "only", I mean "that's the one way you've given to do it."
What do you mean by "equivalent of a tags"? Have you understood the multiple explanations here about how type references are actually implemented? Have you read the descriptions at the top of the page, which explain what is or is not "modeled with state" in various languages?
No I don't understand. Your writing is confusing. I suggest a more mechanical/visual approach with clear boxes, clear nesting, and clear rules about when things go in or out of those boxes. And associations are clearly drawn and clearly labelled with clear rules about scope and duration and clear connections to the clear rules. Clear? Did you go to University of Fuzzcloud or something?
The description at the top of the page assumes the reader is a typical WardsWiki participant, familiar with popular imperative programming languages. Therefore, I have kept the descriptions quite terse under the assumption that the terminology and mechanisms are familiar. If I were writing a textbook for students, I would make no assumptions and would be more verbose.
Perhaps that's a good reason to switch to the tag model: it can be clearer without excess verbosity.
{If you ever explain it, there is some chance that it might. I'm not holding my breath on it being explained clearly or resulting in a clearer model with less verbosity. There's been no sign of either yet.}
I thought it did. It looks clear to me upon review. If you have a specific question, ask away.
Polymorphism, in particular its use with canonical operators like "=" and "+", is fundamental to understanding precisely the language behaviour your "tag model" appears intended to address. Note that it's only Category D2 languages where it's up to individual operators to "interpret/process the value". In Category S and D1 languages, dispatch is (for the most part) done by the language implementation; which operator gets invoked is dependent on value types. "Compatible" is only "in the head" of the operator implementer in Category D2. In Category S and D1, "compatible" is explicitly defined by the TypeSystem.
Polymorphism only tells us that two or more possible different "processing paths" exist based on some kind of analysis of the parts of a variable/value. By itself that fact gives no details about how these paths are determined. And empirical analysis should be done to verify multiple paths exists, not just take somebody's word for it. -t
The "paths" are determined by the TypeSystem and the operator dispatch mechanism. However, we can trivially identify polymorphism: It exists anywhere a given operator has different behaviour dependent on the types of its operands. The operator "+" is an obvious example: In many languages (and for better or worse) it can either mean string concatenation or addition, depending on the type of its operands.
That's just a fancy way of saying that output varies based on inputs. Remember, I'm only modelling around what we can examine, and that's I/O. Polymorphism is a head model.
A program is something where "output varies based on inputs." Polymorphism is particularly related to operator behaviour and operand type. These are objectively observable.
Polymorphism is difficult to clearly define. I suggest we avoid reference to it.
Re: "In Category...D1...which operator gets invoked is dependent on value types" -- Do you mean the tag? Some D1 languages will interpret or "convert" as needed for some operations. For example, "write('1' + '2');" can be handled different ways. A hypothetical D1 language could parse both operands to see if they are interpretable as numeric, and if so, go ahead and process "+" as addition instead of string concatenation. If one or both can't be parsed as numeric, then concatenation is selected.
I don't know what a "tag" is, so it's not what I mean. What you describe is a distinguishing characteristic between D1 and D2 languages. Only the latter "parse both operands to see if they are interpretable as numeric" as per the description at the top of the page.
What if a language did parse-only analysis for operation X but did tag analysis for operation Y? How would you classify it?
As I wrote at the top of the page, "individual languages may belong to more than one category depending on particular language features".
Perhaps it's better to make the classification on an operator-by-operator basis rather than per language. However, I generally classify langs as "tag-based" if ANY operation displays taggish behavior (at least per realm, such as the scalar realm). This is because the programmer can "sample" the tag in such langs even if it's not being used for any given operator. (Something like a typeName() function is usually the easiest way to sample).
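E.g., sampling the tag in Php (actual behavior, shown for illustration):

  $a = 123;   echo gettype($a); // integer
  $a = "123"; echo gettype($a); // string -- same characters, different tag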
What is "taggish behavior"? Classification on an operator-by-operator basis would make no difference.
It's also goofy model-wise to change the variable's structure after the fact based on what a particular operator does or examines (such as looking at the tag or only parsing). That's a Heisenberg-like model of vars. If I am interpreting you correctly, then a given operator is a "D1" operator if it examines the tag, but is a "D2" operator if it only parses. That would mean you use one data structure (your XML representation) for D1 operators and another for D2, implying the representation of a variable changes throughout program execution between a D1 variable data structure and D2 variable data structure: a part would only "exist" if examined. -t
No, the only observable state change -- once all definitions are present -- in all three language categories is variable assignment. Given a <variable ...><value ...>x</value></variable> as shown at the top of the page, the only thing that ever changes is the <value ...>x</value>. Hence, in a language with mixed D1 and D2 operators, the D1 operators would examine the value's type attribute whilst the D2 operators would not. The variable structure (and the value structure) remains constant.
Your descriptions at the top don't say that. They make it appear that the data structure of the variable changes per operator "type". Please review. You are loosey goosey with the scope and duration.
Yes, they do say that, just below the end of Language Category D2.
You mean the data structure changes per assignment? That's an odd way to do such models. What rule changes it to what?
No, I mean the <value>...</value> changes inside the <variable>...</variable>, as described at the top of the page and elsewhere on this page. The "rule" that changes it is called "assignment to a variable", in which the old <value>...</value> is discarded and replaced with a new <value>...</value>.
So the very existence of the "type='...'" attribute comes and goes depending on whether a D1 or D2 operator "processes" the variable? I'm considering a language which uses both tag-based and parse-based "typing" (which at least Php is). Your description seems to imply the structure changes depending on which "kind" of operator is involved, which would be really odd. As originally written as language-scope classifications, it made sense (or at least was consistent). But now that you agreed to re-interpret your descriptions as being operator-centric, the structures given don't make sense, unless they magically change per operator.
No, the "type='...'" attribute is constantly present in those languages that have one. The description at the top of the page is about language categories, and the structure of a variable does not change depending on which "kind" of operator is involved. Dispatch of some operators (like "+") may depend on the type of the operand value(s); other operators (like "isNumeric()") might not reference the operand type, though I imagine "isNumeric" would be more efficient if it was implemented as a polymorphic function that unconditionally returns "true" for values of numeric type, parses the value for values of string type, and returns "false" for values of any other type.
I agree that something like "isNumeric" can be more efficient machine-wise if it checks the type tag first, and then only parses if the tag is not "numeric" (for example, to see if it's a string that can be interpreted as a numeric). But it's simpler to model it by saying it always parses (or act like it always parses). Being that I'm looking for the simpler model as a priority over mirroring actual implementation, for this discussion I will generally assume parsing if that assumption accurately predicts behavior (I/O).
But your statement, "individual languages may belong to more than one category depending on particular language features", still puzzles me and seems to contradict "The description at the top of the page is about language categories". How a language works with your description and can be multiple categories "at the same time" is still puzzling to me. How does one know which to apply to a given language? Php, for example uses (or can be modeled as) both parse-based typing and tag-based typing, depending on operator being used for a given statement/operation. Thus, how do you classify it? (I classify mixed typing as "tag based" because it's a simpler model to say the tag is "always there" even if a given operator doesn't happen to use it because it's using parsing instead.)
{The example (given above) is C#. In C#, all values are associated with a type. This puts it into either the S or D1 categories. However, variables may or may not be associated with a type. In particular, variables declared as 'dynamic' do not have a type associated with them, while all other variables do. Hence, C# exhibits traits of both S and D1 languages. One knows which to apply by looking at the language definition. As far as I know (I'm not a PHP expert), PHP is entirely D1. The isNumeric function is defined as returning true if the value is associated with a numeric type or represents a numeric value. There also appears to be another whiff of hypocrisy: you've complained about "unnecessary parts" in our description (though you never seem to get around to pointing out what those parts are), yet here you are advocating that the "tag" is present even when it's unnecessary.}
I'd like to avoid analyzing C# here because it's at least partly static, and I'm limiting my model to dynamic languages for the time being. Php's "isNumeric" function uses parsing in my test. For example, 'is_numeric("123");' returns True.
{We weren't talking about your model, you expressed confusion about how a language can be in more than one of our categories. I gave a specific example of how that could happen. The fact that your model isn't expressive enough for that situation is beside the point.}
I could make it model static and semi-static languages, but that would over-complicate it. Maybe you are trying to be too general, to "teach" the reader about interpreters & compilers in general instead of explaining/modeling a specific aspect. Anyhow, could you please use an example from dynamic languages to keep things specific to dynamic languages?
If by "too general" you mean the descriptions at the top of the page encompass all popular imperative programming languages, then I guess it is "too general." There's nothing at the top of the page intended to "'teach' the reader about interpreters & compilers" per se, and in fact that distinction is explicitly excluded. It merely intends to -- quite reasonably, I think -- describe an aspect of behaviour in all popular imperative programming languages. It seems unreasonable to exclude the whole category of statically-typed languages without compelling justification, especially as the description is simple, and it's instructive and enlightening to compare statically-typed languages with dynamically-typed languages.
I decided to keep the scope to dynamic languages to keep the model simpler. C#'s two-level typing requires a fairly complex model.
That's a weak reason. You can make the model even simpler by excluding static and dynamic languages.
Why is it weak exactly? Why should we model everything just to explain/model something specific? You seem to be arguing that since 100% of x is bad, 50% of x is also bad, where x is "simplification" or a narrower scope. (What's the formal fallacy name for that? It's not really "slippery slope", because we are not talking about political policy.)
{It's weak because you appear to be interested in using it to model static languages. You have, after all, indicated that including them is a possible future direction. You've even attempted to classify a static language as having compile-time tags. Given that, and the fact that its competitor does handle those languages simply, it appears that the restriction to dynamic languages is nothing more than a desperate attempt to keep some "advantage" to your model.}
That's another issue.
Desperate? Your anti-me bias is showing. I explicitly ranked simplicity of the model as the top priority a while back. I agree it's a trade-off, but I'm describing the trade-off priorities I'm using and have been using. If you want to use a different ordering, that's fine, but it will make comparing both sides' models harder.
{Why would it be more difficult? Just compare them in each category and let the readers decide how important each category is to them. But since you haven't ever clearly communicated your "simple" model let alone the more complex model for static languages, we don't really have anything to compare yet.}
{As for your desperation, remember that you excluded static languages only after a problem with your model was pointed out using a static language. Furthermore, the problem wasn't restricted to static languages as you were asked about a dynamic language with the same problem. Your response to that was to call the language "stupid" and to exclude them as well.}
What "problem"? If you mean CSR's, again my model does NOT rely on CSR's in any way whatsoever. (More on this below.) As far as "clear", to me my model is clear as bell and has plenty of examples of application (which your lacks). I honestly don't know where the communication gap is. I cannot read your mind.
Let me clarify that to avoid yet more CSR drama: A language may provide CSR's as part of its output "kit", and we use this output kit to get results in order to test our models. But this is true of ANY model testing if the purpose of the model is to mirror I/O and if we want empirical verification. -t
{The problem of how to differentiate between anomalous output (e.g. truncated floating-point) and output that indicates "tags". As far as communicating your model, you were given a suggestion on how to go about it. You refused. BTW, how can we be lacking in examples? Every last programming language ever created uses it, so just pick one.}
See below for CSR discussion. I don't recall such a suggestion. Bookmark reference? If you use the interpreter itself as the model, then your "model" is 2,000 pages of source code while mine is a 3-page description with examples. I already stated the audience of my model.
{Search for "what you will need". (And no, there is no need to look at source code. Just look at the language definitions.)}
It already has everything it needs. Remember I'm just modeling "type issues", not an entire interpreter. If something is missing for that purpose, state what it is and I'll add it. If I failed to clearly explain how the niff gets to the groggle, point that out and I'll clarify how the niff gets to the groggle.
{You weren't asked to add anything to it. You were asked to describe what you already have in a particular way.}
I would suggest you use your favored technique on your own model, and then I'd have a better idea of what such a documentation style actually looks like. Being that we tend to interpret a given word differently from each other, including "complete explanation", an actual specimen may be in order. I don't have an objective Complete-A-Scope to measure "complete".
{See section 2 in http://lucacardelli.name/Papers/TypeSystems.pdf (Note: This describes a formalism for S languages. D1 languages would need to delay the type judgments until run-time. D2 languages would only make type judgments if an explicit request is made, e.g. cfArgument.)}
{He's been working himself in that direction. He's already added the additional requirements that the language not be "stupid" and that the CSRs aren't "flawed".}
STFU. Your snide "input" is not welcome.
{Welcome or not, it's true.}
CSR's are not relevant to the topic. You are mixing things up. Let me re-clarify it to be double sure: My model does NOT use CSR's.
Doesn't your model rely on CSRs in order to detect the presence of tags?
It relies on experiments based on input (source and data) and output. Whether that involves CSR's or not is a verbal labeling issue. I'm not labeling such here. And testing ANY model will involve looking at I/O, unless you are modeling based on implementation (interpreter source code or RAM image).
As has been stated all along, the descriptions at the top of this page are based on how popular imperative programming languages are built. Unsurprisingly, this accurately predicts input and output, but does not rely on CSRs or any other verbal labeling issues.
Good! Neither relies on CSR's.
Your identification of tags (or not) relies on CSRs, does it not? If not, how do you determine whether a language employs tags or not?
No. Examining output requires examining output, obviously, but whether the output "ports" are "canonical" or not is immaterial. We observe ALL known output, canonical or not, because we are building an output predictor, NOT a canonical-only output predictor. Canonicalness plays absolutely NO role in the model.
Canonical or not, tag detection relies on string representation of values, does it not? And, therefore, as the example in C demonstrated, detection of tags can be "fooled" by quirks in the string representation of values. Is that not the case?
I'm not sure I'd call all output "strings". That's a definition issue. The floating-point anomaly itself was detected (demonstrated) by examining output, no? It's not hidden. It's true that in such a case examining the value directly may be difficult, but we can examine its side-effects in output. It's roughly comparable to not being able to directly examine the core of the Earth: we can still examine properties of Earth that give clues to the core's nature, and we use those clues to build models, which are further tested for accuracy as more clues/data come in. Again, you are mixing up a definition attempt with a model. Not the same thing. (And C has no CSR's, so it's doubly moot. Your rudeness to me was wasted on nothing. Think through stuff next time you insult others over it.)
How is my point rude? Are you perhaps confusing my responses with someone else's? On this page, my responses are always in italics. Responses from others are in curly braces.
Then my comments are to Curly.
Regardless of what you call output, is it not the case that it is impossible -- in some cases -- to distinguish tags from characteristics of output?
I inserted the Earth's core analogy above.
{Which doesn't answer the question.}
As long as the model mirrors behavior correctly, why should we care whether it's a "characteristic of output"? As far as the C example goes, I don't see where your model solves anything related to floating-point anomalies at the empirical level. Floating-point modeling is an issue outside my model, I would note, and is largely language-specific. Language-specific adjustments can be added as needed.
{The problem it exposes is you can't tell us how to tell whether it's an anomaly or indicates a "tag". Everything you've given us about what indicates a "tag" says the anomaly indicates a "tag". Except you've declared it otherwise. Why? What is it that makes the anomaly not indicate "tags" while your other examples do?}
I'm not sure I understand. How about an illustration? If a particular output port/device/mechanism adds "noise" or annoying artifacts, then we'd have to tweak our model to take that noise into consideration on a language-by-language basis.
{And how do we tell if it adds "noise"? How do you decide that the model needs to be tweaked? As for an illustration, use any tag-free language you wish, but one whose output routines truncate strings that represent numeric values.}
a = "1.234567";
b = "1.23";
print(a); // This prints 1.23
print(b); // This also prints 1.23
if (a != b)
print("Tag detected"); // this branch is taken.
That's not necessarily tag detection. The example is misleading.
{And how do we tell if it is or isn't?}
We'd probably need more tests to find an appropriate model. For example, one possible explanation is that "print" truncates strings to 4 characters. (Remember FORTRAN?) But we'd need further tests to verify such a hypothesis. Science!
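For instance, in the same hypothetical language as the snippet above (length() is an assumed built-in here):

a = "1.234567";
print(length(a));                       // 8 would rule out truncation at assignment time
if (a == "1.234567")
print("No truncation on assignment");   // if this branch is taken, the quirk lives in print()

If the full string survives assignment and comparison but print() shows only four characters, the "tag detected" conclusion evaporates: the quirk is in the output routine, not in any tag.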
"Science" here would be to dispense with I/O-based speculation about languages, and refer to actual ComputerScience which tells us how they really work.
Then you may miss finding a simpler model (perhaps for a specific purpose). And so far the tag model is not incompatible with typical interpreters: they use actual tag bytes.
If there was a simpler model than variables, values and types -- that was as effective in describing every aspect of popular imperative programming language behaviour vis-a-vis variable assignment, operator dispatch and evaluation of expressions -- we would embrace it. So far, no such model has been presented. What are "actual tag bytes"?
In your sample structure: <variable name="splat"><value type="int">3423</value></variable>, it is still not clear when the two key parts (the type attribute and the value contents) are inspected and changed during run-time. Granted, I rely on operator-specific testing of common or representative (typical) type-centric operators to see whether an operator looks at and/or changes the type tag, and perhaps you are making that same kind of assumption but not stating it.
The only change that ever occurs is the replacement of <value ...>...</value> during variable assignment. The 'type="int"' attribute is normally examined by the language internals as part of the operator dispatch mechanism, and is frequently examinable by the user via a 'typeof()' or equivalent operator.
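In PHP, for instance, that operator is gettype() (real PHP; output from a typical build):

<?php
$splat = 3423;
var_dump(gettype($splat));   // string(7) "integer"
$splat = "3423";
var_dump(gettype($splat));   // string(6) "string" -- the type reference changed with the value
?>

Only the <value ...>...</value> was replaced by the assignment; the variable construct itself is unaltered.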
Further, in your sample structure, there would be an equivalent actual interpreter "slot" for the attribute value (...) of your "type='...'" template, no? In other words, variables would have a data structure associated with them, and one of the slots in this data structure would be the implementation of your "type='...'", correct?
I assume by "data structure" you're referring to implementation rather than a conceptual model, yes? In some languages, a variable or value is a data structure with two slots, one for a value and one for a type reference. But, not all languages do that. In some languages, the source code is parsed to produce an abstract syntax tree and then there are two subsequent traversals of the abstract syntax tree. The first traversal only performs TypeChecking and resolves polymorphic operator references. The second traversal evaluates expressions, assigns values to variables, invokes operators, and branches as appropriate. In such languages, whilst variables and values are indubitably associated with types, and although there are certainly variable and value data structures during the second AST traversal, during the first AST traversal there are only type references. During the second AST traversal, there are no type references. In short, in such languages, the variables' data structure does not physically contain a slot with a type reference. However, it is accurate to say that in such a language, variables and values have a type. Indeed, such languages often use ManifestTyping and are StaticallyTyped, so the association of types and variables is explicit in the source code.
Yes, different ways to skin the cat. That's why it's nice to have a clear and consistent prediction model regardless of implementation path chosen.
Indeed. The descriptions at the top of the page provide precisely that, using concepts and terminology familiar to programmers.
I disagree for reasons already stated.
What evidence do you have to support a view contrary to almost the entirety of related texts in ComputerScience and SoftwareEngineering, and virtually every programming language reference manual?
You mean their view that their own writing is clear? Because the ivory tower types are often clueless about the real world. I'm not the first to make that observation. They often form their own language and their own way of doing and describing things (clique) and they end up becoming detached from the world outside of their WalledGarden thought-bubble.
No. Above, it is written, "the descriptions at the top of the page provide [a clear and consistent prediction model regardless of implementation path chosen], using concepts and terminology familiar to programmers." You responded, "I disagree for reasons already stated." Your disagreement is noted, but disagreement is easy -- anyone can claim the sky isn't blue. Do you have evidence that the descriptions at the top of the page do not provide a clear and consistent prediction model using concepts and terminology familiar to programmers? (Assuming, of course, that's what you disagree with -- it's not clear what you disagree with.)
I have already described how typical programmers I encounter think about and deal with dynamic "types", based on my questioning of them and experience with their code. I see no reason to repeat that here. In short, most either organically "wing it" or use defensive programming to avoid potential ambiguities. It's my anecdotal experience against yours. Neither side has an OfficialCertifiedDoubleBlindPeerReviewedPublishedStudy about how typical programmers react to types. The tag model is my attempt to get away from 1. organically winging it, 2. defensive programming bloat, and 3. the linguistically-twisted type gobbledygook typically found in manuals, books, and over-educated WikiZens.
Do you have any evidence that your "tag model" has successfully achieved any of your three goals?
It's just a toddler. Gotta give it time.
As far as "hypocrisy", by "fewer parts" I also mean "fewer rules", rules are "parts" (and I have stated both in some places). If tag attribute pops in and out of existence in the model during run-time, then we need to give rules for the in-pop and out-pop, which is obviously goofy and more complicated. (Your model appears to have the same issue.)
In the descriptions at the top of the page, no "tag attribute pops in and out of existence in the model". All constructs described have a static structure. The only thing that ever changes is the <value>...</value> inside a <variable>...</variable>. One could imagine, however, an "isNumeric" operator that only looks at the contents of a <value type=...>...</value> construct, and ignores the "type=..." attribute. That doesn't mean the "type=..." attribute, "pops in and out of existence in the model during run-time", because it doesn't. If I don't look at my chair, does that mean it pops "out of existence"? Likewise, if isNumeric() doesn't look at the value's type, that doesn't mean it pops "out of existence." (Though, in practice and as described above, isNumeric() is probably dependent on the value's type.)
In that case our models appear to be growing fairly similar. By isNumeric() being "dependent" on the "value's type", do you mean implementation-wise, or results-wise? My tests show one cannot detect it using the "tag"; thus I'll model it as parse-based if that simplifies the model(s) for PHP. (I still believe your description and/or classification system needs to solidify its scope.)
I mean implementation-wise, which is why I wrote "in practice" and mentioned it parenthetically.
As for your perception that "our models" appear "to be growing fairly similar", I can only interpret that to either mean your model has changed or your understanding of actual language behaviour has changed, because our description of actual language behaviour has not changed. It's only grown explanatory text.
I made no changes in my description of actual behavior. The experiments I did before still produce the same results. (Although I didn't know PHP's is_bool() was screwy, that is specific to one operation.)
Then your understanding of actual language behaviour has changed?
And your writeup still has confusion in the per-language versus per-operation department described above. I haven't seen that fixed.
I'm not sure what confusion you mean. The descriptions at the top of the page refer to language categories. That's why they're headed, "Language Category S", etc.
I ask again: what category would PHP fall under, given the clarifications above? (As a reminder, some of its operations/functions use parse-based "typing" and others use tag-based.)
Category D1. The operations/functions you refer to parse strings to see if the sequence of characters represents another type.
Why do you describe a fair amount about parsing under D2 but no mention in D1?
I describe what predominates and characterises the languages' TypeSystems: Type references predominate in Category S and Category D1. Parsing predominates in Category D2.
I suggest you don't make your descriptions based on frequency between those two.
Suggest what you like. That's what they are.
I'm just trying to make it clearer. I'd suggest something like, "Here's how D1 languages handle parsed-based typing..." and "Here's how D2 languages handle parsed-based typing...". If they are the same, then factor it like a sub-routine.
Re: Your "unnecessary parts" wouldn't be those that would explain why 123 + "123" is 246 in some languages, "123123" in others, and an error in a third category, would they?
There are different ways to process "+", and they vary per language. One needs to experiment to see which combination of values, tags, and parsing best explains them on a per-language basis. I can think of several different rule sets (alternatives) for how to process "+" with respect to values, tags, and operand sides. (Some languages look at the right side first and some at the left side first when making "typing" analysis.) The tag model is just the kind of clean and crisp model to test against because it doesn't rely on hazy vocabulary. The combinatorial mess of alternatives may also be an object lesson in why both tags and "type" overloading suck.
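For example (actual PHP output; the JavaScript and Python notes reflect those languages' standard semantics):

<?php
var_dump(123 + "123");   // int(246) -- "+" parses the numeric string
var_dump(123 . "123");   // string(6) "123123" -- concatenation is a separate operator
// JavaScript: 123 + "123" yields "123123" ("+" doubles as concatenation)
// Python: 123 + "123" raises TypeError (no implicit coercion)
?>

Three languages, three rule sets for the same expression -- exactly the kind of per-language experimentation I mean.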
The various ways of handling "+" (for example) is entirely accounted for by the descriptions at the top of the page.
That may or may not be true. I cannot process your word salad in a "mechanically clean" enough way to really know.
Are you sure your vested interest in your "tag model" isn't influencing your appreciation of the descriptions at the top of the page?
It's honestly vague to me. You defined data structures, but don't describe clearly when and where these data structures and their parts are examined or changed. I tried to do that in TypeTagDifferenceDiscussion so it's clear WHAT part of the data structure is being examined, WHEN it's being examined, and what info is "kept" and what is not kept. If there is a state change associated with the variable, I clearly show it. There is no other (undefined) "thing" on the outside that stores that state. I don't see that from you. To avoid the pitfalls of English, I believe we probably have to model things at almost a machine-code level and go step by step very carefully, being very clear about what is happening, about any state that's coming or going, and about which rule makes that state come and go, and make sure our data structures show that state and/or are cleared when the state disappears. Thus, everything's "on the table", and we know what changes, when it changes, and what rule changed it. None of this ghosty "associated with" stuff. If there is "associated with", then make that association clear in your XML. If the association disappears for whatever reason, make the rule AND TIMING of that disappearance clear, and show the XML after the change.
Maybe we can establish some modeling rules (a worked example follows the list):
1. Any "association" is shown as an XML structure.
2. Any changes to any part of the XML structure, including attribute value changes, are clearly documented and shown as new XML (state).
3. It's made clear WHEN such changes happen. i.e. the statements that triggered it during execution.
4. It's made clear WHY such changes happen. i.e. a reference to a rule number or ID is given with the change.
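A worked example under these rules (hypothetical PHP-like language; the rule IDs R1 and R2 are mine):

// Statement executed: x = "123";  (WHEN: step 1; WHY: rule R1 -- assignment sets both value and tag)
<variable name="x"><value tag="string">123</value></variable>
// Statement executed: x = x + 1;  (WHEN: step 2; WHY: rule R2 -- "+" parses the value and retags the result)
<variable name="x"><value tag="number">124</value></variable>

Everything's on the table: what changed, when it changed, and which rule changed it.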
The only observable state change -- once all definitions are present -- in any of the language categories is due to variable assignment. Given a <variable ...><value ...>x</value></variable> as shown at the top of the page, the only thing that ever changes is the <value ...>x</value>, and its structure remains the same even though its contents and attributes do not.
That seems to contradict what you said above about languages being a combo of D1 and D2. And you've said that cfArgument creates a "type reference". Where is that reference explicitly modeled in XML? When, where, and by which rule?
What aspect does it contradict? cfArgument doesn't create a type reference; it has a type reference, specified by the "type=..." attribute of the <cfargument ...> tag. That does not, however, mean a type reference is associated with the value being tested by cfArgument. This has been discussed already.
So the structures you give, such as the XML representation of a variable, have NOTHING to do with cfArgument-like validation attributes (which may use the word "type")?
Correct -- cfArgument-like validation attributes like "type=..." do not make reference to variables. The XML structures shown above are for variables and values only.
This appears to contradict earlier statements. But I'll set that aside. One of the reasons I use "tag" is to avoid conflict between the parts of the model and language features, like this one, that are named "type".
What conflict would using "tag" avoid? The type that the characters in a value represent is precisely the type that the "type=..." attribute refers to.
The "type" in your model/writeup does not handle the "type" attribute in cfArgument, yet they share the same name.
I'm not clear what you mean. Please explain?
Your XML model of a variable uses the word "type". cfArgument also uses the word "type", yet there is no relationship between the two, creating a potential source of confusion for the model user/reader, who may feel there is, or should be, a relationship between them. I suppose "tag" can be confused with, say, "XML tag", so neither is perfect. But since it's a model of "typish" behavior, I feel I should avoid the risk of collision and give it a name not reserved for other "typish" things. For example, the isNumeric() function is "type-related" by most accounts, yet will not use the "type" attribute (or whatever one calls it) in the model.
{Huh? Where do you see anyone saying that the types referenced by the XML in question and the types referenced by cfArgument have no relationship? They would be the same thing. What's different is in how they are used by the language. In the variable case, only values of the proper type may be assigned to the variable. In the cfArgument case, only values that can be converted to the proper type are considered valid.}
Oh crap! We're back to square one. Your model/description does not demonstrate how that explicitly happens.
It doesn't need to. A "type" is a "type" wherever "type" appears.
"Type" is ill defined. Attempts so far model human minds, not languages.
On this page, it isn't defined at all and doesn't need to be. Notably, "tag" is equally undefined, but "type" is familiar and generally recognised and understood by programmers. "Tag" is not.
We already had this debate above near the PrivateLanguage mentions. I won't repeat it here.
However, you still have not shown how "tag" relates to "type". You claim your "tags" are not equivalent to types, but cannot show how they map to types. Until you do that, the "model" is effectively useless. A model is only of value if its components can be mapped -- unambiguously and on a one-to-one basis -- to the real world that it models.
I can only answer that if and when I figure out your model. And I consider "real world" to be input and output. As I described already, this may or may not match internal interpreter design because I'm trying to help programmers, not interpreter builders and thus rank simplicity of model over matching of interpreter implementation. If my model doesn't match the I/O of an actual interpreter, THEN you have a reason to complain about the diff.
There's no model to "figure out". It's simply how popular imperative programming languages work. If you're familiar with C, C++, C#, Java, Python, Perl, Ruby, PHP, etc., it's practically self-evident. Your model doesn't match the I/O of an actual interpreter, because it can't account for why println("34" + 34) is 68 in one language, "3434" in another, and an error in a third. Your model also doesn't match the concepts that are used in popular imperative programming languages -- like variables, values and types -- and relies on an undefined new term "tag" that is not clearly mapped to the concepts that are used in popular imperative programming languages. In short, your model is incomplete and it's not clear how it's intended to map to the real world.
Note that the models do appear to be growing similar, but there are still some fuzzy areas such that I cannot declare them equivalent just yet.
Since the descriptions at the top of the page have not changed since they were first written and only explanatory text has been added, it must be the case that either your model has changed (where?) or your understanding of the descriptions have changed.
I remember at least two changes to it, including the XML addition.
Yes, that's the "explanatory text" I mentioned. The structure has never changed.
The entire thing looks like "explanatory text" to me, and not very good explanatory text at that.
You've already told us you find everything ever written about types, values and variables to be vague, so I'm not surprised you'd find my "explanatory text" vague too. However, its structure has never changed. If something has changed, it's either your model or your understanding of types, values and variables.
And, remember, all "associated with" means is that given a statement like "<x> is associated with <y>", if we're given <x> we can answer questions about <y>. There's nothing "ghosty" (?) about it.
What is "E.I."?
I fixed it. Had the letters backward and wrong case.
I dissected one of the paragraphs above, showing the problem areas: -t
In all three language categories, operators may interpret their operands as they see fit, including recognizing values of various types that may be encoded within their operand values.
What "see fit" and "encoded" means specifically is not clear. It's not defined in a clear way in the model in terms of the XML representations in a step-by-step way. Illustrate encoded-ness so that there is no ambiguity.
For example, in PHP, the "is_numeric()" operator may be used to test whether or not its operand is numeric,
How is-ness ("is numeric") is defined and how to measure it is not clear. For example, if getType says x is a string but is_numeric says x is a number, which "is" is it? Both "is" a number and a string? When you say an "operand is numeric", is that independent of how/what getType says? If so, how specifically does this work? Using the XML, what are these operators looking and not looking at specifically? What's the algorithm they are using per XML models? Why not have descriptions that "process" the XML instead of just nebulous English words like "is"? "Is" needs to be clarified in terms of is-ativity on the XML, or avoided altogether. Calculate is-ness per XML representation. Why is one operator tied into is-ness but the other one not? What rule of English or your model does this tying? Why is typeName() excluded from the is-ness party? Or is it?
which can include both operands of numeric type and numeric strings.
"Include where"? Do you mean include in the set of "is numeric"?
(E.g., 123 is of numeric type, "123" is a numeric string.)
How does this "work" in terms of your XML model? And why are you using "of" here but not above?
In ColdFusionLanguage, <cfargument type= ...> can be used inside a function definition to reject invocation if the argument (which is always a character string) does not match the type named in the 'type' attribute.
What exactly is matching? How does it work in terms of the XML model? What is the algorithm for processing the XML representation to "run" matching? With an algorithm operating on the XML representation, words like "matching" can be better illustrated in terms of the "parts" of variables, so that we clearly and unambiguously see what parts are participating, what parts are not, and their roles if they are.
When I say "psuedo-machine language", I generally mean processing the XML models in a step-by-step fashion. You abandon your XML for much of your description and "process" mere English instead. Process the XML in your descriptions, not words like "is". That is a sin in the religion of clarity.
Like a textbook, the descriptions are text augmented by diagrams, rather than diagrams augmented by text. XML is used only to illustrate values and variables. The internal workings of operators, which may do whatever they like (that's what "as they see fit" means) with values, are not shown diagrammatically. That's because what operators do with values isn't central, or even important, to the explanations. It's obvious that individual operators can do whatever they like with values, including recognising that the string "12122013" is numeric, or the string "12122013" is a phone number, or the string "12122013" is a date. That's what is meant by "encoding". I know beginning programmers have no difficulty with it.
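For instance (real PHP operators, each decoding the same string differently):

<?php
$s = "12122013";
var_dump(is_numeric($s));                         // bool(true) -- sees a number
var_dump(DateTime::createFromFormat('dmY', $s));  // a DateTime object -- sees 12 Dec 2013
var_dump(strlen($s));                             // int(8) -- sees mere characters
?>

Each operator "sees fit" to decode its operand its own way; no diagram of the operators' internals is needed to convey that.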
Re: " because what operators do with values [and the parts of variables] isn't central, or even important, to the explanations." -- I cannot believe you made such a claim. It's largely what we argue about. And "encoding" can mean to or from binary, HTML "&" or "%" references, EBCIDIC, etc. Such may mean something specific to YOU when you state it, but the reader cannot read your mind and has to guess which specific kind of encoding is being talked about. This should be dirt-obvious to anybody with a college education.
It may be largely what you argue about and I respond to, but that doesn't mean it's central to popular imperative programming language TypeSystems. Indeed, operators recognising values encoded in values is barely a footnote to the core operation of TypeSystems. In a language like ColdFusion, its TypeSystem is nothing more than variables are untyped, all values are strings, and all operators do whatever they like with their string-typed operands.
And yes, "encoding" can mean "to or from binary, HTML "&" or "%" references, EBCIDIC [sic], etc." That's it. You've got it, and it's not limited to those, obviously.
Re: "...and all operators do whatever they like with their [...] operands" -- That's true of just about any dynamic language: some operators look at (parse) the value, other operators only look at the tag.
I don't know what "look at the tag" means, but "all operators do whatever they like with their [...] operands" is true not only of dynamically typed languages; it's true of any statically typed language too. Of course, static typing obviously limits the kinds of operands that can be passed to operators, so there are inherent limits on what values can be encoded in values of certain operand types. (E.g., not a lot can be encoded in a boolean "true" value.) What it means overall is that encoding values of type x in values of type y is not a distinguishing characteristic of imperative language TypeSystems, and so it's worth mentioning only as a footnote. For the purposes of distinguishing language categories, it is sufficient to observe that statically typed languages associate types with variables and values, whilst dynamically typed languages associate types only with values.
Sorry, I copied the wrong text and have since adjusted the quote. Explaining/modelling the encoding is important to modeling the behavior of interest properly.
The explanation is simply that strings (which are mainly what we're talking about here) can be used to encode values of any other type. E.g., the string "123" may be encoding an integer, a house number, a boolean value, an extension number, a quantity of litres, a sum of money, etc. The string "true" may be encoding a boolean value, an answer to a question, a name, etc. The string "110110" may be encoding a Morse code message, an ASCII character '6', the integer 54, a portion of a monochrome image, etc. What the string encodes depends entirely on how the function making use of the string is designed to decode it.
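Concretely (real PHP built-ins again):

<?php
$s = "110110";
var_dump(bindec($s));   // int(54) -- decoded as a binary-encoded integer
var_dump(intval($s));   // int(110110) -- decoded as a decimal integer
var_dump(strrev($s));   // string(6) "011011" -- treated as plain characters
?>

Same six characters; three different decodings, each determined by the function applied.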
That may be true, but we still have to model/explain/predict the specifics of all that if our goal is to model/explain/predict the specifics.
Those are the specifics. Further explanation either involves in-depth detail about data representations, or reference to specific operators and functions in specific languages.
A stand-in function such as parsableAsNumber() is often sufficient. StepwiseRefinement can be used if issues/questions still remain.
How does "a stand-in function such as parsableAsNumber()" address your concern that "explaining/modelling the encoding is important to modeling the behavior of interest properly"?
That's good enough for me, but if you want to flesh out "parsableAsNumber()" for your model, experiments, or personal curiosity, that's fine. Our level of StepwiseRefinement depends on what we are interested in modelling/studying. Note that for modeling simplicity, I find it easier to keep all values as strings in the XML representation, even if the language may internally compress them into binary etc. I agree such may result in rounding differences, but if rounding issues are not of concern to us, then we can skip binary modeling of floating point for simplicity's sake. Type issues are the concern, not floating-point rounding/truncation issues. It's intended as a mental model anyhow (a thumbnail-type model), and we may want to toss out some of the "weight of reality" to keep our mental boat light. If one does become interested in the intersection of types and the rounding of floating point, then the "binary" issue may need to be fleshed out in more depth, at the expense of a more complicated model. -t
That's fine, and internal representations are not what I'm talking about here. I'm referring specifically and solely to how a string may be interpreted by the operators that receive it as an operand, and not whether or not the string is compressed or represented in ASCII or Unicode or whatever. "Mental boat"???
Some ops will "look at" the tag (A above) and some will look at the value (B). We may want to make a pseudo-code reference such as getTag(varName) and getValue(varName) to "extract" parts of the XML model. If something appears to parse the value, then our pseudo-code may resemble:
// snippet 4792
if (parsableAsNumber(getValue(thisVarName))) {
    result = parseAsNumber(getValue(thisVarName));
} else {
    raiseError("Cannot parse % as a number", thisVarName);
}
// etc.
That seems like an awkward way of saying a variable has a type, if that's what you intend. Wouldn't it be simpler to describe a variable as "variable = {value, type}"? However, in dynamically typed languages, variables don't have types. Only values do.
"Has a" is awkward. And "value" is only an abstraction, perhaps a UsefulLie, and there are different ways to model "values". If you can find objectively observable (to programmer) evidence that "values" exist in the way you claim they do, please present it.
A value is a UsefulLie? Huh? What do you call the result of 2 + 4, or the result of x + 7? Here's evidence that values exist and have types:
writeln(4 / 3)      // prints, e.g., 1 -- the integer-division operator is dispatched
writeln(4.0 / 3.0)  // prints, e.g., 1.3333333333333 -- the floating-point operator is dispatched
The type of the operand(s) is(are) used to determine which operator is invoked from two (or more) operator definitions, one for floating point operands, and one for integer operands.
I call what you showed "output". Assuming a language is not defined by its implementation, we have no direct way to observe the contents of variables; we can only observe the resulting bytes of I/O operators acting upon such variables/constants. Any speculation as to the nature of "values" is through the indirect effects of such theoretical entities. (Note that different languages will give different answers to equivalent snippets. I believe most dynamic languages will give the same answer to both your snippets.)
Yes, writeln() is an operator that produces output. That's not the point. The point is that the choice of operator is determined by the values' type, and/or the format of the output is determined by the expression result value's type. There are no variables involved.
How does one empirically verify your claim? And those are not values, they are constants. (Or at least I prefer to call them "constants" [correction below]. If constants and "values" are the same thing, please provide evidence.)
How does one empirically verify that variables aren't involved? Look at the source code. I don't see any constants used either, by the usual definition of constant, i.e., a named value. PI and E and EPSILON would be constants in most popular imperative programming languages. Using familiar terminology, what you call constants are normally called literal values, or literals for short.
Here's another example:
writeln(foo())
writeln(bar())
writeln(fizz())
writeln(buzz())
writeln(foo() / bar())
writeln(fizz() / buzz())
// the last two calls print different output -- e.g., 1 and 1.3333333333333
I see no variables, literals or constants; only function calls. What does a function return if not a value? How do you account for the last and second-last writeln() invocations producing different output?
"Literal" is name I'm looking for perhaps, not "constant". My mistake. The above can be modeled as "anonymous variables" in most dynamic languages, as I already described somewhere else that I forgot the location of. I didn't want to get into this discussion yet until we solved the "variable" modeling issue, but you seem to want to press it. I think this all distracts from the two-chamber model of the internal of variables: the type tag and the value "representation" (as you call it).
"Anonymous variables", as familiar terminology, refer to a concept quite distinct from a value. Fundamentally, a variable has or contains something that can be changed; what is that something? If you describe a variable as having a (anonymous) variable, you're caught in an unresolvable circular definition. It is resolved by simply modelling a variable as a container for a value, where a value has a representation and a type, a representation is a sequence of binary bits, and a type is a set of values and associated operators. This model is simple; requires no unfamiliar concepts, definitions or PrivateLanguage; has no unresolvable circular definitions; and fully and accurately can account for all dynamically-typed language behaviour. Statically-typed languages merely add a type to the variable -- it is modelled as containing value and having a type.
The difference between these two:
// Version A
writeln(fizz() / buzz());
// Version B, equivalent in most dynamic langs
var x, y;
x = fizz();
y = buzz();
writeln(x / y);
is not much, naming choices aside. It's mostly a matter of preference, for they both "work" in simple models. The second just has fewer parts, and that's why I like it. I agree the first is sometimes more extendable to more complex modeling, such as arrays, but it may be overkill for our purposes, and each language will dictate the specifics of the more complex language aspects anyhow. For example, many arrays don't allow each "value" element to have its own type indicator; rather, it's "shared", and thus your "value" tag is not reusable as-is for such. In other words, the first XML snippet is PrematureComplexity.
Your model doesn't even work for trivial illustrations like writeln(foo() / bar()) when there are no variables involved. If you're going to get around that by claiming values are variables without names, you're going to force the beginning programmer -- to whom these explanations are presumably targeted -- to embrace a rather abstract notion of variables without names whilst strangely avoiding mention of values, which anyone with a modicum of familiarity with programming, mathematics, or even owning a pocket calculator, will expect to see. Where are these "variables without names" declared? If anything, for beginning programmers you'd be better off dropping variables from the model and dealing only with values.
And you still have not shown that "value" is something objectively measurable. (It's a UsefulLie in my model, but I don't pretend any of the parts are "real". They are primarily intended to predict I/O, not mirror anything "real" about interpreters, if there even is such a thing.)
It's measurable the same way your "anonymous variables" are measured.
They are not. They are merely a UsefulLie in the model. They could be the epicycles of programming; it doesn't matter, other than being a part of the model. "Variables" and "values" are in the head. There are potentially different models, with different names for the parts, that all predict I/O properly. I don't pretend there is a God of Programming who standardizes all that. Your head is not the center of the universe.
The last group didn't go through "extraordinary efforts", at least not for those without a PhD. Why the double standard? If anybody complained, they'd say, "Sign up for my expensive class/school and it will eventually make sense after a big lump of time and money." The greedy bastards are protecting their turf.
Academic and technical writing that deviates from standard terminology always makes such efforts.
I thought you admitted there is no "central" standard.
There is no central standards body. There are certainly standards by convention.
Is "representation" part of this "convention"?
Absolutely. Google for "value representation". It's fundamental, though it's frequently skipped in language references -- but invariably covered in texts about language implementation -- because in most language implementations, the representation of a value is encapsulated by the operators associated with the value's type and so is hidden to the language user.
If it's not common to language users, mostly only interpreter builders, then what does it matter? Less than 1% of those will go on to write interpreters for a living.
I think it helps explain how values work, especially as the bright students inevitably ask questions like "Where is binary used; I thought all computers used binary?" or "What does a value look like inside the computer?" or "How does addition work?" It seems to help with overall understanding. For the purposes here, there's little harm in appropriately eliding "representation" from the illustrations, but it makes answering the inevitable questions a bit more difficult.
My model can do the same, except perhaps for explaining the ACTUAL implementation. "Representation" is long and unfamiliar. "Value" is vague, but has familiarity.
And "value" is a vague/overloaded word also. "Literal" and "value" are often used interchangeably in the office, for example.
"Value" isn't vague or overloaded and it has a precise meaning -- it's the combination of a representation and an (explicit or implicit) type. Do not conflate casual (mis)use of language around the water cooler with some general vagueness. "Literal" and "value" are used interchangeably because a literal is a character representation of a value.
Where is this Master IT Dictionary you speak of, oh Dweller of the Deep Dark Caves?
In your imagination, I suspect, for I mentioned no such thing. Indeed, I wrote that there is "no central body that standardises terminology". However, there are certainly established conventions.
Also note we have things (tokens) in the language code we call "variables" and things we call "literals". But there is nothing (no tokens) in most such languages called "values" (although language designers can call them whatever the hell they want.)
Whilst we don't have tokens for values, they are essential to the operation of languages. What do we store in a variable? What does a named constant make reference to? What does a function return? What does an expression evaluate to? What does a literal represent? Despite not having a token for a value -- though a literal is a character representation of a value -- we are inevitably forced to consider values in language semantics.
It's called whatever one wants to call it.
If you like, but anything other than "value" is going to be PrivateLanguage.
What is a private language among interpreter builders?
Re: "Your model doesn't even work for trivial illustrations like writeln(foo() / bar()) when there are no variables involved." -- How do you recon that exactly?
There are no variables in writeln(foo() / bar()), but your model is defined in terms of variables. How does your model account for writeln(foo() / bar())? Again, if your goal is a simple model, wouldn't the simplest model of dynamically-typed languages be to associate a type (or a "tag", if you must) with each value -- which can account for all behaviour -- and simply (perhaps even glancingly) note that a variable is a named container for a value? That way, you wouldn't have to rely on invented constructs like "anonymous variables" to explain language behaviour.
If one "executes" such on paper, there will likely be a reference to the "result" from each of the functions for convenience. That "thing" can be called different names in different models. My model doesn't lack such a thing, it merely names and packages it differently from yours. Thus, your statement that it "doesn't work" if flat out wrong.
I can't see where your model names it and packages it at all. Where have you provided a new name for that which is stored in variables and constants, returned from functions, specified by literals, and obtained from evaluating expressions?
I'm not recommending one build an entire interpreter. The issues at hand are about types, not parsing function calls. For better testing, it's best to put the "values" (for lack of a better name) in named variables so we can examine them later using multiple techniques. "writeln(foo() / bar())" is a poor form for doing that. (Granted, it's possible to make a language where "embedded" expressions return a different result than those moved to variables first, but such languages are few and far between in dynamic-land, such that it may not make sense to complicate all languages' models "just in case".)
I don't see how that answers my question. If you put the result of function calls in named variables, you've still got to deal with what that "result" is and what it is that you're putting in a variable. That sounds like adding complexity to your model, rather than reducing it.
I guess I am not understanding what you are asking for.
You appeared to claim that the "result" from each of the functions can be "called different names in different models" and your "model doesn't lack such a thing, it merely names and packages it differently". If so, where is it?
It still strikes me that your model can be as simple as you like and cover all the cases if you simply associate types (or "tags", if you must) with values instead of variables, with essentially no other significant changes. Why are you reluctant to do so?
You haven't shown where it's a problem for the stated purposes of the model. Thus, I skip the nested structure.
It appears to be a problem for explaining things like "writeln(foo() / bar())", which is precisely the sort of example that beginners ask about.
Not any more than your approach. If your "value" needs an identity for a model, then you have to add an identity attribute or wrapper tag. If you don't need one, then it's not really different from an anonymous variable (a variable tag withOUT a name attribute). Your outer tag only provides one name. We can stick that same name in your "value" tag as an attribute and have the same info. Your outer tag is unnecessary. Factor out extraneous levels to keep things simple.
Sorry, what are all these tags you're referring to? Are you referring to XML? The only time an "extra level" is added is when it's necessary to represent a variable -- which is done from the beginning, because the top of this page is specifically about illustrating the relationship between types, values, and variables. If you have a language without variables (entirely reasonable, by the way, for simple illustrative "calculator" languages) there's no need to mention variables, but all else -- including user-defined operators -- can be illustrated with just values that have types.
You have not shown how it's "necessary". It may be necessary in YOUR model, but not in all possible models that forecast properly.
How do you explain "writeln(foo() / bar())" without values?
Again, anonymous variables can function exactly like your "value" tag.
If they're exactly like a value, then "anonymous variables" are being used in a PrivateLanguage sense -- just call them "values". I'm not clear why you apparently wish to avoid associating types (or "tags", if you must) with values. It would make everything much simpler (except variables, which you could otherwise almost ignore, and would merely be <variable name="blah"><value .../></variable> anyway) and you wouldn't need any odd terminology.
What exactly would it "make simpler"?
You wouldn't need to use PrivateLanguage which (presumably) you'd have to explain, and you wouldn't need to introduce variables (or whatever) to explain constructs like "writeln(foo() / bar())".
Your language is a PrivateLanguage, only known to a certain group of interpreter builders who use certain reference implementations. The average developer isn't going to give a flying shit about that small group. And variables are already "introduced", I'm just re-using them.
What part of my language is a PrivateLanguage? As opposed to the language found in every language reference manual?
Your own words, with my highlights: "It's fundamental, though it's frequently skipped in language references -- but invariably covered in texts about language implementation"
Ah, you mean "representation"! As I mentioned before, I could have left it out without harm, but I think it adds much to the usefulness of the descriptions in terms of answering inevitable questions, and I explained it. It's certainly not an unusual or "alternative" use of the term "representation", and it's consistent with every use of "value representation" that I've seen in similar contexts.
It's still a PrivateLanguage, although I'll grant that "private" is perhaps a matter of degree. But I'm considering the environment of a typical developer, and to them it is, or is equivalent to, a PrivateLanguage.
Shall I take out the reference to "representation"? It changes little, but it does raise a question: in illustrations like "Value is [ Representation | Type ]" -- which is how values are represented in many dynamically-typed languages and all statically-typed languages -- what shall I put in place of Representation?
That's up to you; I don't have a good suggestion so far when using the two-layer approach you prefer. (One of the side-effects of having too many parts is that you have to name the extra parts.) I don't need the equivalent of "value is..." in my model, so I can use it instead for the so-called "representation".
But that is precisely the flaw in your model that makes it awkward to explain common constructs like "writeln(foo() / bar())".
No it does not for reasons already given.
Don't you have to introduce variables to explain it? Or claim that values and variables are the same thing, i.e., that values are "anonymous variables"?
I already have variables. It seems to me it's easier to have one "kind" of thing that can serve two purposes rather than have two kinds of things, at least in this case. Just add or remove attributes as needed.
But if you have one "kind" of thing, and that thing can be divided into two distinct groups -- one that has a set of things that always has a "name" attribute filled in, and the other group never has a "name" attribute filled in -- isn't that two "kinds" of things?
See below.
I don't know what you mean by "two-layer approach". Layer???
Two XML tags, one nested in the other.
You mean in variables? Why is that a problem?
Extra layers.
Why is an extra "layer" in one thing a problem?
I prefer the single XML statement because:
1. It's less text, objectively.
2. Subjectively it groks better in my WetWare, and without a real survey, I'll take my subjective preference over yours.
3. The two-layer approach makes one wonder if this is possible:
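i.e., a variable holding more than one value at once, along the lines of:

<variable name="foo"><value type="int">123</value><value type="string">abc</value></variable>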
Certain kinds, perhaps. But often arrays don't allow mixing types. And it would be helpful for illustration purposes to have an index integer.
Some languages allow arrays to freely mix types, and an "index integer" isn't needed unless you're illustrating associative arrays. Linear arrays are trivially indexed by simply accessing the nth value. Therefore, a variable that potentially allows multiple values certainly seems like a good starting point for illustrating arrays.
Like I said, "helpful for illustration purposes". Anyhow, the "ideal" structure probably depends on the nature of the language. And, in my opinion we shouldn't pre-complicate the structures before we are bothering to model more complex issues, like arrays.
If using XML leads to "pre-complication", then perhaps there are better notations.
No, it's you causing the problem. We don't need a nested structure at this point.
See below, starting with "You mean a distinction between variables and values?"
By not having the 2 layer approach, I avoid that potential distraction/question from the reader. You'd have to explicitly state such, but I don't because I'm surfing on existing XML rules.
Should the quirks of XML dictate how a model works?
Every data structure has quirks. We work around them when they are an issue. So far, I don't see any here.
XML isn't a data structure, it's a markup language. An obvious quirk is that the "layering" that XML encourages appears to be having an influence on your approach to modelling. Otherwise, "the 2 layer approach" wouldn't be an issue.
Where are you getting your definition of.......nevermind. Let's not go there.
What, "XML"? Or "data structure"? FOLDOC (http://foldoc.org) has good definitions for both that reflect actual usage.
How about representing a variable's value as an attribute, instead?
Please explain.
Instead of representing a variable as <var name="foo"><value .../></var>, how about representing it as <var name="foo" value="<value type='int' representation='3423'/>"/>? No "nesting".
That's Lisp, not XML. And we don't need either.
You mean an S-expression? No, it's still (essentially) XML. However, wouldn't it be much easier to read if you simply used a tuple notation like Variable = (Value) and Value = (Representation, Type)?
Why have two levels? We don't need two levels.
You mean a distinction between variables and values? I think we do need it, in order to explain how variables relate to the result of expression evaluation, what functions return, and what literals represent. It allows us to easily explain what happens in a statement like "a = b + 3;" -- e.g., "the value represented by the literal 3 is added to the value in variable 'b', and the resulting value is stored in variable 'a'" -- without requiring awkward concepts like "anonymous variables", or artifice like defining values as variables with the "name" attribute left blank. It also most closely resembles a common beginner's analogy that likens a variable to a bucket that can hold a value.
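To make the "a = b + 3;" walkthrough explicit (same illustrative array stand-ins as earlier; not a real API):

<?php
// variable 'b' contains a value:
$b = ['name' => 'b', 'value' => ['representation' => '4', 'type' => 'int']];

// 1. the literal 3 represents a value:
$lit = ['representation' => '3', 'type' => 'int'];

// 2. "+" produces a new value from the two operand values:
$sum = ['representation' => (string) ((int) $b['value']['representation'] + (int) $lit['representation']),
        'type' => 'int'];

// 3. assignment stores the resulting value in variable 'a':
$a = ['name' => 'a', 'value' => $sum];
var_dump($a['value']);   // representation "7", type "int"
?>

No anonymous variables and no blank name attributes -- just values moving between containers.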
Concept? It's no different than your "value" when there's no name, so it's the same concept, only a different name. Maybe I should use the word "object" instead of variable, but the way I suggest the tests be run, the model user will be dealing mostly with variables anyhow. It's optimized based on usage patterns.
Why would you use the name "object"? How is your approach "optimized based on usage patterns"? In short, I'm not following your response at all.
Most of my testing suggestions involve using variables because that makes them easier to examine through multiple transformations and output techniques. Embedded expressions are not easy to examine in small test snippets. One can say, "let's examine variable foo after op1, op2, AND op3 (etc.) transform it." One can "do more things" to a given variable than to an embedded expression unless you rely on copy-and-paste, which is ugly and error-prone in such tests. Does that make sense? Thus, embedded-expression issues are only a side note in my model because they will rarely be used in actual tests. Plus, it's easier to reference things by name in descriptions and samples, and variables already have a name, so I piggyback on that fact.
To me, it doesn't make sense in terms of presenting a model -- designed to help beginning or weak programmers -- of certain aspects of language behaviour. However, although I teach beginning and weak programmers on a daily basis, I accept that I am neither a beginner nor a weak programmer (I hope!) so my views may inherently be skewed. Have you yet had a chance to expose your model to your colleagues, or others, to gauge their reactions to it?
Briefly. I drew a big box with the "representation" (value) in it and a small box (tag) with the type indicator in it, and it seemed to click for that particular issue. It wasn't a detailed interview, however.
I target "typical" developers of the kind I typically encounter. Whether they are "stupid" or "bad" or "dumb" I am not going to put a value judgement on it. Humanity is what humanity is. Fight it all you want, but unless you resurrect the Nazis' and their breading programs, we are "stuck" with that. I provide "tools for the masses" as-is. (GodwinsLaw trigged yet again?)
Nazi breading programs? 6 million buns baked in Nazi ovens?
Chuckle. All white bread too!
Tools for the masses are fine, but I think if this thread is to become worth continuing, you need to show evidence that your tools are working. A single case, which could be argued is a description of the conventional model ({representation, type indicator} sounds like a conventional description of a value!), is insufficient to tell.
You haven't shown evidence that the existing explanation techniques are clear and useful to the majority of programmers.
I don't have to. The sheer volume of code written on a daily basis in companies across the globe, the number of working mobile apps released, games of all kinds, OpenSource code on GitHub and SourceForge -- all of these point to a vast community of programming success.
Like I said a gajillion times already, most use trial-and-error and/or defensive programming when dealing with dynamic type issues. It's GoodEnough most of the time, but sometimes one wants a more exact model. Why do I have to repeat this yet again? If you were listening properly, you'd anticipate and ADDRESS my likely response BEFORE I make it yet again. This suggests you are closed-minded and stick to your guns out of some personality flaw.
The problem -- and I'm not sure it even is a problem -- lies with the PHP documentation (we are talking PHP here, aren't we?) and not with any general misunderstanding of TypeSystems in popular DynamicallyTyped imperative programming languages. I have no intention of trying to anticipate your responses, especially when your responses are poorly defended. I'm certainly not going to avoid making a point just because I think you're likely to respond with a weak argument.
Projection. Your arguments are weak. And this has very little to do with Php. I don't know why you are bringing Php into the discussion here. I would note that Php is quite successful code-volume-wise despite having crappy type-related documentation. (Both of us agree it's crappy, correct?) Thus, good type-related documentation is not a prerequisite for language success.
Writing "projection" whenever you have no comeback is not an effective debating technique. I'm bringing PHP into the discussion because everything points to all of this being motivated by frustration with PHP and its documentation.
I felt my first few replies to the same question were adequate as-is. I think your answers are stupid, vague, and/or weak, and you think my answers are stupid, vague, and/or weak. Complaining about such over and over is not useful. We both hate each other; live with it. Most of the evidence either of us has about other humans' thought processes is anecdotal only. You seem frustrated by that, but it is what it is. Complaining about it is not going to conjure up a Harvard research paper.
I felt your replies were weakly defended, so I opposed your points. I regard the associated debates as unresolved. Furthermore, I don't hate you. I don't even know you. I merely respond to text on my screen. It involves no emotion.
Please make an attempt to bring up the alleged unresolved issues on the original page, rather than repeating them elsewhere.
No. That's too much effort.
Anyhow, the tag model was well underway before I ever touched Php. I'm not young.
What language was its inspiration?
Multiple.
I've never heard of a language called "Multiple". What's it like?
It's a dialect of stopJackingWithMe++
And I know people who are "intellectually" dumb as a rock, yet are quite skillful with hand tools. They don't run exacting physics models in their heads, but rather learn organically via trial-and-error and experience. Thus, successfully using a tool is not the same as using a rigorous model/formula/algorithm.
I'm not sure what point you're trying to make.
Successful use of a tool is NOT evidence of a rigorous "head model" in the tool user. You keep pointing out that people successfully use languages as if that's evidence they have a rigorous/clear understanding of types. It flat-out is not.
That's a good point. I suppose people could be subconsciously using types effectively, but not understanding what they're doing. How likely is that, though?
It's probably a combination of planned thought with specific "head models", and intuition. That's how almost every job is. No news there. Some rely more on intuition and "feelings" from experience, some rely more on explicit models (if available). "Understanding" is not necessarily discrete. What exactly is "understanding" anyhow? That's a long and winding philosophical topic. Possibly related: WhatIsIntent.
I would like to quote from somebody else in TypesAreTypes:
"Types are difficult...types are controversial. People don't know how classes and types interact, and there's still the question of whether a square is a rectangle, a rectangle is a square, both are the either, or never the mane shall tweet...Caught in all this is the poor student, who has to learn to use types as embodied in C, C++, Java, Python, Ruby, PHP, and other systems, with a total hotch-potch of "types", classes, coercion, inheritance and lack of true type-safety."
Yes, a classic. But you make nothing easier by proliferating models instead of offering explanations.
Proliferating? Aren't we exaggerating just a tad? And your "explanation" is very poor in my opinion. Come up with a good one and I may change my mind.
Well, yeah, but every proliferation starts with one new thing. My explanation may indeed be poor -- I am not a technical writer. Poor writing does not justify new models. It justifies better writing.
Until somebody finds an approachable way to describe your model (or whatever you call your thing), I'm offering something here and now.
Your suggestion, in the form of your tag model, is rife with flaws. This page and others are testimony to that. Go back and work on it, and keep working on it until everyone agrees that it's good. Don't be so arrogant as to believe your first, glancing stab at an idea is perfect.
You have not shown any real "flaws" in terms of the stated goals. Tradeoffs are not necessarily "flaws", or at least it's misleading to call each node of a tradeoff a "flaw". "Everyone"? Everyone doesn't have to agree, only those looking for something that fits the way they think. Different models work better for different WetWare. Perhaps you are making the false assumption that OneSizeFitsAll or should fit all.
Your conflation of variables and values is a flaw. I see no tradeoff there.
"Values" are not objective things. They are an invention of your model. We've been over this already. (Variables and literals are usually objectively defined per language syntax, but there is no equivalent for "value".)
Values are most definitely objective things. Although they don't have language tokens associated with them (other than literals, which are character representations of values), they are what functions return, what expressions evaluate to, and what literals represent. If you use a debugger, you can see values being pushed onto and popped from stacks, stored in and retrieved from registers, and copied to and from memory.
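For example (a sketch in PHP): the intermediate results below are values that are never stored in any named variable, yet they plainly exist long enough to be combined and printed:
 <?php
 // strlen("abc") evaluates to the value 3; adding 2 yields the value 5.
 // Neither value is ever bound to a variable.
 echo strlen("abc") + 2;  // prints 5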
You are talking about specific implementations. I do NOT define languages by implementation, as we've been over many times. Anyhow, prove objectively that "what functions return" are NOT "anonymous variables". Prove those things you see in your implementation-specific debugger are "values" and not "anonymous variables" (or hidden variables or something similar). As far as I'm concerned, the debugger is showing me some kind of "output". Whether that output is composed of values, thubnikks, or gronklemeisters is a secondary issue and probably relative. Incidentally, we don't need stacks to implement many parts of languages. That's an implementation choice only, and the language could use a different approach. In fact, I once wrote an experimental interpreter that processed expressions much the way one usually does by hand in math class.
Stacks are irrelevant; I only mentioned them as an example of the kind of activity involving values that you can trivially observe. It appears that you're merely using "anonymous variable" in place of "value", which is peculiar terminology, disconnected from one of the first terms a beginning programmer encounters, along with expressions, literals, and operators.
It's vague and/or overloaded. It's only a naming issue, such that I'd change it in the model if I found a better alternative. "Representation" is also "peculiar", and long. It may be worth living with peculiar model vocabulary rather than creating a nested structure. The structure is more important than the words used, in my opinion, but that may be because I'm a visual thinker rather than a linguistic thinker. A linguistic thinker may be bothered more by such. I'm weighing the trade-off, and I vote for the simpler-structure path. If you vote differently, so be it. I make assumptions about the WetWare profile of the target audience, and so do you. Neither of us has formal surveys/studies to back our assumptions, only personal experience and anecdotes. (Nor do I necessarily target every developer. Different models may be a better fit for different people.)
I suspect your use of PrivateLanguage will significantly limit the uptake of your model. I bet comments like "what's he talking about?" will be commonplace.
You mean like I do with the existing shit?
Probably.
I'm de-emphasizing English anyhow, because the model would still work if we called the parts zots, flozzos, and muukis. It may even help, by reducing the risk of accidental mental overloading with fuzzy or overlapping existing terms. For example, when one uses the word "value", some programmers are thinking of literals, such that "result of the expression" may be more fitting for them.
It will be interesting to observe how well that works.
Regarding the one-level versus two-level XML models:
Let me see if I can put this in terms of proportion: about 95% of the tests will or should be based on variables, for the reasons given. Your two-level approach may (arguably) be a better fit for the remaining 5% or so. We should optimize our "structure" design for the most common use cases as long as it still works satisfactorily for the less common cases -- and the single-level structure does. I don't see how it is "economically" (in a mental sense) worth it to optimize the structure for the 5% at the expense of the 95% use case. If it made the 5% case VERY difficult or complicated, then I could see a potential justification for such. But that's not the case here.
As an analogy, if 95% of your driving is on-road and 5% off-road, then it would probably make sense to buy an on-road vehicle, because on-road vehicles are generally cheaper, smoother-riding, and more fuel-efficient than off-road vehicles. The downside is that you have to drive more slowly on the 5% off-road. However, if you fairly often got stuck in that 5% of off-road terrain (rather than merely having to drive slower), then it would probably be worth buying an off-road vehicle, even if it's a bit wasteful (overkill) for the on-road trips.
Why do you think 95% of the tests will or should be based on variables? Furthermore, interaction with variables is simple. Interaction with function invocations is where the subtleties lie, and those clearly need explanation of values.
I explained that earlier. In short, one should make it easy to do multiple tests on the same "object" (for lack of an agreed-upon term). And I don't know what you mean by the second sentence. Perhaps an illustration is in order. Typically tests will resemble:
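(where f1() .. f6() stand for whatever transformation operations are under test, and write() displays the result)
 x = [some expression];
 write(f1(x));
 write(f2(x));
 write(f3(x));
 write(f4(x));
 write(f5(x));
 write(f6(x));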
If you "embed" the "value" directly, one cannot do such without repeating the "value", which is error-prone, and a violation of OnceAndOnlyOnce.
You mean "repeating the 'literal'"? How do you account for what's returned by f1() .. f6() to be printed by write(), and what is it that [some expression] assigns to x?
It could be an expression, and so I didn't want to use "literal". In my model, [some expression] creates variable "x" if it doesn't already exist (depending on the language), and then populates the attributes: it fills the "tag" attribute with the appropriate type tag and updates the "value" attribute of that variable (that is, of the XML element representing that variable). I don't have a shorter name for that process at this time; it is what is in the model. And I'm not sure what you mean by "account for".
"Account for" is a synonym for "explain". So what does [some expression] produce that is used to update the "value" attribute of 'x'?
I see no need to model "produce" so far. To keep life simpler, I won't model anything that isn't necessary for prediction. I don't care what the epicycles are made of as long as they produce proper planet positions as output from the model.
All we need to know is that assignments can result in one or more of the following state changes:
1. Can create a new variable (structure) if it doesn't already exist (depending on the language).
2. Can update the "tag" attribute.
3. Can update the "value" attribute.
Which of these three actually happen depends on the language, as the sketch below illustrates.
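For example (a hedged sketch in PHP, which exhibits all three state changes):
 <?php
 $x = "123";    // 1. creates variable x; its tag becomes "string", its value "123"
 $x = $x + 1;   // 2 and 3. tag updated to "integer", value updated to 124
 var_dump($x);  // prints: int(124)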
It will be interesting to see if there's any adoption of your model. I suspect your use of PrivateLanguage will seriously limit it, as your audience will have difficulty understanding how your terminology relates to familiar terminology.
That's your assessment, not mine. I would note that you are also using a private language, because most developers are not interpreter writers by trade and don't care about the lingo actual interpreter writers use.