Type System Categories In Imperative Languages Two

Continued from TypeSystemCategoriesInImperativeLanguages...

{See section 2 in http://lucacardelli.name/Papers/TypeSystems.pdf (Note: This describes a formalism for S languages. D1 languages would need to delay the type judgments until run-time. D2 languages would only make type judgments if an explicit request is made, e.g., cfArgument.)}

That is not a very approachable document. It reads too much like you guys write, which is the last way I'd document anything for typical programmers, the target audience. The tag model is more "mechanical". It essentially uses a kind of semi-abstracted machine language in which we empirically test one of two options for a given operator: either the op "uses" (examines) the variable's tag, or it uses only the value, such as for parsing. (There may be ops that use both, but I cannot think of any at the moment.)

We "run" the abstract machine language under both assumptions and see which one empirically matches the actual language output. For languages in which experiments never detect tag use (i.e., no detectable tag), we know to discard the idea of a tag and conclude that any "type determination" is done by examining the value, and only the value.

TypeTagDifferenceDiscussion provides a starting catalog of experiments to try, if applicable to a given operator. This catalog can be expanded to make it gradually more thorough.

I am not claiming it's necessarily the most perfect or most thorough way to test/model languages, but it's the best known balance of simplicity and forecasting power for common dynamic languages. Or at least it's an alternative if one finds your model hard to digest. Like I said, different models may fit different WetWare differently.

If the abstract machine language is too abstract for your taste (the abstraction leaves fuzziness), we can turn it into a more concrete (virtual) machine language by essentially creating a chip emulator or writing a full interpreter, but I hope we don't have to go that far.

So what's missing as to make it not sufficiently clear to you?

A formal description of your model would help us to understand it, even if we're not the ultimate audience. A "user friendly" or target-audience-oriented document can certainly be derived from the formal description.

Some things that are missing or not clear; for example, what would typeName output in each case below?

      a = "123"; 
      writeLn(typeName(a)); 
      a = 123; 
      writeLn(typeName(a)); 
      a = 1.23; 
      writeLn(typeName(a)); 
      a = {1/2/2003}; // or whatever the date syntax is for given lang 
      writeLn(typeName(a)); 
.
These are questions your target audience will inevitably ask.
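For JavaScript specifically, the analogous snippet and its actual results are easy to obtain. This is only a sketch: typeof stands in for the hypothetical typeName, and a Date object stands in for the date literal, since JavaScript has no date literal syntax.

```javascript
// JavaScript analogue of the snippet above; typeof plays the role
// of the hypothetical typeName operator.
let a = "123";
console.log(typeof a); // string
a = 123;
console.log(typeof a); // number
a = 1.23;
console.log(typeof a); // number (JavaScript has a single number type)
a = new Date(2003, 0, 2); // no date literal; a Date object stands in
console.log(typeof a); // object
```

Note that 123 and 1.23 report the same type name here; languages with separate integer and float types would answer differently.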

If they are as forgetful as you.

I'm not "forgetful" at all. I'm summarising the limitations of your model, and pointing out the fact that these limitations will inevitably be recognised and probed by your target audience.

You have not shown an objective limitation, only word games.

The above are objective limitations.

Not modelling static languages, fine, I'll give you that one, but that's your goal, not mine. UselessTruth. No 100% guarantees, yours has the same limitation from an empirical perspective.

{Additional questions raised by your most recent description.}

                     <var name="a" tag="number" value="123"/>
                     ..............TTTTTTTTTTTT.VVVVVVVVVVV
.
     Test app: a=7;b="7";print(typeName(a),typeName(b));
     Text input: None in this case
     Actual language run results: Number String
     Tag model results: Number String
     Non-tag model N1 results: Number Number
     Non-tag model N2 results: String String
.

You haven't provided such for your model.

There's no need, because our "model" describes programming languages using standard terminology and concepts. The above points are based on your suggestion that we "'run' the abstract machine language" (and the like), none of which is suggested for or required by our "model".

Standard terminology is vague. We've been over this already. Claiming it's "clear" 48 times does not make it so.

Beware of conflating "it's vague" with "I find it vague." The former requires evidence. Do you have evidence? More importantly, however, you are suggesting a regime involving an "abstract machine language" that adds considerable complexity to your supposedly "simpler" model. That seems unreasonable, given that the observational speculation that your model is based on is completely unnecessary. To understand how languages work, we need only look to multiple sources (including their authors!) that describe how they actually work. We don't need to construct models based on speculation about I/O observations.

"Type" has been a vague, abused, overlapping, and overloaded concept/word. I don't know why it has been that way, it just has. Nobody has figured out how to write clearly about types yet. Until somebody figures it out, I'll use a "mechanical" style model instead.

{Any evidence of that? (And it's clear enough for a computer to figure it out.)}

Exhibit 1: The "type=" in cfArgument that never touches the "type" in your model/description.

Computers process machine instructions, not fuzzy English.

{I don't see anything vague, abused, overlapped, or overloaded there. Try again.}

Both call different "things" types.

{No, that's a falsehood on your part. It's been explained to you many times that the types in cfArgument and the types in the rest of our model are exactly the same thing.}

How do we objectively know that?

{Since, this is a model designed by people, you ask the people who made the model. They have complete control over such things.}

I think they died.

{So check their writings on the subject.}

It's too convoluted and round-about. Makes no sense to me.

You mean you've never found an introductory textbook, on-line tutorial, language reference manual, language implementer's 'blog post, course instructional materials, or face-to-face discussion with a language developer that made sense to you?

But back to the question, your model's XML representation of a variable has something called a "type". cfArgument has something called a "type". How EXACTLY are these two related or not related in a way that we can objectively test the relationship? I know the fuzzy "head model" that types are category-like things, but that's in the human head, not necessarily in the interpreter.

It doesn't matter. There's nothing in the descriptions at the top of TypeSystemCategoriesInImperativeLanguages that relies on definitions of "type". We simply use the familiar term "type" in precisely the way it's used in every introductory textbook, on-line tutorial, language reference manual, language implementer's 'blog post, course instructional materials, or face-to-face discussion with a language developer.

So the ColdFusion creators didn't read the correct holy books because your model doesn't touch their cfArgument "type=" thingy despite using the word "type" also. You claim it's covered in the Holy Type Books, but YOU don't cover it.

{We never said that. And why do you think it's not covered? It's been covered repeatedly, it's used to determine which values are valid and which aren't. See? Covered.}

Sorry, it didn't appear that way to me. It touched nothing called "types" in your model. Your typeness doesn't match up with theirs.

{Where does it differ?}

I don't know if it's a matter of "differ"; it's at least a matter of confusion. You have part A called "type" in your model, and you use it to model language X, which has part B also called "type". Yet there appears to be no clear connection between them. Unless a rational and objective connection can be found between the two, it seems logical to avoid such overlap and give part A a different name. Anybody who has spent a fair amount of time writing technical documentation and receiving feedback on it should know this rule of thumb: don't give two different things the same name without a very good reason.

Of course. But there aren't "two different things". The two "type" things are the same thing. The connection between them is clearly indicated by using the same name for both.

So you say. I don't see the connection in your model; only in name.

The use of the same name is the connection. Every time "variable" appears, we mean the same concept of "variable". Every time "value" appears, we mean the same concept of "value". Every time "tag" appears in your model, you mean the same concept of "tag", right? So why should "type" be any different? We don't need to draw arrows between every instance of "type". In scientific and technical writing, unless explicitly stated otherwise, it's safe to assume that every use of a significant term refers to the same thing.

But they are NOT the same thing. There are (at least) two very different "kinds" of types in dynamic languages: tag-based typing and parse-based typing. If you want to put them under the umbrella "type", that's fine, but there needs to be a clear sub-division in the models and vocab that's often lacking or nebulous or downplayed. -t

Parsing to determine type is not the same as a LexicalAnalysis to determine type, obviously. The former occurs at run-time, the latter occurs prior to run-time. However, the "type" is exactly the same in both.

You haven't given a metric or clear explanation of how we know they are "exactly the same". Either way, in the tag model they are not "exactly the same", which is why I give them different names. Perhaps you are arguing that I abandon the tag model for a different model where they are the same. But that would probably violate my priorities as previously listed, because your model is more complicated (and confusing).

My metric for "the same" is sameness. Whether our model is more complicated or not is for the reader to decide, but there's a trivial reason why it's more complicated: Compared to your tag model, it explains more behaviour in more kinds of languages. Indeed, it is the very basis for TypeSystem behaviour in all popular imperative programming languages.

Re: "in more kinds of languages" -- No argument from me there. But I am NOT looking for a grand god-model of all languages. I want a model that predicts type-related behavior of common/typical dynamic languages and will tune/trim my model for that scope and that scope alone.

Fine, but then you can't claim your model is simpler -- or even less confusing -- unless your model covers precisely the same elements as ours.

It's less confusing for the stated purpose/scope. Static languages have less confusion associated with "types" in my experience because much more of such apps are explicit: it's one of the very advantages of static languages.

Apparently it's less confusing to you, but that's hardly surprising given it's your model. That doesn't mean it's less confusing to anyone else. Perhaps you'd like to recount your personal experiences of showing it to your fellow programmers? How do they react to it?

Very incidentally, the tag model could be extended to more language flavors using a variable model similar to:

   <variable name="foo" static-type-tag="..." dynamic-type-tag="..." value="..." readonly="false"/>
Can you give an example of a language where a variable can be declared to have both a "static-type-tag" and a "dynamic-type-tag"?

C-sharp. Note that the dynamic type is generally ignored if using one of the static types. We could also use nested XML to split variables into parts and simply not nest when using one of the static types. But that's like a data-modeling fight over nulls versus "skinny tables", which I'll avoid here by saying I'm showing the "widest" variable structure (at least for scalars).

{Can you show us the code with both a static type and a dynamic type associated with a single variable?}

 object x = 2.7;

{I see the static type associated with it is object. What's the dynamic type associated with it?}

Here's how I'd model it:

 <variable name="x" static-type-tag="object" dynamic-type-tag="double" value="2.7" readonly="false"/>

 object x = "2.7"; // would instead give:

 <variable name="x" static-type-tag="object" dynamic-type-tag="string" value="2.7" readonly="false"/>

 double x = 2.7; // would give:

 <variable name="x" static-type-tag="double" dynamic-type-tag="N/A" value="2.7" readonly="false"/>

(As described in a prior topic, C-sharp provides two different operators to directly examine (output) these two different tags. It's possible the dynamic tag will also be "double" instead of null or "N/A" in this case; I'd need to run tests to see. But it's not changeable for most "base" types, and thus "dynamic" may be misleading, but I've yet to find a better term. Perhaps "run-time-tag"? "Secondary-tag"?)

{I suppose you could make that work. It's unnecessarily complex, since you would have to add the dynamic-type-tag to everything that has a value, and special-case your rules when the dynamic-type equals the static-type. Probably better to just go with the simpler}

 object x = 2.7;

 <variable name="x" type="object"> <value type="double">2.7</value> </variable>

 object x = "2.7";

 <variable name="x" type="object"> <value type="string">2.7</value> </variable>

 double x = 2.7;

 <variable name="x" type="double"> <value type="double">2.7</value> </variable>

 dynamic x = 2.7;

 <variable name="x"> <value type="double">2.7</value> </variable>

{which has neither of those problems. BTW, what's wrong with looking it up in the language definition?}

Better to verify; Microsoft has made mistakes in their writing.

Again, that's akin to the heated "thin table versus nulls" debate in table design. I see no reason to rekindle that here.

{If there are any differences between what an implementation does and what the language definition says it does, it's the implementation that's wrong. This has nothing to do with the "thin tables versus nulls" debate, where you wish to complicate things in the name of simplicity.}

That's a false statement. Both ends can make mistakes. If you want to claim X is "simpler" than Y, then please be clear how you are measuring.

{Sure, both ends can make mistakes, but when you correct a mistake in the language definition, you get a different language. That doesn't happen when you correct mistakes at the implementation end.}

"Different language" is relative. Every minor bug fix could technically result in a "different language".

{No, different language isn't relative. It simply means that there is a difference between the languages. And yes, every correction to the language definition could result in a different language.}

This appears to be a LaynesLaw loop over what constitutes a "language": the "official" documentation, the interpreter EXE, or both. It's probably a pointless debate path that will boil down to WhatIsIntent and/or EverythingIsRelative, since one can "declare" one or the other the official determination standard, but that won't change day-to-day issues for app developers either way. It becomes a UselessTruth from their perspective, because they just want to finish and ship their damned project regardless of which part of the tool stack is declared "official".

But it's mostly moot, because one should check their interpretation of the "official" document even if the document were deemed technically "perfect". If you are comfortable with your own reliability in reading and interpreting such a document, then the tag model is probably not for you: it's for people like me who find the documentation of "types" vague or contradictory. If that's because you are an elite mind and I'm a big dummy head, so be it. Us dummy-heads want docs useful to us also. So put the big-ass gold star sticker on your forehead and get out of our way, and stick your PersonalChoiceElevatedToMoralImperative where gold stars don't shine.

{As far as I can tell, the tag model is only for Top, since he won't share.}

I tried.


Re: Semantics are not "in the head".

Oh really.

Really. Syntax and grammar are about formation of correct sentences in a language. Semantics are about what the language does to the machine, not what it means to us.

That's an algorithm, not semantics.

No, algorithms are the precise steps used to implement semantics. You appear to be confusing the academic field called "Semantics" with what ComputerScience calls "semantics". Syntax is about how a computer language is written; semantics are about what a computer language does to the machine. Algorithms are how semantics are implemented.

So you are admitting you are using overloaded words.

Not at all. Where did I admit that? In general, "semantics" is about meaning whether we're talking about the academic field or ComputerScience. However, in ComputerScience, what language elements "mean" is defined in terms of what they do to the machine. Thus, they're not "in the head" but "in the machine."

Then you are defining a language by implementation, which ideally shouldn't be the case.

{Why would you think that? (That we've defined a language by implementation.) }

"...what language elements "mean" is defined in terms of what they do to the machine"

Ideally, they are defined in terms of I/O, not processing. Otherwise, you are dictating implementation without a reason.

{Ah. You think that any restriction on what the machine does is "defining a language by implementation". In that case, yes we are, but so what?}

Indeed. Furthermore, by "what they do to the machine", I mean that what a given language statement does, i.e., its semantics, are observable in terms of state changes or other actions in a machine. However, this does not mean that state changes or other actions in a machine are the sole or even a significant determinant of how we design programming languages.

We are again in a LaynesLaw loop over "does". Again, teaching programmers about how interpreters "actually work" is not my main goal. I'm building an I/O forecaster model with simplicity as the primary goal over implementation mirroring (per rank chart). If you wish to focus on implementation, that's fine, but that is not my main focus for reasons already given. I'll give them the Newtonian Model which they can absorb in a few hours instead of the more accurate but more involved Einstein model that may take months to absorb. (Besides, you appear to be using the tag model also, but just label it differently and wrap it in fuzzy wording.)

No, we're not "using the tag model". We're describing what programming languages do. Your "tag model" appears to be trying to do the same, but you label it with a PrivateLanguage and leave out significant parts.

"Do" is not defined in terms of programming languages. If you mean leaving out static languages, that's not a flaw but a trade-off decision. The equivalent XML data structure has about three times as many parts. All else being equal, fewer parts are better than more parts. I chose to narrow the scope rather than adopt a more complex structure.

The "leave out" I'm thinking of is expressions.

When we "solve" it for variables in both models, I'll revisit that.

What does that mean? Expressions -- which include variable references -- evaluate to values, which are covered in the description at the top of TypeSystemCategoriesInImperativeLanguages.

What do you mean by, "'Do' is not defined in terms of programming languages"?

How is do-ness measured? If you mean observable input and output, then the tag model does the same.

The debate over "do" came about as a result of discussing whether "semantics" is "in the head" or (presumably) externally observable, not from debate over the "tag model", but if "do-ness" is "observable input and output" (which is certainly an aspect of "do-ness") then it's clearly not just "in the head".

By "observable state changes" do you also mean X-raying RAM during runtime?

You could do that with a debugger -- which would give you output (at least) in places where it might not be explicitly specified -- but you can also do it by simply examining the effects of statements. E.g., what does this statement do to this variable? To the screen display? To the printer? Etc.

Screen and printer? Isn't that called "output"? So how is do-ness materially (objectively) different from output-ness (I/O)?

{I see you've ignored the first one.}

I'm not sure I'd consider a debugger "official" output, because often what you see is shaped by implementation. It's a courtesy view. If a different vendor re-implements the language, what you see or don't see in the debugger may differ even if the language's usual output is the same. If vendor B's debugger showed different stuff than vendor A's, that alone wouldn't be a reason to call B's interpreter/debugger "broken" or "wrong". Anyhow, back to screens and printers for now. Please finish your answer.

{I would have thought it was obvious, but ok. What shows up on the screen and printer would indeed be considered output. What happens to a variable would not. Since what happens to a variable is part of "do-ness" but not "output-ness", there is, objectively, a difference between "do-ness" and "output-ness".}

I focus on output-ness, not do-ness. Ideally a language should be defined by its "interface" to the world, not its implementation. Commodore greatly simplified the hardware of their C-64 machine throughout the 80's in order to make them ever cheaper, but they are all considered C-64's, and with occasional relatively small exceptions, were considered the "same model" of computer. Similarly, a programming language interpreter may be re-worked for efficiency or to trade space for speed or the like.

{Making up fictional anecdotes does not help your case any. The only non-cosmetic, significant change made to the C-64 was done to reduce power consumption. Most of the cost savings came from the general downward trend in costs of parts in that time period. But back to the matter at hand. You asked for an objective difference between "do-ness" and "output-ness". It doesn't matter one bit what you focus on. If there's a difference, there's a difference. Furthermore, the purpose of programming languages is to tell the computer what to do. Output is only a small part of that. If memory serves, there are even some languages that don't have output.}

A language with no output has no use. The purpose of a programming language is to serve humans, not computers. Are you a Cylon or something? That would explain your attitude. If stable control of RAM is your goal, then you'd use assembly or the like.

Regarding C64 changes, "...the original 1982 board had about 40 chips on it while the final 1992 board had only about 15." http://www.commodore.ca/products/c64/commodore_64.htm

{Sure, there are programming languages with no output; such languages rely on the environment to communicate rather than providing output themselves. Yes, the Commodore 64 reduced the chip count. They were able to do so by making each chip more complex.}

Example of using the environment? Programmers are going to want some kind of representation of output when testing anyhow.

{SQL is a language that uses the environment to communicate.}

And that means that different database I/O API's create their own artifacts or have their own oddities, which has drawbacks and can create inconsistencies across them. Still, one can select a representative API or two and use that as an I/O testing reference. Thus, one could say, "Based on Oracle 10g SQL and ODBC used with C, here are the results of...".

{Yes, but it's still a language without output that isn't useless.}

Essentially it needs extra parts to be a complete tool. It's comparable to a car engine without wheels (at least), and the choice of wheels does affect some of the resulting "output" characteristics.

{And it's still a language without output that isn't useless.}

Anymore than an engine is "useless" per se. It just needs some way to make contact with/on the outside world for its use to be felt.

{That's true enough, and there's an advantage to doing it that way. By not including the wheels with the engine, the engine can be used to drive a car by attaching it to some wheels or to provide electrical power by attaching it to a generator. Similarly, by not including output in the language, the environment can do what's appropriate for it instead of having to conform to the language. In conclusion, your statement that languages with no output have no use is clearly false.}

I meant as stated with no explicit extra parts. Empirical testing requires SOMETHING that generates output be included. Otherwise, it's almost like a socket wrench without the end-pieces.

{Claiming something is useless without extra parts is a far different claim than claiming something is useless. Please be more careful in how you say things. So, do you now agree that there are languages that are useful (in the ordinary sense) without output? Do you now agree that there is a difference between "output-ness" and "do-ness"? Do you now agree that the semantics of computer languages are about "do-ness"?}

This is getting unnecessarily quibbly. We need to have an "output port" to do sufficient empirical analysis. SQL leaves many "output" issues to drivers and API's, and if we wanted to experiment and compare, we'd have to select a reference output mechanism. In some cases such glue-parts may affect the experiments such that we should call what we are testing "SQL plus output tool X" to be thorough. Our interaction with tools being compared still needs an "output port" of some kind.

{How can a counter-example to the claim under dispute be unnecessarily quibbly? Or was that a warning about what you were about to say?}

"Useful" depends on the context, which doesn't appear to be material to the main discussion that I can see anywhere. If we are comparing two or more models and/or tools, we need some reference "output" to objectively compare with, such as to see if model X's output is "equivalent to" language Y's output. A byte stream is probably the simplest and most common, and thus I elect it as our de-facto comparison format for this discussion. If you can give a good reason to use some other comparison format, please describe the reasoning behind it. We are not comparing query languages anyhow, so I see no reason to drag SQL into this and muck things up over it. Debuggers can give us a nice view into some of the guts, but for reasons already given, they should not serve as acceptable reference output. Debugger I/O would not be a "safe" source to build a production app around; there is no guarantee or expectation of cross-version I/O stability with debuggers, especially if a different vendor implements a debugger and/or the language. I'm not even sure the OSS version of C-sharp has a debugger, such that if you rely on debuggers for comparison, the OSS version would be considered 100% different from MS's version since it always produces zilch.

If a language/tool comes with output operators or mechanisms out of the box, that's the low-hanging-fruit of the "comparison port". At least that's what I am going to use for my descriptions. If you want to select something else, that's fine but I will not recognize it as "official" in my book and ignore it unless a good reason is given. I will agree that "intermediate" I/O-like info is useful in providing clues to the actual behavior (I/O) of a language, but should not be taken as-is as final information.

{This section of the discussion is about the semantics of programming languages. You made a claim that semantics were about "output-ness". SQL is a perfectly good counter-example. This came about because I wanted to know the semantics of the semi-abstract machine language that is part of your model. You claimed that semantics are "in the head" as an excuse not to answer it. In response, we told you that semantics were about "do-ness". Now, I don't really care what you use to describe the output. But I do need to know how the statements in your semi-abstract machine language interact with each other, the input, and the output (however you define it). Otherwise, I can't make any use of your model.}

I believe the problem is that you've been "in the guts" of languages for so long, working on compilers etc., that you cannot bring yourself to consider them a black box from a scientist's perspective. I use output-ness as the reference standard for testing for practical reasons: app developers generally think of "the language" in terms of its I/O, not in terms of actual implementation. They don't care if the interpreter is implemented via caffeinated gerbils on Tinker Toy treadmills, as long as the I/O is as expected, and in theory it could be. My model may ignore actual implementation, but as long as it provides forecasting ability, it does its stated job. It's somewhat comparable to math regression and epicycles (done right) in that it makes no claim to mirror the underlying mechanism: it only fits curves. I don't know what you call "semantics"; I cannot read your mind. I try my best to explain the model, and if it fails for you, then I'm currently stumped. You are probably not the best specimen anyhow, due to your "guts exposure" per above.

{I don't work on compilers, so that can't be it. I've never met an app developer who thinks of a language in terms of I/O. I'm not talking about actual implementation either. What I mean by "semantics" is the usual definition of the term for programming languages, i.e., what the language requires the computer to do. For example, "x = 10" in many computer languages would require that the computer take the value "10" and store it in variable "x". That is what is meant by semantics, and that is one of the things necessary to "run" your semi-abstract machine language.}

My model (as given) uses an XML representation of a variable, and the examples show precisely where the hypothetical interpreter (candidate models of ops) looks and/or makes changes, with arrows pointing to the specific corresponding elements of that XML representation. I don't know how to make it any more explicit than that on a wiki. (The specific steps a hypothetical interpreter takes for a given operator depend on the specific language being modeled. I give suggestions based on typical/common patterns found in the wild. I agree these suggestions ideally need better cataloging and regimentation, but that shouldn't be a show-stopper to seeing the general usage of the model.) -t

{Yes, we've seen the XML representation of a variable. But an XML representation of a variable is hardly a semi-abstract machine language. Where's the rest of it? You only have one example I could find with arrows, and it only gives one rule. There you say it looks for quotes in the source language. You later said that quotes weren't necessarily important, that it would depend on the source language. How can I tell if it does? (And before you say "experiment", keep in mind that I need to know this to set up the experiment in the first place.)}

That was a specific sample language in which quotes did matter. One does experiments to see whether quotes are important (affect results) in a given lang/op. That's Science 101; I shouldn't have to re-state such.

{Show me the experiment you would use to show that the quotes matter.}

Observation 1 in TypeTagDifferenceDiscussion is a simple one. There are more involved ones, but are language-dependent. Here's a JavaScript example:

 // Example quote03, numbering for reference only
 1. a = 123;
 2. b = "123";
 3. alert(a + a);  // result: 246
 4. alert(b + b);  // result: 123123

{Your experiment to determine that quotes matter appears to be the same as your experiment to determine if the language has tags. Why the different conclusions?}

What's different? The tag model can "explain" how the quotes affect the results. "123" assigned with quotes makes variables behave differently from those assigned 123 without quotes, per experiments (not all of which are shown here). This behavior can be modeled in the tag model by having the quotes "set" (affect) the type tag, which then later affects how "+" behaves. The "quoteness" in statements 1 and 2 appears to affect the state/behavior of the variables, which carries over to statements 3 and 4. (We could swap lines 3 and 4 to verify the ordering doesn't matter.) Thus, whatever model is used should have/show a mechanism to "save" this state (quote-ness) with, or associated with, the variable. I choose an XML representation that uses a "tag=" attribute to explicitly carry this state along with a given variable (a data structure that represents the state of a variable).

Keep in mind it's not the only way to model this phenomenon, but it usually works and it's relatively simple.
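To make the "run the abstract machine for both assumptions" step concrete, here is a hypothetical hand-interpreter sketch. The names (makeVar, plus) and the quoted-literal-sets-the-tag rule are assumptions for illustration, not part of any real engine; the point is only that a tag-consulting "+" reproduces the quote03 results.

```javascript
// Hypothetical tag-model sketch: each variable is a small structure whose
// tag records the quote-ness of its literal; "+" consults only the tag.
function makeVar(literal) {
  // Assumed rule for this toy source language: a quoted literal sets
  // tag="string"; an unquoted numeric literal sets tag="number".
  if (literal.startsWith('"') && literal.endsWith('"')) {
    return { tag: "string", value: literal.slice(1, -1) };
  }
  return { tag: "number", value: literal };
}

function plus(x, y) {
  // The op examines the tags, never re-parsing the value characters.
  if (x.tag === "number" && y.tag === "number") {
    return String(Number(x.value) + Number(y.value)); // numeric addition
  }
  return x.value + y.value; // concatenation
}

const a = makeVar('123');   // statement 1: a = 123;
const b = makeVar('"123"'); // statement 2: b = "123";
console.log(plus(a, a));    // 246, matching statement 3
console.log(plus(b, b));    // 123123, matching statement 4
```

The experiment then consists of checking whether this tag-consulting run matches the actual language output for each operator under test.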

{What a round-about way to say that values have types.}

"Have" is a vague word. I model the have-ness explicitly. There is an explicit data structure that illustrates this have-ness, and one can see the data structure explicitly change its type as we run through a hypothetical interpreter. It can also illustrate how parse-based "typing" ignores the tag (either because of a given operator's implementation, or because the language has no tags). Your verbal approach does not clearly distinguish between these two "kinds" of typing techniques, and that is a big failing of it.

And in colloquial "type" discussions, something that can be parsed (interpreted) as a given type is often said to "be associated with" and/or "is" that type, such that the colloquial approach also fails to distinguish between them. (Typical implementations of isNumeric() are an example.) Parse-based typing does not exclude "associated with" (have), and thus associated-with applies to both typing approaches. I'm looking for a model that makes the distinction clear as night and day. You seem to value fitting existing spoken language usage ABOVE clarity of this point, and that is a big mistake in terms of having a clear model. Parse-based typing does not change state in my model because parse-based typing does not use state (or at least acts as if it doesn't). --top

{How can it make the distinction clear when you can use parsing to explain the above code as well?}

That's why "has" is not good enough. It doesn't distinguish between parsing and non-parsing. Remember, this is in response to "What a round-about way to say that values have types".

{How does your model distinguish between parsing and non-parsing? The code above can be explained by setting your tag. It can also be explained by parsing. (Our model finds the distinction between parsing and non-parsing irrelevant. So why bother explaining it?)}

How can it be explained by parsing? (Note I am talking about the processing of "+", not assignment statements 1 and 2.)

{Line one sets a to "123". Line two sets b to ""123"". Line three parses the value of a and sees only digits. + therefore adds the values numerically to return "246" and alert outputs "246". Line four parses the value of b and sees a non-digit. + therefore concatenates what's inside to return ""123123"" and alert outputs "123123". See, just parsing.}

Are you saying the variable "keeps" the quotes along with the value (digits)? That's indeed one way to model it, but creates a lot of confusion, especially with embedded quotes. Plus, it can be argued that "keeping" the quotes is just another form of tagging. Further, it doesn't work so well for Boolean values and other types since there may not be an equivalent to quotes for them. For example, 'd=date("12/31/2013");' may be the way to generate variables having the explicit type of "date" in some langs. And, you have to do "quote diddling" in your model when you concatenate strings. I find that a clearly-separated "tag" makes modeling smoother.

{Yes, the string literals in that code snippet are stored exactly as they appear in the source. Yes, you could argue that it's another form of tagging, but that's the whole point. Your model can't differentiate between parsing and tagging since any piece of code can be explained either way. Since we can encode any value into a string, there is absolutely no problem at all handling booleans, dates, or even complex structures in a similar manner. Yes I do "quote diddling", but that's just something + has to parse the values for. It's not otherwise special.}

Please explain "can't differentiate". You seem to be viewing it all wrong. (There are specific cases where the result is the same either way, but then it doesn't matter which path you choose to keep.) And yes, I already agreed one can model other types in a similar way, but it's essentially an ugly form of tagging, almost like old-style BASIC's type markers for variables. In fact, BASIC did it better than you because BASIC only needed one character and it's always in the same place.

{What I mean by "can't differentiate" is that any combination of source code, input, and output can be explained by using tags and it can be explained by using parsing. There is no way, using just source code, input, and output to tell if it's one or the other. (Note: It's not just specific cases, it's every case.)}

No, not unless you go to a different model with a different vocab and conventions to force it one way or another, cherry-picking the model per op.

{Nope, using what little of the model that you've been willing to articulate so far and the exact same vocab.}

Please demonstrate. I don't see it. Specifically, how can line 3 and 4 produce different results if the tag is not inspected by the interpreter?

{The values are parsed. a was set to "123" and b was set to ""123"". Since a contained only digits, + used numeric addition. Since b contained something other than digits (in particular the first and fifth characters are '"'s), + used concatenation.}
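That parse-only account can likewise be run as a sketch. Everything here is hypothetical (plus and display are invented names, and modeling alert as stripping the stored outer quotes is an assumption drawn from the reply above); the quotes live inside the value and no tag exists anywhere.

```javascript
// Hypothetical parse-only sketch: quotes are stored as part of the value,
// and "+" decides what to do by parsing the stored characters.
function plus(x, y) {
  const allDigits = /^[0-9]+$/;
  if (allDigits.test(x) && allDigits.test(y)) {
    return String(Number(x) + Number(y)); // numeric addition
  }
  // A non-digit was seen (here, the stored '"' characters): concatenate
  // the insides and re-wrap the result in quotes.
  return '"' + x.replace(/"/g, "") + y.replace(/"/g, "") + '"';
}

function display(x) {
  // Models alert(): strips the stored outer quotes, if any, before showing.
  return x.replace(/^"|"$/g, "");
}

const a = "123";    // line 1 stores the bare digits
const b = '"123"';  // line 2 stores the quotes as part of the value
console.log(display(plus(a, a))); // 246
console.log(display(plus(b, b))); // 123123
```

Note that this sketch produces the same observable output as a tag-consulting one, which is precisely the point under dispute in this thread.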

If you put the tag inside the value, then yes you have to "parse" to get at it. But that's just silly word-game playing. My model doesn't put it inside the value, which arguably is a value plus a tag and not just a "value" anyhow. I'd challenge calling it just a value if you did that. It's a value with a tag(s) embedded.

{No. It's just a string value. There's nothing special about the '"'s. In fact, this language has a concatenate function that does just what it sounds like it does, without parsing. In this case, alert(concatenate(a, a)) would display "123123". alert(concatenate(b, b)) would display "123""123". You could also do alert(a + concatenate(a, a)) and it would display "123246".}

What is "this language"?

{The one I'm using to show that your model can't differentiate between languages that parse vs. languages that use tags.}

Do you mean actual implementation? Actual doesn't matter; the primary purpose of my model is NOT about modelling actual implementation. I ranked the priorities in a list. Did you forget the list? A programmer is not going to know by I/O whether a given language actually puts the type marker(s) inside the value or not. I would like to point out that your toy language still has two different ways to "calculate" types: search for the type marker(s), or ignore the marker and look only at the value's characteristics. The first would be used for typeName-like ops, and the second for isTypeX-like ops, for example.

{No, I mean in your model. In your model, I can explain any combination of input, source, and output in both ways. (And it wouldn't matter if my toy language had ten million different ways to calculate types.)}

I don't believe you. Show it.

{Just scroll up a little bit.}

MY model does NOT shove the quotes up the value. And for the sake of argument if we got drunk off our asses and did it that way, one can still see it's a very different process to scan for the type markers versus analyze only the value bytes, ignoring the type markers, per diff op modelling.

{As presented, it doesn't violate any of the rules of your model to "shove the quotes up the value". Stuff like this is why I asked you about restrictions when translating to your semi-abstract machine language. You refused to answer. And yes, it's a different process, but that's an implementation detail you wanted to ignore. You only want to use the source, input, and output to differentiate between "parsed" and "tagged". Since I can map any combination of source, input, and output to both the tagged and the parsed models, you simply can't differentiate between them that way.}

None of the many examples do it, yet you go right ahead and drive off the road and into a creek. What keeps somebody from pulling the same trick in your model? And my use of "parsed" versus "tagged" was within the model, not general.

{So what? They're just examples, and can only tell you about that particular combination of source, input, output, and mapping to your model. The things that prevent them from pulling the same trick in our model are the rules against it.}

What rules? If you had good, clear rules, I would have stolen them already for the tag model.

{The ones that say "Every value has a type..." and "Every value is represented by...". You check the language definition and see which one the language says it does.}

"Has a type" is vague for reasons already given multiple times. And I doubt the "language definition" tells you that quotes are "kept with" the value in most languages we are interested in. And even IF they were, that doesn't mean we should necessarily model the language that way unless it has a clear advantage in the deviation from the norm.

If you're going to allege that "has a type" is vague, you need to provide evidence that it's vague. I find it difficult to believe that it's vague, given that "value has a type" and "variable has a type" are familiar descriptive phrases used in both technical documents and formal treatises with no apparent confusion. Without some compelling evidence to the contrary, it's simplest to assume that rather than being vague in general, you simply don't understand it. I.e., the problem is yours and yours alone -- or perhaps one shared with or only found in very poor programmers -- rather than a characteristic of the phrase "has a type". If it's a misunderstanding among very poor programmers, then I doubt your "tag model" is going to help, but if you have evidence otherwise -- like you've tried your model on programmers in an experimental setting to observe their reactions (you know, that "science" stuff) -- then I look forward to reading about it.

See above near "colloquial". The apparently contradictory responses between typeName()-like functions and isTypeX()-like functions, for example, are not fully addressed in colloquial-land and are papered over. I want a model that makes the distinction as clear as possible. There are definitely (at least) two kinds of "type" detection processes in dynamic languages, regardless of what we call them or how we model them.

{What apparently contradictory responses between typeName()-like functions and isTypeX()-like functions? If you're talking about ColdFusion's cfArgument, then we did explain it without contradiction. So regardless of how contradictory it appears to you, it's not. As for typeName() and isTypeX(), the language defines what they do in terms of the type system used by the language. How is that not fully addressed (it can't be any more specific without being more specific about the language(s) in question) or papered over?}

No, I'm thinking more like Php's getType() versus is_numeric():

  // Example Php04
  $a = "123";
  print(getType($a));     // result: string
  print(is_numeric($a));  // result: 1 (i.e., true)

Thus, in Php, a could be said to be "string" and "numeric" at the same time. And you are right, the behavior is "per language", but we can model such behavior using the "tag modeling kit" for a good many dynamic languages.

A curious programmer may ask, "how can it be both at the same time?" The answer, using the tag model, is that getType looks at only the tag, while is_numeric looks at the characteristics of the value, not of the tag, and the value "can be interpreted as" a number (based on parsing the value) because it's all digits.

There is only one IS-A because there is only one tag "slot" in the model. There can be many "be interpreted as" because a given set of characters can successfully be interpreted as different "types". (Although Php is inconsistent in that some isX functions only look at the tag, and programmers have complained about this inconsistency.)
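The two detection processes just described can be sketched side by side. This is a minimal hypothetical model (getType, isNumeric, and the simplified numeric grammar are assumptions for illustration, not PHP internals): one op reads only the tag slot, the other parses only the value characters.

```javascript
// Hypothetical tag-model sketch of PHP's Example Php04:
// the variable is a structure with one tag slot and one value slot.
const a = { tag: "string", value: "123" }; // models $a = "123";

function getType(v) {
  return v.tag; // examines only the tag slot
}

function isNumeric(v) {
  // Ignores the tag; parses only the value's characters.
  // Simplified grammar: optional sign, digits, optional decimal part.
  return /^\s*[+-]?\d+(\.\d+)?\s*$/.test(v.value);
}

console.log(getType(a));   // string
console.log(isNumeric(a)); // true
```

Run by hand, the "both at the same time" answer falls out mechanically: the two ops read different compartments of the same structure.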

{Or, they could read the language definition. They would then see that the value "123" has a type of string, and getType() returns that. They would also see that is_numeric() returns true if and only if the value passed in has a numeric type (as returned by getType()) or it's a string that can be converted to a numeric type. Voilà, we've simplified things so that there's no need for your tags at all. The fact that it's simpler this way becomes especially clear once you include in your description how PHP decides how to set your tag, something you've been leaving out.}

You've given no explanation/model as to why getType() returns what it does or how long it does it. Also note it's simpler to model is_numeric as always parsing because it's one step instead of up to two. Granted, under the hood the interpreter may check the tag as a speed short-cut, but it's otherwise unnecessary for a prediction model. Occam. And "see that it has a type of string" implies a simple relationship. IS-A/HAS-A ain't good enough as we can see because is_numeric is also "asking" what it "is", and gives a DIFFERENT answer. Is has-ness different than is-ness?????? I'm assuming they are the same, and so we've got two is-a's going on giving different results. It's just sloppy fuzzy notiony words with contradictions not explained. Why do you have such an attachment to fuzzy language? It may "mean" something clear in YOUR head, but I only see overlap and fuzz. Bill Clinton was right about one thing: "is" is a fuzzy word. It's much much clearer and cleaner to me if we define the var as a data structure with two clearly separate "compartments" (XML attributes) and model operators as reading one or the other compartment (depending on best fit of results). And replacing the tag attribute with "getType" doesn't simplify anything; in fact it makes it worse because you are not modelling how getType "works"; it's just a function floating around in space that does magic. An attribute is a simpler part than a function. The tag model is far more "mechanical" and visual and we can step thru it like clock-work: tick tick, look at attribute X, tick tick, look at attribute Y, tick tick, etc. Maybe I'm just fucking language-blind, give me a damned visual. I don't "get" your overlapping is-a/has-a shit and I'm fucking giving up. You are Lewis Carroll reincarnated on LSD, which that fucker needed like a hole in the head. If Lewis Carroll and Dr. Seuss had a bastard baby and fed it LSD milk, it would grow up sounding much like you: The is-a has-a was-a fizz-a, typing madly until it is-a, but tell the mothah' it was-a fuzzah, tazing and typing and wiping and swiping tags with names of bags that have no links to values that blinks and shrinks the kinks until it is-a link to a type of hype you cannot wipe until it becomes smelly tripe.

'''

    What type am I? Asked this guy. 
    Am I number, Am I string, 
    or am I something in between? 
    Is it what I am, or what they see, 
    inside my guts, or their view of me?
    Must I be one, or can I be both?
    Or is duality, something to loathe?
'''

It's not fuzzy to us.

Nice rant, by the way.

{I did too tell you why getType() returns what it does. It's defined to return the type associated with the value. In this particular case, the value is of type string, so getType() returns string. I didn't tell you how, but that's an implementation detail. Something you've repeatedly stated you wish to ignore. (I don't know why you care about how long it takes.) Yes, if you read the language definition for PHP, you would find that there is a single type associated with every value. Yes, is_numeric() doesn't return true just for those types that are defined, by the language definition, to be numeric. It's defined to also return true for certain string values as well. Occam says to cut the tags entirely, since you need to know the types associated with the values to set your tags up in the first place (i.e. the step you keep sweeping under the rug in order to get your model to appear about as simple as ours), and once you know that, you already have all you need to know what getType() will return without having to use tags. There is absolutely no doubt that "has" is different from "is". That's something you should have learned in early grade school. After that, you appear to have blown a neuron. Take a deep breath, and try to post something coherent next time.}

"Is" is vague. That's something I learned in college. Categories are in the head. "But that's an implementation detail" sweeps a big step under the rug. It's a detail that should be modeled if we want a good model. You are right, I should take a break from "types" here. It's getting very frustrating.

If "that's an implementation detail" is supposed to be part of a model, how do you reconcile your previous claims that your "tag model" is a model and not about implementation?

We need to "explain" that part one way or another; not just say a function magically does something vague. The explanation can be virtual, as in a model that produces the right output. It has to be clear, not necessarily "real". If one uses epicycles to model planet movement, that's fine, as long as the epicycles are sufficiently described (and predict planets properly).

The "implementation detail" we're referring to is how the type is associated with a value or a variable. It doesn't matter whether it's a tag byte or a type name or a type ID or a pointer to a type definition or the topmost item on the "type" stack at a given point when traversing the abstract syntax tree. "Associated with" is entirely sufficient, because "x is associated with y" -- as in a variable is associated with a type, or a value is associated with a type -- simply means that given an x we can answer questions about y. When we say a variable x is associated with a type y, we mean that given variable x we can answer questions about its type y. Or, given a value x, we can answer questions about its type y. We don't need to say that x has tag byte y, or x has type name y, or x has type ID y, or x has a pointer to a type definition y, or when we encounter node x when traversing the abstract syntax tree we can find y on the top of the "type" stack, because all of these mean precisely the same thing. "X is associated with y" gives us all the information we need with no extraneous detail.

Well, okay, but give the "association" a name, and make it clearly separate (named differently) from OTHER associations or type-association-like processes or artifacts, such as is_numeric()-like results. The best way I've found so far to do this is with XML because it's familiar and has relatively clear rules. If English were good enough by itself, we'd never need XML and computer languages and logic notation systems. If you want, you can use circle-and-stick graphs, but just label the lines and the nodes so we can write rules and instructions with clear references to the parts. In short, avoid anonymous associations in models if you want to make sure they are clear to the reader. I called the association in my model the "type tag". It has a name, and there's only one per variable per XML attribute rules. I don't see the one-ness limiter in your model. If somebody is running a "by-hand" interpreter in their head or on paper, then they have a clear choice, model a given op as using the "tag=" attribute or the "value=" attribute from the variable's representation. The choices are clearly distinct and their association with "variable" is clear because of the XML. Note that a stack may be overkill for the intended use.
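For concreteness, here is a hypothetical rendering of the two Example quote03 variables in that XML convention. Only the "tag=" and "value=" attributes follow the convention described above; the element and "name=" attribute names are assumptions for illustration.

```xml
<var name="a" tag="number" value="123"/>
<var name="b" tag="string" value="123"/>
```

A tag-reading op (getType-like) consults only the tag= attribute; a parse-based op (is_numeric-like) examines only the characters in value=.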

PageAnchor: Assoc02

ALL associations should have at least these:

These are not too much to ask.

{In general, they are. They might be good questions (outside the first, which is simply a matter of convenience), but none of them are necessary.}

Bull! Mind-only assumptions are arrows in the back of good documentation. And if a specific language has certain limitations, they should be included in the model for that language. It's a given that the model will need to be customized per language (unless we have a Swiss Army Model with way too many parts to switch off or ignore). If you are dead-set believing that your verbal descriptions are "good enough" and won't change your practices, then there is no need to continue here because clarity is a second class citizen to such stubborn personalities.

Re: "Totally unnecessary": even if YOU think it's clear, it does not hurt to make the above explicit.

It is explicit. Our model calls it "type" -- short for "type reference" -- as is done in pretty much every language reference, ever. You call it a "tag", as is done in a handful of descriptions of particular language implementations using very specific implementation approaches.

I have already described potential overlaps, confusion, and discussions with rank-and-file coders more than twice and I won't repeat them here. Let's LetTheReaderDecide if existing material is clear on this matter or not. If the reader thinks it is, then they have no need for the tag model. Done!

The tag model has served me well. I hope others will also find it useful if English-centric approaches are not working for THEM. ThankYou, --Top

Yes, over time, let's see how many visitors demonstrate preference for your "tag model". It's clear how many support it now.

You've done no reliable survey; anecdote against anecdote.

How many edits have there been in support of your model? Aside from the two of us who oppose it, I've seen at least two responses that seemed confused by it. One thought "tag" meant a data definition, another thought "tag" was the same as C's typedef.

I have no idea what text you are referring to. And again, the subject is a model/tool for I/O prediction such that what a tag "is" is irrelevant. I have abandoned attempts to define it at this point (for now). I have used XML as a working representation due to its familiarity, but if somebody wants to replace it with something else on their own, that's fine. I would note that your model is not based on clear definitions either. (You probably think they are clear, in your head, but you often mistake your head models for universal truths.)

One of the confused responses -- the one that thought "tag" meant a data definition -- is one you responded to not long ago on ThirtyFourThirtyFour. The other was a few weeks ago on a different page. Regardless, it's quite apparent there has been no articulated support for your "tag model" on this Wiki. Have you tried it on your work colleagues or other developers? What was their reaction?

I still don't know what text you are referring to.

See the text on ThirtyFourThirtyFour that begins, "Well a tag may be a data definition, though I cannot be sure of that from what I am reading, so call it a "Datad". When we work out what it really means, it may become pervasive. It may be data adder, or data addressor, or data administrator, or data dictionary or ...."

I have yet to test the tag model thoroughly on others, but like I said, few if any have appeared happy about the current state of affairs. In general most don't seem interested in the details; they spot-fix any issues and move on. Anyhow, we've been over this popularity contest talk before. Let the fucking readers decide.

If in "general most don't seem interested in the details", what makes you think they'll be interested in the details of your tag model? I'm not clear why you wish to limit it to the "fucking readers", too -- wouldn't the virgins and celibates be equally interested?

I used to be less curious about such, spot-fixing any odd issues and lathering up on defensive wrapping. Later I decided to poke around some more rather than live with fuzz. Most indeed don't care, but it's nice to have a model and testing kit for those who do.

Where is your testing kit?

I already explained it multiple times, but for some reason you don't seem to get it. I don't know where the communication break-down is; I cannot read your mind and your feedback is too vague for me to process.

On what page is it documented?

Are you sure the problem is in understanding TypeSystem categories defined by your model, as opposed to simply developing a better understanding of the language-specific peculiarities of (I presume) PHP, ColdFusion and JavaScript? In other words, does your model do a better job of explaining PHP peculiarities than the PHP manual?

Yes!

Can you give an example of where your tag model explains something that the PHP manual does not?

See PhpTypeSystemDiscussion.

I'm going to say it's vague and you are going to say it's clear and we are going to re-argue the same points all over again.

Perhaps you could point out specifically what you feel is vague? It's relatively easy to identify vagueness by highlighting absent definitions, dangling references or WoodenLanguage, but not so easy to prove clarity as it's inevitably subjective.

I tried that before, and it just seems to lead to fractal vagueness.

There are things in ComputerScience that are axiomatic and/or abstract and either have to be taken at face value, or you have to see what the code looks like that implements them. Your comment reminds me of someone I once met who couldn't come to grips with set theory because given a set like X {a, b, c}, he had to know what a, b, and c were. If you told him they were apples, he had to know what kind of apples. If you played along and said they were Mackintosh apples, he wanted to know whether they came from the same tree or different trees. If you told him they came from the same tree, he had to know where the tree was. And so on.

There are some unpleasant people with dysfunctional behaviors you remind me of, but I'll save my mud slinging for a time when I'm pissed instead of just irritated by you.

Why are you irritated by me? That's an oddly emotional reaction to what is no more than text on your computer screen.

Part of the problem appears to be that there are multiple ways to model the behavior (I/O) of a given language, and you wish to limit such models to a traditional standard (or what you believe to be a traditional standard), whereas I'll happily blow up tradition if it gets in the way of stated goals. YOU want to limit models to only Mackintosh apples.

It's not even a question of "a traditional standard". What I and others have described is how popular imperative languages are actually constructed in terms of values and variables and their relationship to types. We have explained not only why the language behaviour or "I/O" is the way it is, we add a bonus of describing how languages are actually built. How is your model simpler or superior to that? Can you match the parts of your model with the parts of our description, and show how and where your model improves on it without loss of explanatory power?

I cannot quite figure out your model because there is too much English and not enough data structures. But anyhow, I place simplicity of the model (for its target purpose) above fitting actual implementation. If epicycles created a simpler model than Newton's and accurately predicted the motion of the planets for a sufficient time-frame, I'd go with epicycles over Newton.

I've tweaked the descriptions at the top of TypeSystemCategoriesInImperativeLanguages in an attempt to make them more readable. I've also brought in my descriptions of operator invocation from ThirtyFourThirtyFour. The "model" hasn't changed, but hopefully the descriptions are clearer. Again, I'd be curious to see -- if you still feel your model is simpler or superior -- how and where specifically you believe it to be simpler or superior. In particular, can you match the parts of your model with the parts of our description, and show how and where your model improves on it without loss of explanatory power?


SeptemberThirteen


EditText of this page (last edited November 22, 2014) or FindPage with title or text search