Tops Dynamic Types Done Right

TopMind's description of an "ideal" dynamic typing (or un-typing) system that is or emulates "type-tag free". Lack of a tag is allegedly a LiberatingConstraint.

WYSIWYG Typing - No "hidden type tags". Operators look at the variable's value and ONLY the value. There are no "side" indicators other than the value.
- Discussion moved below.
No "funny modes" - No nulls, nils, NAN's, etc. If you need them for database compatibility, then use the string "[null]" or the like[1]. They cause far more problems than solutions.
No symbol operator overloading - "+" is for math only. A different operator is used for concatenation, such as ampersands or periods[2].
- What is the result of "abc" + 3? A NonNumericException?? Or some NaN value? How is NaN represented?
- NAN is excluded per "no funny modes" rule above. If a particular function or library wants to make it produce a string "NAN", that's fine. I'd prefer ("abc" + 3) trigger a conversion error. But error-handling is slipping off topic. If one really needs to check before hand, then an "isNumber" operation can check, generally by parsing. Keep in mind that if your domain is math-intensive, then a more "traditional" language may be in order.
If you really need type tags for compatibility or efficiency, then don't make the base language and libraries depend upon tag existence or be affected by its existence. Expose it to the programmer as little as possible.
Lint-like tools should be available to point out "fishy" usage for further manual inspection. Call it "soft" type/usage-checking.

--top

Re: "WYSIWYG Typing"

I assume with value you essentially mean a byte array (with fixed queryable length). Or do you mean a string (which is an array of chars which leaves open some encoding)? Or do you mean a "void pointer"?
I wouldn't want to unnecessarily dictate the implementation. Most such languages generally act like the values are stored as strings. Note that for compression/efficiency purposes, they could be stored as integers, doubles, etc. (and serialize-able back to strings if the usage pattern or content changes.) However, this should be transparent to the programmer/user, similar to file systems with a file compression option.
I didn't mean the implementation. I meant the "programming model". But I see from your comment above that you want numbers to look like strings. And probably structures to look like strings also. This would simplify conversions (but also more bug-prone).
- As far as "error-prone", errors can be introduced by having an over-complex and verbose typing system that confuses and distracts from the domain logic. Having to mentally "consider" what the tag's value/setting may be adds another layer of complexity to a programmer's mental model of the code activity, roughly comparable to two-value logic versus ThreeValuedLogic: more states and more potential combo's to have to consider. -t
Note that you proposed "compression", i.e. storage of numbers as ints of course implies a hidden type flag (one that is completely hidden for the programmer (except for performance measures)). See TypesServerTwoPurposes.
This is primarily about the model that the programmer sees and uses. A compression model should be able to be swapped with a non-compression model and the source code still produce the same output. Consider that maybe Intel chips are using tags galore under the hood for efficiency, but we don't care as long as it executes our instructions per spec. If we swap our chip for one that doesn't use such techniques, our output should still be the same (but maybe slower).

Footnotes

[1] I'm assuming zero-length strings are not considered equivalent to Null. In some systems they are.

[2] I actually prefer periods be reserved for objects and/or maps. "->" is too a bit verbose. Related: MergingMapsAndObjects. -t

What does "no hidden type tags" mean in practice? Can you give an example of behaviour with "hidden type tags" and without "hidden type tags"?

See DefinitionOfTypeTag.

Yes, I've read that. I'm afraid none of what you wrote made much sense (but some of the queries and criticisms about what you wrote do make sense) and I write compilers and interpreters, so I do know something about the subject. There seems to be a consistent conflation of implementation and model, and a consistent confusion between type and "tag" (presumably), that I find confusing.

And it all seems to hinge on strings. By "no hidden type tags" do you mean all values are represented as strings, but strings are cast -- by operators -- as needed to other types?

I agree that it's a difficult topic to formalize because terms such as "value", "type", "canonical representation" (output), and "string" are not rigorously defined. I don't know what to say other than English is not serving us very well. The main goal is to make it so that the developer does not have to "think about" a type-tag. We want to simplify his/her life by removing that concern/factor. -t

Is there some problem with the English phrase, "all values are represented as strings, but strings are cast -- by operators -- as needed to other types"? That seems to be exactly what you're trying to describe.

Well, what exactly does "casting" mean? How does one externally measure that casting has happened?
I've shown an example below. If "2" + 2 = 4 or "2" + "2" = 4, then "2" must have been cast to a numeric value because addition is a numeric operation.
"Treated as" or "interpreted as" by the math engine perhaps, but that still doesn't define "cast".

It can be used to explain why in one language 2 + 2 = 4 but "2" + 2 = "22", yet in another language "2" + 2 = 4. In the former, values are associated with types, such that 2 and "2" are distinct. In the latter, "2" is the same string as 2 because all values are of type string, so the '+' operator must internally cast strings to numeric values in order to perform numeric addition.

If all variables "are" strings, then there is only one type, and therefor no type. There is no need to track that something "is" a string in the interpreter if everything is automatically and always a string. It becomes meaningless in the realm of that language to talk about "strings". It's roughly analogous to the fact that our laws assume human beings, and not intelligent aliens from other planets. We don't have to encode such into our laws because the scope is assumed to be about Earth humans. It simplifies our laws and reasoning about our laws to not make such distinction. (Now, "corporate personhood" is another messy matter.)

What do you mean by 'variables "are" strings'? Do you mean "variables are of type 'string'" or "values are strings"? I'll assume the latter. If all values are strings, then all values are of string type. In some circles, that is called "un-typed" or "typeless", but that's a terminological convenience rather than a conceptual notion. No one thinks it really means "there are no types"; it's merely shorthand for "all values are of string type."''

Without a rigorous definition of "string", this is hard to test either way. Just because one can extract a particular character from some object does not by itself mean that object "is" a string. It only means that we have available to us techniques to extract a particular character from it. (This gets into the philosophical debate over whether "types" are defined by the operations that can be done on objects of that "type", or some other characteristic, such as implementation.)

A rigorous definition of "string" isn't required, especially if every value is of type string, but even if values are of differing types. In the latter case, "string" is whatever the language designer decides it will be.

If every language can define it any way it wants, then it's difficult to talk about "strings" in any consistent way. We probably need working definition to take this further. Let me ask you this: From the app programmer's perspective, how does one know (test) that "every value is of type string"?

In any general-purpose language, and most domain-specific ones, we can presume "string" will be used in the usual canonical sense.

In a language where every value is of type string, this is typically stated by the language designer -- in some cases as a feature. There is no test that will definitively prove that all values are of string type.

So the language designer could call them "Foobnixia". This reinforces my suggestion that "type-free" and "only has strings" are not really different. Could you propose a tag-free or type-free sample language that doesn't "use only strings" as a comparison here? (That supports "typical" functionality.) -t

This seems to be going in circles. I already noted that (quoting myself) 'In some circles, that is called "un-typed" or "typeless", but that's a terminological convenience rather than a conceptual notion. No one thinks it really means "there are no types"; it's merely shorthand for "all values are of string type."' I'm not sure what your "Foobnixia" is intended to illustrate, and there is absolutely no question that strings are a type -- this is fundamental to both programming theory and implementation.

I still would be hesitant to call them "string types". A super-set of the "string type" perhaps because in such languages math operations can also be done on them just as easily as string operations. The "native" operators tend to be a combo of those for strings, numbers, and sometimes date/time. I don't see string-centrism in that arrangement. If we cannot open up the interpreter, we cannot tell what data structure or byte arrangement it's using under the cover, and it shouldn't matter because that should be swappable for a different implementation in theory.

{Types are often defined in terms of operations rather than representations. String operations (concatenate, substring, etc.) are, by definition, performed on values of string type. Math operations are, by definition, performed on values of number type. These are true independent of how the values are internally represented. How values may be cast from one type to another is an aspect of the TypeSystem in use. Even AssemblyLanguage and Forth have types, but they don't have TypeChecking.}

I don't know how to objectively verify "has types" unless we come up with a rigorous definition of "types", which has proven elusive on this wiki. ("I know types when I see them" isn't "rigorous".) Anyhow if we look at the native operations available, then we have a stringnumberdate because all three "kinds" op operations are possible on any given scalar variable/object in such languages. This is another case where IS-A fails because we don't have to stick with a strict hierarchy of operations. -t

{Objective verification of types, whatever that is, doesn't enter into it. I am using familiar terminology for generally-recognised concepts. Your personal rejection of these is irrelevant, and I won't waste further time on it. ComputerScience isn't going to change to suit just you, so if you wish to engage in these discussions, you'll need to change to suit ComputerScience.}

Fuzzy is never the ideal, and it's not my fault they are fuzzy. The existing terminology is "good enough" for roughly 90% of the uses out there (largely shaped by historical tradition), but we are at the 10% here. Traditional ADT-style classification of "types" fails when one views operations as a buffet of choices rather than "belongs to" one type. Shit overlaps. That's life. Hierarchical taxonomies clash with a SetTheory view of things. Sure these kinds of objects have behaviors of strings, but they also have behaviors of other "types". Thus, I reject the notion that they are primarily "strings". They are (potentially) numbers also if we classify them by operations. Please don't be blinded by history. -t

Now I will grant that one can do string-like operations on ANY value we may encounter (if there is a way to "serialize" it), but that's true regardless of language or implementation. Strings are just more flexible than numbers by their nature, independent of technology. -t

Assuming from you "no tags" transparency that structures also cannot be distinguised from strings. This means that you have some canonical representation of this in mind (an dprobably also for objects and functions). Lets assume JSON. This means that I could validly do something like

  "{ a: 1, b: [ 3, 4 ]}".b[0]

Correct? Doesn't this open a whole can of bugs?

I'm not sure what "b" here is supposed to be. For this topic, I'm only considering "scalars", but it's an interesting issue and may give us something like tcl meets lisp meets strings. There is a discussion around this wiki somewhere on that.

CategoryTypingDebate, CategoryRant, ColdFusionLanguageTypeSystem, DefinitionOfTypeTag