Tops Type Determinator Challenge

From ThereAreNoTypes:

UnknownDonor: "Values, types and variables/attributes are categorically distinct."

Top: "Until you produce an algorithm or something comparable that can take as input a language grammar and output which elements are which [and apply it to a code sample also], and most agree with the results, I won't believe you. Your type "definition" depends on WhatIsIntent, which is either too fuzzy or not dissect-able in practice. You have been unable to take the damned human out of the equation. "Step 5, have an expert determine intent of X" is not acceptable."

Is there such an algorithm or formula; and if not, can it be made in your estimation; or does some Law of Logic "forbid" it?

I'll simplify the challenge: only the code sample has to be "marked up" as to which element is which. Thus:

Inputs:

   if (foo == 7) then delete universe(b)

Output:

   if <-- type ( foo <-- attribute == 7 <-- value ) then <-- attribute delete <-- type universe <-- etc ( b )
(Example not intended to be realistic for any existing language. Any resemblance to an existing language is due to either coincidence or mental laziness on the part of the author. If whitespace is significant, then use a place-holder symbol, such as "[tab]" or "[space]".)

--top


How about looking at the source code of a compiler with a type system? Languages with rich type systems, such as Delphi, obviously tell the difference between a variable, a type, and a value; otherwise the compiler wouldn't work, wouldn't compile programs, and wouldn't correct you with static typing errors. Since the Delphi source code is not available for dcc32, I'd recommend you look at the source code of some free compilers that have rich type systems. Likely you will be too lazy to look into the source code because it's too large, or you want a short 5-line example, which isn't possible. Or maybe you want to play word games, and say that types are actually just sets of things, or that types are just flagged down by something and not actually types, or some other DodgingTheIssue?.

This is not about humans stating what is what. Please reread the premise. Different humans have different answers. It's about codifying rules that determine types, not just determining types.

Do you need to codify rules to determine what an equal sign (operator) is in order to understand equality in math? Why do you understand the equality operator without needing any code to explain it? Why do children in math class understand number types without having codification available? Understanding the concept of a type is far more important than any particular implementation (detail) of a type system. In order to understand the concept of a type, just look at the number types available in math, which are well known and widely accepted:

http://www.purplemath.com/modules/numtypes.htm

Renaming or redefining number types in math to "side flags" or "pylons" is not going to help the cause. You are playing word games and having a silly SemanticWar? here.

That's precisely why I want to remove humans from the determination. It would then be open-and-shut logic. If you violate rule 38, then you violate 38. (I question some uses of the term "type" around here.) -t


Challenge accepted.

Top, the RelProject has a debugging facility to convert TutorialDee code into a textual representation of its corresponding parse tree. Like this:

Given the following TutorialDee code...

 VAR X INTEGER;
 VAR Y INIT(X);

 TYPE T POSSREP {A INTEGER, B INTEGER};

 VAR Q INIT(T(2, 3));
...assuming the above is in a file called demo.d, I can run it through Rel as follows:

 java -jar Rel.jar -v1 < demo.d
And get the following output:
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarTypeAndOptionalInit
     Type
      Identifier
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarInit
     Expression
      Dereference
 Statement
  TypeDef
   Identifier
   TypeDefInternal
    PossrepDef
     PossrepDefIdentifier
     PossrepDefComponentCommalist
      PossrepDefComponent
       Identifier
       Type
        Identifier
      PossrepDefComponent
       Identifier
       Type
        Identifier
     PossrepDefConstraintDef
    PossrepInitialiser
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarInit
     Expression
      FnInvoke
       Identifier
       ArgList
        Expression
         Integer
        Expression
         Integer
Note how the values, types and variables/attributes are categorically distinct, and can be resolved accurately and unambiguously by the parser given a grammar (follow the link at http://dbappbuilder.sourceforge.net/Rel.php) and a code specimen in TutorialDee.

No no no. YOU are the one who labelled parts of the particular language. You skipped a key step.

As the person who originally wrote, "values, types and variables/attributes are categorically distinct", I'm afraid I must be missing your point. My comment was confined to programming languages and in that context it is accurate. Your "key step" appears to suggest something -- and I've no idea what -- that neither parsing nor semantic analysis are intended to do. In strictly programming language terms, distinguishing values, types and variables is straightforward, common, and necessary to implement working languages. If you feel there is some philosophical blur between them (again, I've no idea what, because that blurring certainly doesn't exist in my mind) it doesn't affect compiler design.

Given a grammar for a programming language, use an algorithm that imparts the "correct way" to determine types etc. to determine what parts of the language, per code samples, are what. The algorithm will determine what's a "type", "attribute", etc. NOT the compiler writer. I don't give a damn what the compiler author thinks is what; we are not testing human minds here, but an objective "formula" for determining what are types versus attributes, etc. (The testing machine will not use internal variable names as part of the algorithm. It will be the nature of the parts, not their labels.) If the "rules" for such are truly clear, as some of the more aggressive WikiZens claim, then it should be possible to turn those rules into an algorithm. This is to weed out bullshit and hot-air. -t

Note that you can use pseudo-code and descriptive steps as a starting point rather than actual code. Just be ready to answer questions about it and flesh out the fuzzy or confusing spots.

The algorithm that determines types, etc. "to determine what parts of the language, per code samples, are what", is precisely what I illustrated. It is something a programmer specifies in a grammar, and is used to appropriately allocate and access variables, identify literal values, and perform type checking. That is precisely the "rules" to which your correspondents referred. I believe you may have misunderstood your correspondents, or perhaps their descriptions were not clear, or both. There is no inherent arrangement of keywords and tokens from which types, variables, and values can be implicitly derived, but no one has ever claimed that there is. In programming language terms, however, variables, types and values are categorically distinct. This distinction is fundamental -- and explicit -- in creating usable programming languages.

Again, the compiler builder and/or grammar builder (human) decided what to call an "expression", etc. They could have instead called it a "flitterblonk". It's named whatever the hell they feel like naming it. You claim they are "fundamentally distinct" but have not provided any objective way to determine that. "I know it when I see it" is not good enough. If the determination is clear-cut, then a machine can do it just by analyzing the grammar rules of a language. If you cannot produce such an algorithm, then you haven't given it enough thought, backing my claim it's not clearly defined. Now maybe you'll claim that grammar rules are not enough; that one also needs to know what the operators do. That's fine, as long as you objectively encode that info.

Uhm, he provided an algorithm, and you immediately reject it. If I were him, I would just give up, because explaining things to you is not rewarding. Indeed you are a very HostileStudent. If someone proves to you that the equal sign is an operator, you would immediately reject it and say "but it is not an operator, it's a FooBar because anyone can rename operator to some other term! Actually, the equal sign is just a FLAG. It's not an operator, it's a flag!"

He didn't follow the criteria. He needs to go back to Requirements School to learn what requirements are. And I am not his student. The bottom sentence about re-interpreting things has some truth to it. There may be some historical tendencies to see things as such and such, but that may not be the only way to see it. It's similar to how some functional expressions can be viewed as imperative statements and vice versa. Some of you are mistaking historical habits for universal unwavering truths.

I followed the criteria as they are described, using precisely the approach used in implementing programming languages. Any misunderstanding, and/or failure to explain, is entirely Top's.

If I rewrote the Rel parser in German (such that it still executed Rel properly), it would fail because most couldn't read its labels. In theory the "type determinator machine" should give the same answer regardless which language the parser/compiler was translated into.

Sorry, I'm not following you at all.

After the translation, the above output would look something like:

  Groff
    Shnicken
      Hiffsten
    Fluken
    Worstein
      Nuffen
      Ausdruck
       Fn-Invoken
        Arguff Liste
         Ausdruck
          ganze Zahl
         Ausdruck
          ganze Zahl
Not only did a human assign the "parts of speech" classifications to the grammar, but it is not general purpose. It only works on Rel.

Of course. That is what the grammar is for. "Typeness", if I can use that phrase for what you seek, is not an attribute of a grammar. Nor is it an attribute of the source code itself. Types, in terms of programming languages, are defined by humans. They describe (relatively invariant) sets of values and associated operations. They come from mathematics (either axiomatically or derived), logic, past experience, intuition, convenience, analogy, and scientific study. They do not come from structural attributes of source code or grammar. You might find clusters of characteristics that way, such as that "(" often follows "if", but that isn't a type.

Re: '"Typeness"...is not an attribute of a grammar. Nor is it an attribute of the source code itself.' - As I stated elsewhere, if they only exist in human heads, and each head can have a different model/view of the "essence" of the source code, then "types" fail to improve communication or understanding, at least until both sides arbitrary agree to pick one mental modelling choice over another. It's a flawed mental tool. That's fine if you know it's flawed and work within its known limitations, but some insist there's a "right way" to view programs, which is as arrogant as The Church persecuting Galileo. -t

"Typeness" in programming languages is often represented by programs. In particular, it is those programs, or parts of programs, that specify (relatively) invariant sets of values and associated operators. However, there is no inherent mechanism that can divine a type definition purely from a grammar and source code, because the mechanisms to define types do not obey some inherent pattern in symbols. They are, in short, higher order constructs. An equally difficult problem would be to discover loops and procedure definitions from patterns of symbols alone. An even harder problem, I suspect, would be to automatically distinguish -- given a grammar and source code samples -- games from business applications.

Also, you need to distinguish identifying the types themselves from type definitions. These are profoundly distinct.


I think that Top wants an algorithm which discovers the types instead of just outputting the types 'assigned' by a programmer. Much like humans learn types (categories, whatever you call these patterns). To satisfy Top the algorithm may not really use the input grammar (as the compiler internally does), because Top would probably immediately claim that the algorithm just printed the names of the rules of the grammar instead of the 'types'.

Actually there is now an algorithm which does exactly this:

This algorithm learns the types of elements of a language. Any language. Given enough example sentences, of course. I assume that somewhere in the human brain a process is at work which does the same (albeit probably more parallel, more efficient and more general). Nonetheless the end result is: each element in a sentence gets a 'type' - the selected rule to produce it in its context. The algorithm will not call it e.g. "noun" (how could it), but the description will be equivalent to "element at a position where a word from a learned list of nouns is allowed". -- GunnarZarncke
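For illustration only, here is a toy Python sketch of the distributional idea behind such induction algorithms (not the referenced algorithm itself): tokens that occur in exactly the same contexts get grouped into the same induced "type". Real induction uses similarity rather than exact matching, but the flavour is the same.

  # Toy type induction: cluster tokens by their (left, right) contexts.
  from collections import defaultdict

  def induce_types(sentences):
      contexts = defaultdict(set)
      for words in sentences:
          padded = ["<s>"] + words + ["</s>"]
          for i in range(1, len(padded) - 1):
              contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
      # Tokens with identical context sets fall into one induced "type".
      clusters = defaultdict(list)
      for token, ctx in contexts.items():
          clusters[frozenset(ctx)].append(token)
      return list(clusters.values())

  corpus = [f"{n} {v} {a}".split()
            for n in ("gunnar", "anna")
            for v in ("runs", "walks")
            for a in ("fast", "slowly")]
  print(induce_types(corpus))
  # [['gunnar', 'anna'], ['runs', 'walks'], ['fast', 'slowly']]

The induced clusters correspond to "noun-like", "verb-like", and "adverb-like" positions, without any label ever being supplied.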

Nice find!

I wonder if Top is conflating the relatively narrow definition of types as sets of values and associated operators, with types in a more general sense, i.e., with the very idea of types or with the various "kinds of things" in a programming language. The former clearly distinguishes types from variables, values, and operators. The latter admits different kinds of variables, different kinds of values, different kinds of operators, different kinds of statements, different kinds of any programming language element you care to mention, and different kinds of things in general. Perhaps there is some blurring or ambiguity in the latter case, but I don't know that it's particularly of concern when implementing languages.

When implementing languages, types, variables, values, and operators are unambiguous. That is self-evident. In implementing a language, what I code for handling type checking is very distinct from what I code to implement variable assignment, though the latter may make reference to the former to make sure the user doesn't miss a type mismatch error. There is no confusing a variable with an operator, or a type with a value. Of course, in some languages with (say) first-class types there are 'type' values that can be bound to 'type' variables, but that confuses neither the language implementer nor the successful user because the context makes the arrangement of these elements clear. A car, for example, can be both "vehicle" and "fast" without confusing "car", "vehicle", or "fast", and it doesn't cause some peculiar notion that a "vehicle" is a "fast" or lead to some difficulty identifying cars.

{Re: "Given enough example sentences of course." (Unsupervised learning) - I assume the sample (training) sentences have been pre-labelled by humans as far as parts of speech. This would then simply reflect a kind of voting approach. I suppose that would satisfy the requirements. However, it's still based on human "impressions" and "notions" rather than some clear-cut rule(s). You'd still have to admit there are no clear-cut rules in the traditional sense of printable formulas and algorithms. At least you seem to be closer to understanding what I am asking for. Kudos.}

No. The training set contains no labeling at all (the mentioned corpus is a very large set of real-world sentences). The performance of the algorithm is compared to manually labelled sentences, though. Actually the algorithm even works on sentences without word boundaries ("mynameisgunnar"), but then needs an even larger corpus of sentences (to learn what a 'word' is). The algorithm can also be used to find the structure of e.g. DNA sequences or web page click streams or any other sequence of tokens. -- .gz

{How is "manually labelled sentences" not a training set?}

So are you saying that types are just a labelling of sentences? Obviously a non-arbitrary labelling. Are you free to label the sentence "Gunnar runs fast" as "Noun Noun Noun"? Or as "1 2 3"? Probably not. The labels you want to use must have a minimum structure to be called types. Are you asking: What is this minimum structure?

If there is such a thing, sure. Are you saying types are a continuous concept? So that something is 40% typish and something else is 80%? As a mental model, I could say that the labeling of words or phrases is "adding side flags" to them.


Re: "The former clearly distinguishes types from variables, values, and operators [and attributes]. "

By what rule, God, Bible, algorithm, formula?

"I know it when I see it" is not good enough. I don't want vague notions, I want iron-clad rules."

By convention and/or definition, and informally:

 Type: a (relatively) invariant set of values and zero or more associated operators.

The RelProject/TutorialDee example above unambiguously identifies each of the above given a grammar and source code corpus.

You don't need operators to have types, by some accounts. And it gets into the issue of whether an operator that acts differently due to some feature of the value(s) is a different operator under the same name, or the same one. It's the borderline cases I'm interested in, not the clear-cut ones most agree with. We had a similar discussion somewhere regarding validation versus "types", and operator overloading.

I've updated the 'Type' definition. As for the "clear-cut ones" vs something else... Stop moving the goalposts! The definitions hold regardless of "feature of the value(s)" and operator overloading. A type doesn't become an operator because of some "feature of the value(s)", and if it did that would be an exceptional case.

I'm not moving the goalpost, I'm just making sure you don't over-focus on the simplistic cases.

For example, some dynamic languages will parse a variable with a string to see if it can be evaluated as a number if you use "+". If not, it does concatenation instead. Is that two different "types", or "changing behavior based on parsing"?

Automated type coercion. Values are still values, types are still types, variables are still variables. Nothing in the definition says a value can't be coerced from one type to another. Nothing in the definition says an operator can't perform type coercion.

How are you identifying that it's a "type" being coerced?

I'm not. It's a value being coerced from one type to another.

How does one know it's a "type" that's being coerced? How does one know that it's "changing its type"? (Further, I never stated the original was being changed, but that may not be a key issue regardless.)

There are various ways you can implement the infrastructure necessary to support type coercion. Are you asking for a description of them?

Using your "+" operator example, a common way is for the concatenation operator to attempt to coerce both operands to numbers. If the coercion succeeds, it uses addition and returns a number value. Otherwise, it coerces both operands to strings and concatenates them, returning a string value.

I'm asking how does one know "types" are involved besides "I know 'em when I see 'em".

Do you know apples when you see them? Are they different from oranges? Part of the great thing about being a human is that we can clearly distinguish between things using objectivity. It is so obvious that an equal sign is objectively (not subjectively) different from a number or type of number. You seem to be the type that uses human psychology rather than science, facts, or math. I'm trying to implant some psychology into your head since you don't seem to be able to use math, theory, or science to distinguish between variables, values, and types.

The following is a very similar process:

  function AdjustURL(string url) {
    if (contains(url, "foo")) 
       return(concatenate(url, "bar"));
    else
       return(url);
  }
Is this "type coercion" also? If not, why is it different than the "+" example?

Does an URL belong to a set of URLs? Is there a collection of associated operators? If so, it is an URL type. Indeed, many language implementations provide just such a thing. It's usually a subtype of 'string', too, because an URL is a string with special properties. However, your example isn't coercion, because I see no evidence that you're changing an URL to a string or vice versa, let alone doing it implicitly.

I didn't necessarily "change" the value in the "+" example either. As far as "belonging to a set of URLs", do you mean explicitly in the computer and/or in human heads? Where does this set have to exist to qualify? Note that the set of all possible "real" numbers cannot exist in the computer or in our heads because there's no room. We thus find ways to fake it: UsefulLies. One can define URLs as a set of all possible URLs, but that just means we can make theoretical models; and others may use a different model.

If 2 + "2" produces 4, you've changed the second value from a string to a number, i.e. you've coerced the string into a number. By "belonging to a set of URLs", I mean explicitly in the computer, but we don't need to explicitly specify every possible value if we can produce rules that determine whether a given representation of a value belongs to a type or not. For a string to be an URL, for example, it must exhibit a certain structure.

Re: 'If 2 + "2" produces 4, you've changed the second value from a string to a number...' - No, we don't necessarily have to. See "Computation and Mapping are Interchangable". We can take two strings as in input and return a string as an output. (And there are other ways besides mapping to do the same thing.) In languages that store/represent numbers as strings, such as Perl and ColdFusion, one doesn't really know what's happening under the hood, and it doesn't matter because it can be replaced by implimentations that don't have to convert the representation of the input parameters. In practice they probably convert in order to leverage the existing Intel hardware, but that's an implimentation detail that can be swapped for another.

As I stated above, there are various ways you can perform type coercion. However, the conversion from a string to a number, even if mechanically avoided, still occurs inevitably. Addition is a mathematical concept that applies to numbers, not character strings. Therefore, regardless of the mechanisms, if mathematical addition has been performed, it must -- by definition -- have been performed on numbers.

Re: "but we don't need to explicitly specify every possible value if we can produce rules that determine whether a given representation of a value belongs to a type [set] or not" - That's rather broad. Every IF statement is doing just that. Does that mean the results of every IF statement is a "type" (member of a type)?

No. What if the IF is used to exit the program when the red button is pushed?

It determines "type of things that cause program to exit".

Whilst the 'things that cause program to exit' may be plural, that doesn't necessarily mean it's a set. To be a type, it must be a set of values.

DataAndCodeAreTheSameThing, or at least can be viewed that way in one's head.

Indeed, in some languages that is explicitly true, and we can have (say) language statement values that are of a language statement type. However, in the majority of non-homoiconic languages, it is a meaningless equivalence because it is effectively inaccessible to source code in the language, except in the most awkward ways. In other words, the equivalence may be philosophically true, but not a language actuality. In C, the statements you write might be data to a C compiler, but they're not data as far as your program written in C is concerned unless you're prepared to write, at least, your own C parser and have access to the file containing the source code.

Okay, what if we take Bob the Programmer's JavaScript (JS) interpreter and replace it with one that allows block structures to be added and changed dynamically (similar to Lisp). However, we don't tell Bob that we made the change. He thus never changes the block structures, yet the capability is there. His existing JS apps otherwise run as they always did and he doesn't know the difference. So, does the IF statement now become a "type" in the absolute sense? It may be static in Bob's head because he assumes (models) it that way, but it's not static anymore. I see little use in making types depend on "mutability", other than temporary or convenient simplifications for communication or cognitive purposes. That's fine, but it's not an absolute trait. -t

Types do not depend on mutability, but on capability. Informally, if there exists an invariant set of block structures and associated operations, then there is a type. A simple example: C has an 'int' type, whether you use it or not.

That's because it's a side-flag. QED.

That doesn't make sense. An "int" isn't a side flag. It's a set of values (... -2, -1, 0, 1, 2 ...) and associated operators (+, -, /, etc.)

I thought we agreed that types don't need operators to be types. Further, the operators can operate on (use) both the value portion and the type-tag portion. Thus, they are not unique or dedicated. The language may make it awkward, but in some cases it appears we can even switch them such that the "type" is used as the value and vice versa.

Some types don't need operators to be types, but they're of narrow purpose. Typically, such empty types practically serve only as a form of indicator. E.g., given a type <x> with no operators, we might be able to say that it is a type, or if we ask whether a given variable contains a value of type <y> but it's really type <x>, the answer will be 'no'. Such a type may serve as a "bottom" type, or as a "void" type. All other types (typically) have operators. What use would an integer be that we can't add, subtract, multiply, divide, etc.? A C "int" has operators.

And "int" is NOT a set of values. It may be in your head, but not necessarily in the language.

Int is most certainly a set of values. In C, it is a finite subset of Z, the integers.

"Defined as conceptually equivalent to", perhaps, but no actual set is actually in there (or has to be in there). One can probably define or model the same behavior in a completely different way. The language standard may say it has to match a model X, but that does not mean it has to be implemented using the nor thought about in the head using that same model as long as the substitute is equivalent (or close enough for our intended usage).

It's not just conceptually equivalent to a finite subset of Z, it is a finite subset of Z. Imagining a set as a bucket that exhaustively contains every element is a helpful analogy or mental model, but it isn't reality. For example, the set of all even natural numbers, { n in N : n mod 2 = 0 }, would be infeasible as a container of every positive even integer. Only the expression that generates the set, i.e., its "builder", is necessary to fully and completely describe it. This notion is behind every non-enumerated type definition, and as it happens, programming languages are ideal for expressing complex set builders.
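As a sketch (Python, purely illustrative), the "builder" is just a predicate, and the infinite set is never materialised:

  # Membership rule ("builder") for E = { n in N : n mod 2 = 0 }.
  # The rule fully describes the set; no container is needed.
  def is_even_natural(n):
      return isinstance(n, int) and n >= 0 and n % 2 == 0

  print(is_even_natural(10))       # True
  print(is_even_natural(7))        # False
  print(is_even_natural(10**100))  # True -- far too big to ever enumerate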

Yes, but like you said, it's a "notion". It's not actually in the software; it's in the reader's head. The reader could in theory replace their head model with a different "int" model that ALSO reflects program/compiler behavior. It's relative. We can only empirically test what comes out of the machine as far as fitting a definition, not people's heads. -t

{Oh please, not the EverythingIsRelative cop out again. Since EverythingIsRelative the concept of an equal sign can't be true either (ThereIsNoEquality?) because ThereAreNoTypes and the computer can't place equal signs into the CPU, and it can't place types into the CPU. Do you understand what abstractions are? You realize that tables don't actually exist either because the computer is dealing in 1's and 0's and therefore ThereAreNoTables. Please, the cop out of EverythingIsRelative is so lame.}

There may well be mental models that could replace the "int" model, but that doesn't matter. We're talking about what is actually modelled, not what is perceived. Types, as discussed here, are a concrete aspect of language design and implementation, not philosophy of computer science.

So types are defined by what the builder of the compiler/language had intended??? That's loco. Right back to DefinitionsThatRelyOnIntent (WhatIsIntent) again. -t

No, types are those parts of programming languages that obey recognised and generally-accepted ComputerScience definitions for types, like the one I've given on this page, or they aren't types.

Good! Now codify that process so that the steps are clear enough to state as an algorithm or list of rules. Codify "recognize", for example. Do you mean "it looks like something I've seen before"? That's probably not a good foundation for a definition.

With pleasure: "Type = a (relatively) invariant set of values and zero or more associated operators." As a definition, that is both sufficiently general and as precise as need be for the majority of programming purposes. More detail would be redundant or inaccurate, and less would be vague. Of course, it doesn't tell why we might want types or what they might be for, but that's a different issue. I have used terms -- "set", "operators", "relatively", "invariant", "values", "more", "zero", and "associated" -- that are recognised and understood by the vast majority of programmers and computer scientists. It's not necessary to codify "recognise" (or "understood") because it's generally recognised what "recognise" means. If I need to be more specific about "recognise" -- say, for the purposes of working in the areas of machine learning, AI, machine vision, etc. -- then I shall leave it to an expert in those areas to hammer out the details. If you don't recognise "recognise", then I submit the problem is yours and probably yours alone.


Computation and Mapping are Interchangeable

Take the following example:

  x = multiply(1, 2)
Normally we think about this as a computation, but it could also be implemented as a lookup table similar to the times table we had in grade-school:
  1, 2, 2
  1, 3, 3
  ...
  2, 3, 6
  2, 4, 8
  2, 5, 10
  etc.
In practice this tends to be resource hungry, but in theory the look-up tables could handle infinite expressions, such that every computation can be defined as merely a lookup process, and our programming-language results as nothing more than re-mappings, with computation seen as a more practical way to implement them. Our spoken language and thinking are thus shaped by practical concerns instead of theoretical ones, but that doesn't have to be the case. -t
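A minimal Python sketch of the interchangeability claim, over a finite domain:

  # The same mapping, once as computation and once as a lookup table
  # (like a grade-school times table, here limited to 1..12).
  def multiply_computed(a, b):
      return a * b

  TIMES_TABLE = {(a, b): a * b
                 for a in range(1, 13) for b in range(1, 13)}

  def multiply_lookup(a, b):
      return TIMES_TABLE[(a, b)]

  assert multiply_computed(2, 3) == multiply_lookup(2, 3) == 6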

Congratulations, you've re-discovered the conceptual basis for functional programming: a mapping from input to output. Of course, some mappings are infinite but computable, and others are finite but infeasible, so simple lookup tables are impractical.

[And what are the types of things being stored in that table? In Math the teacher would describe to you the number types that 1, 2, 3, etc. are. Are they negative numbers? Are they decimals? wholes? fractions? Positive numbers? If you put strings into that table such as "foo" and "bar", the strings cannot be multiplied. Whereas numbers can be multiplied, precisely because strings are a different type than numbers. They are not a different flag than numbers, they are a different type]

It can be seen as validation: "the two things you supplied are not in the table". We could add "foo" and "bar" into the table. That may make it less useful to humans used to things being a certain way, but the program doesn't care; it's just blindly following rules.

Certainly. Foo and bar are now part of a set of values, i.e., a type. Your "validation" is a type constraint.

So "everything's a type". How useful.

Very. You're on the verge of rediscovering TypefulProgramming.

If everything is a type, then every language is "typeful". Why sell it if it's everywhere? "Gitch'r Fresh Air, just two payments of $99.95!" (Well, okay, it's not a commodity in L.A.)

Congratulations, you invented the first proper TopsTypeDeterminator?:

  while (t = getNextToken(program)) {
    print(t . " is a type");
  }
Not quite. A type is a collection of values and 0 or more associated operators. Not everything is a collection of values. "IF" isn't a collection of values. "System.exit()" isn't a collection of values. Even System probably isn't a collection of values, despite being a class.

You sure about that? Absolutely?

On average, yes. I'm absolutely sure, however, that exceptions can be found. In a language with first-class control structures, "IF" might well be a control structure belonging to a set of control structures, with 0 or more related operators. In that case, "IF" would belong to a type. In typical C-derived languages, it does not.

By "see above" I mean the Bob the Programmer example above.


Maybe another way to view Top's quest is to compare it to the problems with definitions of sets (which are definitely related to types). Set theory is a fundamental mathematical tool, needed not only to reason about other theories but also to specify them, even their axioms. The structure of a theory is often given as some sets of values with 'rules' (e.g. predicate logic) on them. Naive set theory uses natural language to specify what it means by a set and the operations on it. This means it is vague; it cannot be implemented as an algorithm as such. It just serves as a stepping stone to 'real' set theory. Now there is axiomatic set theory, of course. But that assumes an understanding of the concepts. Obviously you cannot use set theory to specify and reason about set theory. So where do you start? It's chicken and egg. Does this mean there are no sets? No. Only that the problem of what sets are cannot be answered by referring to sets. -- .gz

See http://en.wikipedia.org/wiki/Naive_set_theory

It's probably possible to view/model just about anything as sets if one works at it enough. But if that means everything IS A set, or can be a set, then it's probably meaningless to say things like "sets are good" or "you should use more sets" because everything already is a set in one way or another. If you cannot identify things that are not sets, then selling sets is moot because they are already "there". More than likely they mean a flavor of programming that resembles some existing notation or technique: a style of doing things. That's perfectly okay, but be careful what you label it. What people typically call "types" in Algol-influenced (C, Pascal, and BASIC-like) languages are modeled well as "type flags/tags" in my observation. But that irritates some because it's too narrow; but that narrowness is a good thing. I asked, "what is common about what people are calling "types" in these kinds of languages, and how can I describe or model it better?" I isolated a spoken language pattern and found a way to describe it via a model: type tags. -t

{To say "types = values with type tags" (that is your model, isn't it?) is the same as saying "bicycle = vehicle with bicycle wheels". "Vehicle with bicycle wheels" gives us one possible characteristic of what might be a bicycle, but it might be something else entirely. It definitely doesn't model a bicycle, because it fails to capture the abstract essence of a bicycle. Similarly, "types = type tags" hints at how a particular type system might be implemented in some programming languages, but it is inapplicable to other languages and certainly doesn't tell us what types actually are in programming language terms. It fails to capture the essence of a "type".}

If the "abstract essence" is in the head, and there are multiple ways to model/view things in one's head, then "abstract essence" has little utility, or at least is problematic when both sides are using a different mental model. My alternative is not perfect, but at least narrows it down more. Each variable, or scalar segment, has two "parts" (the value and the type), and operations may take both of those in consideration when producing results. That's the "mechanics" of types in C-style languages. As far as the "essence", that gets into what's in your head, and it may not match what's in another head.

{The abstract essence is much more concrete than that. Which is a more accurate description of an "int" in C: "'int' is a side flag" or "'int' is a set of integer values (... -2, -1, 0, 1, 2 ...) and related operators, such as +, -, =, *, /, etc?"}

The language does not have to use "is a set of" to perform integer operations. That's in your head. Internally it may merely be "if the side flag is 'int', then interpret the bytes in such a way....".

{The language does have to use "is a set of" to determine that some set of input tokens representing a value is an int, so that we can choose the correct +, -, *, etc., for that value.}

{In your approach, you'd have to use "is a set of" to determine the correct "side flag", then use the "side flag" to choose the correct +, -, *, etc., for that value. Do you see here how the notion of "side flag" is actually redundant?}

There are different ways to implement "determining the correct...". Regardless of the implementation approach, the output "depends on" (is affected by) both the side-flag and the value. It's as if two parameters are being passed instead of one (the flag and the "raw" value). They just happen to be munged together under one "variable".
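A sketch of that "two parameters munged into one variable" reading in Python (tag names are purely illustrative):

  # A "variable" as two munged-together parts: (side_flag, raw_value).
  # The operator receives both parts; its output depends on both.
  def plus(flag_a, raw_a, flag_b, raw_b):
      if flag_a == "int" and flag_b == "int":
          return ("int", raw_a + raw_b)
      return ("string", str(raw_a) + str(raw_b))

  print(plus("int", 2, "int", 3))       # ('int', 5)
  print(plus("string", "2", "int", 3))  # ('string', '23')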

{It doesn't matter how you implement "determining the correct..." Implementation is irrelevant. No matter how you implement it, or even how you look at it, you are performing "is a set of". That's what it is, regardless how you do it.}

{That a value has a type is trivial, and part of the definitions I provided above. A value has a type, by definition. Saying the output depends on both the side-flag and the value adds nothing to the self-evident notion that the output depends on the type of the value. In fact, at best it's merely an awkward (and somewhat implementation-flavoured and evasive) way of saying "a type has operators."}

What definition? Please use a PageAnchor or something searchable.

{Definitions as follows, from above:}

 Type: a (relatively) invariant set of values and zero or more associated operators.

Why say "and zero or more associated operators"? And why limit it to "invariant"? What field usage pattern does this come from?

{I say zero or more associated operators because it's true. Types without operators do exist, but their use is limited. E.g., "void" in C. "Invariant" refers to the fact that types represent categories or kinds of values, rather than transient storage. Admittedly, it probably does not significantly harm the definition to remove "(relatively) invariant".}

Everything in the universe has "zero or more operators". This hints at something you are thinking of, but cannot articulate right now.

{Sorry, I'm not following you. I'm not concerned with everything in the universe, just programming languages. That a type is a set of values and zero or more operators is intuitive. E.g., int: set of values (... -2, -1, 0, 1, 2 ...); operators +, -, *, /, etc. This pattern extends to every type.}

Why include it in the definition if it applies to everything, even in programmer-land?

{It doesn't apply to everything, just types, though at least one recognised definition informally describes types as "everything that can be determined about a program without running it." Under the definition I've used, "if" in C (for example) is not a type per se.}

If you get rid of "invariant" it may; but as already described in the Bob example, that probably won't change how most people think of or use the language.

{Sorry, not following you.}

Please review the Bob the Programmer example. "If" can be dynamic.

{If "if" is dynamic, belonging to a set of values and having operators that act upon those values, then it's a type. Otherwise not. This was discussed above.}

Doggonit! I thot "operator" was irrelevant.

{Where did I say "operator" was irrelevant? What I said was that there are types, of limited utility, that have no operators. E.g., "void" in some languages, the 'bottom' type in others. Most types have operators.}

"zero or more associated operators"

{Indeed. A "bottom" or "void" type (for example) may have no operators because it's never instantiated. Instantiating a value of a given type requires at least one operator to instantiate a value of that type. In C, it's implicitly invoked for the built-in types based on recognising input tokens. E.g., '123' is recognisable as an implicit request to invoke the 'int' type's operator that instantiates a value of type int, '12.45' is a request to instantiate a value of type float, and so on. In C, for user-defined types, it's whatever initialisation the programmer specifies. In C++/Java/C#, it's implicit for built-in types and explicit constructor(s) are specified for user-defined types. In TutorialDee, the value instantiation operators are called selectors, and are invoked implicitly as with C/C++/C#/Java/etc. and explicitly for user-defined types. And so on.}

{In short, every type for which you wish to instantiate a value needs at least one operator to instantiate a value of that type.}
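Here is a hedged Python sketch of that implicit selection: token shapes pick which type's instantiation operator (selector) runs. The rules are simplified placeholders, not any real language's lexer.

  import re

  # '123' implicitly requests the int selector; '12.45' the float one.
  LITERAL_RULES = [
      (re.compile(r"^-?\d+$"), int),
      (re.compile(r"^-?\d+\.\d+$"), float),
  ]

  def instantiate(token):
      for pattern, selector in LITERAL_RULES:
          if pattern.match(token):
              return selector(token)
      raise ValueError("no type's instantiation operator matches " + token)

  print(repr(instantiate("123")))    # 123
  print(repr(instantiate("12.45")))  # 12.45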

I hate to say it, but that depends on how you define "instantiation". VB Classic by default acted like every variable existed already with a nil variant.

{I'm not clear how that is relevant. Variables are neither values nor types. Do you mean the variables by default contained the value 'nil'? (I don't recall much VB Classic, I'm afraid.) If that's the case, there's an implicit operator associated with the 'variant' type that instantiates 'nil' so variables can be initialised.}

But using "implicit" you are starting to shoehorn it into your target model by having imaginary actions, such as initialization. True, all of programming is probably that way in our heads to some degree, but it just means there are multiple models for the same thing. In practice it probably does have an initialization action under the hood, but that is analyzing the implementation, not visible behavior. It's externally indistinguishable from "all variables pre-existed with a 'nil' value". Both sub-models "work".

See PlethoraOfUsefulHeadModels.

{Beware of confusing "implicit" with "non-existent". A variable that is not initialised is very distinct from a variable that is initialised, whether implicitly or explicitly. That is visible behaviour. In any model, if you use a variable before it is explicitly initialised, you must define its behaviour, i.e., you either must provide implicit initialisation, treat it as undefined and force all subsequent behaviours to be undefined, or throw an exception. There are no other options.}

I don't know much VB, but it is very important for a VB programmer to know whether the variable is initialized to Nil or whether it just points to garbage (uninitialized). In practice, initialization under the hood is something you must know about; otherwise checking the variable for nil wouldn't work (if it was just pointing to garbage).

VB-Classic did not result in garbage if uninitialized, just nil. (Most used the OPTION EXPLICIT directive to avoid accidentally using such, as a side note. The nil thing was a dumb default.)

What would be a good default?

I meant the problem is that it allows null/empty/blank variables without an assignment. In Php, for example, you can create a variable "out of the blue" if you assign something to it. However, VB's default behavior was that this could happen on the right side of an assignment statement. I'm okay with "left-side creation" of variables. - t


The "sets" definition actually kind of matches the side-tag model: the tag identifies which set the variable/object belongs to, and the value portion is a particular member of that set. Maybe we can reach a common ground. -t

The side tag "model" is about as useful as saying "Types are side types". The side type identifies which type the variable/object belongs to. You are just playing word games. Type signature, type tag, side flag, why not call it side label? How about renaming apples to "a fruit with an apple tag on it". One doesn't understand what types are by an implementation detail. If apples actually had tags inside their atomic structure which differentiated them from oranges, that would be an implementation detail of the universe.

"Sets" are also an implementation.

Set theory is the branch of mathematics that studies sets. Do you think mathematicians have an implementation nearby? They don't even use computers for christ sake. You do math on paper.

Your fruit analogy makes no sense to me. And "side tag" is not meant to be the full definition in itself, just a label. Don't mistake a title for the article. It could be called the "bicameral variable model", for example. A variable in a typed language carries both a type identifier component (the "side tag"), and the value component. I'll try to formalize it a bit more:

   ST-DEF-CANDIDATE-01
   Types are a secondary component of a variable
   which identifies the set to which the value
   portion of the variable belongs.
Constants can have types too, and literals. A literal string is not a variable, nor is a constant. A number in math has a type, and a number is not a variable.

{Indeed. A literal is a value. A number is a value.}

   ST-DEF-CANDIDATE-02
   Types are a secondary component of a programming feature/object
   which identifies the set to which the value
   portion of the feature/object belongs.

   ST-DEF-CANDIDATE-03
   Types are a conceptual or actual "tag" attached to
   programming features or objects, which is a list of
   set identifiers that are used for knowing how to
   interpret and check compatibility of interactions
   among the features or objects.
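One way to make ST-DEF-CANDIDATE-03 concrete is the following Python sketch (class and tag names are illustrative, not a claim about any real language):

  class Thing:
      # A feature/object per ST-DEF-CANDIDATE-03: a value portion plus
      # a "tag" listing the set identifiers it belongs to.
      def __init__(self, value, tags):
          self.value = value
          self.tags = list(tags)  # e.g. ["int"] or ["string", "url"]

  def compatible(a, b, required_tag):
      # Interaction check: both operands must carry the required tag.
      return required_tag in a.tags and required_tag in b.tags

  x = Thing(3, ["int"])
  y = Thing("http://example.com", ["string", "url"])
  print(compatible(x, x, "int"))  # True  -- interaction allowed
  print(compatible(x, y, "int"))  # False -- interaction rejected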

[Indeed I think the interpretation aspect is interesting and worth discussing. That is because interpreted languages do not do any type analysis beforehand ('without actually running it'), so there are no formal types in these languages. But obviously there are some 'types'. And I'd guess that Top means exactly these types which are used in interpreted languages and probably encoded as 'side flags' quite often.]

[So we are indeed speaking a bit loosely when we talk about types in the sense of 'properties of program (locations)' versus types as declared or inferred attributes ('side flags') of values. The former I will call formal types and the latter data types.]

[A type theorist would call the formal type of a variable which can hold values of different data types a 'union type'. The simplest union type is 'any' or 'top' (sorry Top, it's really called that), meaning the formal type which tells nothing about a variable because it could be any value of any data type.]

[But mostly it would be a more specific union type, namely the union of all possible (and this may depend on program analysis) formal types of all values which may be assigned to that variable. The formal types of all of these values are again determined from the variables which may contain them, and so on, until ultimately the formal types of literal values are reached. Here we have a one-to-one mapping between the data type (e.g. 'int' for 123 or 'String' for "abc") and the formal type (which will usually also be called 'int' or 'String').]

[Now, what about interpretation? Are there indeed no formal types? Even if the interpreter never does program analysis, we as programmers could reason about our interpreted program like this: 'This variable can hold only a number because it is only ever assigned a number.' And when we do this we infer the formal type of a variable to be 'int' instead of 'any'.]

[A smart compiler (JIT) for this interpreted language could do the same. Actually most faster dynamic languages out there do so, most prominently probably the Chrome JavaScript interpreter. They replace all the switch(value.type) { case int: intop(); ... } dispatching with specialized code which handles only the cases which can be proven to actually occur. -- GunnarZarncke ]
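Roughly, and only as a Python sketch, the generic tag-switching path versus the specialised path such a compiler might emit:

  # Generic interpreter path: every operation dispatches on the tag.
  def add_generic(a, b):
      if a["type"] == "int" and b["type"] == "int":
          return {"type": "int", "value": a["value"] + b["value"]}
      if a["type"] == "string" and b["type"] == "string":
          return {"type": "string", "value": a["value"] + b["value"]}
      raise TypeError("unsupported tag combination")

  # Specialised path: if analysis proves only ints reach this site,
  # the tag checks can be compiled away entirely.
  def add_int_specialised(a_value, b_value):
      return a_value + b_value

  x = {"type": "int", "value": 2}
  y = {"type": "int", "value": 3}
  assert add_generic(x, y)["value"] == add_int_specialised(2, 3) == 5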

See ApplyingTheSideTagTypingModel for some related notes about interpreters and compilers. -t


An observation... There is a lot of confusion on this page with respect to symbols, values and types. We use symbols in our languages to provide more or less agreed-upon references to things in the real world. These are COMPLETELY ARBITRARY. BUT how these symbols get strung together is NOT ARBITRARY. There are many differences in SURFACE SYNTAX, but every human language has a way to describe something like this:

NOUN VERB NOUN

A noun is a "TYPE" of thing as is a verb. Keep in mind that MY use of "NOUN" and "VERB" as symbols is equally arbitrary, but irrelevant to the discussion. Things that we label as NOUNS or alternately things that can be the DOER in this simple sentence or the recipient of an action have some set of traits that make them similar and suitable to BE the doer of an action or the recipient of an action. Plato called these forms. Rubyists call it "duck typing". It is generally informal, but it is necessary to be able to communicate a simple sentence like "Dog bit Man". A thing capable of biting bit a thing capable of being bitten. Abstracted slightly DOG op MAN, where operation is an operation that is possible for a dog to do and to be done to a man. Most of use do not spend a lot of time thinking about our linguistic type systems, but they do exist and at some level are required to allow communication to happen. This is true whether the communication is human to human or human to machine. You can't have a relational query (or any other language) without a type system, even if quite minimal.

Mathematics is an attempt to improve our ability to understand the world by reasoning about it. Natural languages are problematic for this (you can see this by looking at the arguments on this page). Hence mathematics is a more formal, constrained and precise way of describing and reasoning about things, so that we do not spend all of our time mired in the vagaries of natural languages. As observed, mathematics is ALSO a language (actually a family of languages sharing certain attributes) and also has types. While the definition of "=" is arbitrary, the CONCEPT of equality is NOT.

An intuitive notion of equality turns out to be a remarkably fragile thing. Consider how many times you have seen, heard or participated in arguments about whether two things are really the same. It turns out that a lot of stuff has to be specified in order to meaningfully determine equality. Part of that "stuff" is a type system that determines the kind of things for which equality can be determined and how to go about determining it. Does 12 = 12? If we understand 12 as symbols representing numbers, then yes. If we see the 12 literally as ink on a page (or pixels on a screen or symbols at a point in a stream of text) then the two instances of 12 might not be "equal". While there is no universal type system, we have to at least temporarily agree on one to give our utterances meaning.

I agree that "types" are generally associated with classifications, and that a classification system may determine what noun can be paired with what verb, etc. However, just about everything can be view as a classification(s) and just about every verb can be reworked to be a noun, etc. We need a way to distinguish patterns of symbols separately from how we may personally think about them in our heads, in part because there are multiple different head models that can be used to produce similar conclusions (predictions) about symbol patterns and code results. Some people seem to believe their head model is the One True model, and are rather insistent about it in an aggressive way. -t

{Yes, I've noticed TopMind appears to be rather "One True model"-insistent about his "side flag" or "side tag" notions. The classification system you're describing is commonly known as a TypeSystem.}

Where did I insist it should be the ONLY model? Your imagination is making stuff up again.

[The side flag model is not even a model. It's equivalent to an apple sticker. An apple sticker is a flag or tag on that apple. This apple sticker or tag is only one physical way of labeling or classifying the apple. The tag itself is one small PHYSICAL detail. A model is not one small physical detail. Type systems are not modeled by tags or flags. Tags or flags may be an implementation detail, a way to label a structure as a certain type. We can use apple stickers to classify apples, because stickers (or tags) are one way to implement a classification system. People who are learning how to program should never, ever, ever, ever be introduced to unimportant implementation details. Low-level details of how something is implemented are NOT theory, and not a model, and are very dangerous for education. It's like arguing that MySQL uses pointers internally for speed, so therefore the relational model must have something to do with pointers. The relational model has nothing to do with how some particular product chooses to implement it.]

As I've said before, the distinction between implementation and model can be blurry or interchangeable. And any communication of a model will require a physical representation because thoughts cannot be directly transferred from one head to another via a USB patch-cord or whatnot. And thoughts also ultimately have a physical representation. Your criticism is thus vague or incomplete. Without a clear definition of "types", it's hard to critique one model against another or distinguish between it and an "implementation". And if somebody wants to model relational using pointers, good for them. I may not wish to use their model myself, but will not say "it is objectively bad/wrong" without something more rigorous. You seem to mistake your personal preferences and thoughts for universal truths.

{How is stating that "a value/variable has a side tag" (where the "side tag" represents the type, presumably) any better than stating that "a value/variable has a type"? Why complicate a simple notion (e.g., that a value has a type) by introducing a "side tag"? I also asked this on the "DedicatedOperatorMeaningless" page, to which you responded enigmatically with "first things first", then nothing.}

"Has a type" doesn't say much about the relationship between the value and what we may call a "type indicator". A "side tag" implies something that is separate from, yet "tags along with" the value. A shirt tag is not really part of the shirt, but stays with the shirt. "Has a type" leaves the door open for bleeding into one another. Plus, side tag also suggests that the indicator is subservient and/or less visible than the value itself. It has a side roll. A "tag" provides a physical metaphor that better matches or mirrors how the tag model works and/or is commonly represented. If you draw the model on a chalk-board (now dry-erase-boards, iBoard in 7 years from now), drawing a tag is the best known way to represent the type indicator of the model. I know it's not necessarily precise or rigorous, but neither is the competition. Plus, it's the name of the model. We could call the model "Susan", but that makes the name less self-describing. If they called Lisp "Nestolist", it would be more self-describing than "Lisp", for example. -t

{You appear to be describing one implementation strategy rather than a general model, apparently justified on the basis that simply stating a value or variable has a type will leave "the door open for bleeding into one another" and that "the indicator is subservient and/or less visible than the value itself." I don't know what you mean by either of those statements. It all seems more complicated than, and yet effectively equivalent to, stating that a value or variable has a type. Likewise, it's simple and intuitive to say a 3 is an integer, or that a creature is a cat, or that I'm wearing a "T" shirt. Why complicate this trivial notion by introducing integer, cat, or shirt tags? Furthermore, it is conceptually more accurate to say that an object is a <insert type here>, rather than an object has a "type indicator."}

Please clarify the last sentence. Many languages make it fairly easy to "ignore" the type (or produce results resembling the ignoring of the type indicator). That tends to weaken the case for is-ness. Side-flag-theory/model can explain fairly "mechanically" how "ignoring" happens, and proposes tests to match the model or sub-model (predictive capability). Saying something "is a" doesn't, at least not in a consistent way. It's too ethereal a statement.

It simply states that everything has a type, even if the type is "anything" or variant, or the type is "an arbitrary collection of bytes", or the type is unknown.

And I disagree that it's an implementation, for reasons already recently given. What's actually going on under the hood may be very different from a chalkboard version of a side-tag model. But as long as our model can be tuned to explain what's actually happening, that's all that matters; and the tag model is usually flexible enough to do that. It offers a prediction framework without actually having to know the real mechanism at the chip/compiler level. That qualifies it as a model. Now you could argue it's too low-level a model to say anything broad or abstract or "general" about "types", but until I see something better come along, I will stand behind the side-tag model (STM). There are higher-level models, but so far they are too vague or open-ended. STM can be turned into a software model where we can x-ray the simulated objects to see what changes what and when, and it will mirror the inspectable output or behavior of the actual program running (if set up or tuned right). STM's competitors can't do that yet. A runnable model is usually far more useful and testable than a non-runnable one. -t
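(For concreteness, here is one way such a runnable tag model might be sketched. This is a minimal illustration in C -- all names are hypothetical, and it is not a claim about any particular language's internals:)

 #include <stdio.h>

 enum Tag { TAG_INT, TAG_STRING };   /* the "side tags" of the model */

 struct Cell {                       /* a value plus its side tag */
     enum Tag tag;                   /* travels alongside the value */
     union { long i; const char *s; } payload;   /* the value proper */
 };

 int main(void) {
     struct Cell c = { TAG_INT, { .i = 42 } };
     /* "x-raying" the simulated object: the tag is inspectable
        separately from the payload */
     printf("tag=%d value=%ld\n", (int)c.tag, c.payload.i);
     return 0;
 }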

Okay, assuming your "type tag" notion is not about implementation, I can see some limited value in it for explaining simplistic scenarios involving, say, differences between languages that have a "type tag" associated with both variables and values versus those having a "type tag" associated only with values. The former, for example, might prevent assigning a new value to a variable if the variable's and value's "type tags" differ, whilst the latter may permit any value to be assigned to any variable. However, I'm still not clear how this is any different from simply stating that the former language associates types with variables and values, while the latter only associates types with values.
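(That difference can likewise be expressed mechanically. A hedged sketch in C, with all names hypothetical: a "both tagged" language checks the variable's tag against the value's tag on assignment, while a "values only" language would simply have no variable-side tag to check:)

 #include <stdio.h>

 enum Tag { TAG_INT, TAG_STRING };

 struct Cell     { enum Tag tag; long payload; };           /* tagged value */
 struct TypedVar { enum Tag declared; struct Cell cell; };  /* tagged variable */

 /* Assignment in the "variables and values both tagged" style:
    a mismatch between the two tags is rejected. */
 int assign(struct TypedVar *v, struct Cell c) {
     if (v->declared != c.tag) return 0;
     v->cell = c;
     return 1;
 }

 int main(void) {
     struct TypedVar v = { TAG_INT, { TAG_INT, 0 } };
     struct Cell s = { TAG_STRING, 0 };
     printf("accepted? %d\n", assign(&v, s));   /* prints: accepted? 0 */
     return 0;
 }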

I think we'd have to clarify how "value", "type", and "variable" are being used here. I'd rather deal with specific instances of languages than try to nail these 3 down in a general sense. Otherwise, we'll get lost in messy debates over these other terms also (value and variable).

Is that relevant at this point? All I'm asking is why, in general, we should prefer statements like this...

...to statements like this? You can draw a tag on a whiteboard, and anything you put inside it is clearly distinct from what is outside the tag. Almost nobody will draw something half-inside the tag. "Associates" can be interpreted or represented in too many different ways, some of them potentially misleading or confusing. If multiple variables/objects referenced the same "type", and you changed that referenced thing, then all those objects/things would now be referencing a different "type". Most common languages don't work that way, and thus it's a poor model. (User-defined types/classes may work that way, but that gets into the distinction between a type and a class, which is probably another topic.)
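(The shared-reference scenario being objected to might be sketched like this -- hypothetically; the point is that one shared, mutable type descriptor would "retype" every referrer at once:)

 #include <stdio.h>

 struct TypeDesc { const char *name; };          /* a shared, mutable "type" */
 struct Obj      { struct TypeDesc *type; long value; };

 int main(void) {
     struct TypeDesc num = { "integer" };
     struct Obj a = { &num, 1 }, b = { &num, 2 };
     num.name = "float";   /* editing the shared descriptor silently
                              changes the "type" of both a and b */
     printf("%s %s\n", a.type->name, b.type->name);  /* prints: float float */
     return 0;
 }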

Sorry, not following you here.

This wiki needs a dry-erase board.


Perhaps an example would be useful. Suppose that you were wandering through a sequence of symbols, and you found this:

     DEADBEEF
Now, is this a VALUE, a TYPE or a VARIABLE? -- JeffGrigg

[None of the above. After all, you just said it was a symbol.]

{Without knowing how a particular language treats it, it's difficult to classify and difficult to model how the language treats/processes such. -t}

By "sequence of symbols" I mean characters. I'm familiar with lexers, parsers, and the traditional compiler-compiler tools used to parse ("recognize") modern programming languages, and the mathematical foundation behind them.

I believe that it's impossible to successfully meet Top's challenge as stated. That is, it's impossible to determine, from an arbitrary language's grammar (and lexical analysis expressions, if used), which things are values, types, and variables. The information isn't in the grammar. Grammars match a source program against the grammar rules, but the "meaning" of the rules is determined by the code that processes them, whether interpreting or compiling (generating code).

So, in general, if you see "<something, something, something> DEADBEEF <something, something, something>", you can't tell if "DEADBEEF" is a value, variable or type just by looking at it. It could be any of the three.
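(To make that concrete in one real language: in C alone, the same spelling can legally fill all three roles, depending entirely on surrounding context. A minimal sketch:)

 #include <stdio.h>

 typedef unsigned long DEADBEEF;     /* here, DEADBEEF names a type */

 static DEADBEEF f(void) {
     return 0xDEADBEEF;              /* here, 0xDEADBEEF is a value */
 }

 int main(void) {
     unsigned long DEADBEEF = f();   /* here, DEADBEEF is a variable,
                                        shadowing the typedef */
     printf("%lx\n", DEADBEEF);
     return 0;
 }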

I'm not saying that {value, variable, type} don't exist or are not useful. But I do contend that they are useful conventions that we have adopted, not that they are "universal truths" of anything. -- JeffGrigg

Although the term "convention" is perhaps open to interpretation, you more or less seem to be agreeing with my premise that "types" in "typical" programming languages we use are recognized based on terminology patterns of similar-looking languages. Essentially, humans are using pattern recognition to say what is a "type", and not necessarily an unwavering universal rule. -t

For example, we call a Koala bear's arm an "arm" because it generally resembles human arms in shape and use. There is no universal rule or equation that separates arms from non-arms. (Koalas are fairly distantly related to humans as far as mammals go. They are not even bears, by the way.) It's comparing and finding similarities, i.e., pattern recognition. I believe a similar thing is being done with "types" and programming languages when humans label parts or artifacts of the languages. -t

By the way, Jeff, would it change your assessment if the rules of execution of the language were also supplied? Not just the grammar rules. (I'm not sure how such rules would be encoded; it's not a subject I have much familiarity with. The compiler/interpreter itself may qualify.) -t

Identifying types, variables and values in a given source code corpus requires knowledge of both that language's grammar and its semantics. There isn't something inherent in text that can be universally and automatically used to identify types, variables and values without knowledge of a given language's grammar and its semantics. However, the concepts of types, variables and values are distinct and unambiguous, even though there are some languages that may seem to blur them. For example, in OO languages that support mutable class instances (e.g., Java, C++, Python, etc.) each possible state of a given class instance is a value; the instance itself is a variable. Or, for example, languages with first-class types may allow type definitions to be used as values that can be assigned to variables, but that does not mean a type is a value or a value is a type.
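(For example, the mutable-instance point translates into a minimal C sketch along these lines, with hypothetical names:)

 #include <stdio.h>

 struct Point { int x, y; };       /* "Point" is a type */

 int main(void) {
     struct Point p = { 1, 2 };    /* p is a variable; {1, 2} is the
                                      value it currently holds */
     p.x = 5;                      /* p is the same variable, but it
                                      now holds a different value: {5, 2} */
     printf("%d %d\n", p.x, p.y);
     return 0;
 }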

I do think that there are such things as types. But that they are determined by convention.

To a large extent, they're determined by the CPU hardware (and related circuits) -- manipulating data of certain "types", such as a 32-bit two's-complement signed integer, is efficient (fast) because it's supported directly by the underlying hardware. We find it useful to write software that makes good use of the hardware. And hardware vendors find it useful to build hardware that does a better job of executing the existing software. There's a feedback loop between the two.

Modern CPUs have stacks, and efficient ways of accessing data at offsets from the current stack frame, because most modern programming languages use such things extensively.

We do make "types" that go beyond what is directly supported by the hardware. But even those types retain a fair amount of hardware-related design limitations, such as limited string length. -- JeffGrigg

The canonical primitive types found in most programming languages are certainly reflective of underlying CPU architecture, which in turn was based on obvious computational needs and hardware manufacturing capability. User-defined types and TypefulProgramming, however, have far more to do with meeting user requirements via (among other things) TypeSafety than any limitations imposed by hardware.

I think that in order to automate processes effectively, we find that we need to impose structure upon them and standardize them. For example, to count the number of people living in a city, one has to define what one means by a "person" who is "living in" the city. And if you want meaningful numbers regarding their distribution by race, for example, you'll have to come up with some finite and usable definition of "race" to use.

We've found grouping representations of data (like name, birth date, and race) into objects/records/things to be quite helpful. And later, we've found it helpful to group procedures (code/methods) with the data they operate on. -- JeffGrigg

[I'm not sure what you are getting at here. All definitions are arbitrary, so of course what we mean by "person" and "living in" and "race" are governed by convention. But what does that have to do with the surrounding discussion?]

Because "person" is a type. And so is "race." We decide what "people are living in a place" somewhat arbitrarily. And we define "race" fairly arbitrarily, depending heavily on how we chose to look at things, and maybe what we are trying to accomplish. So yes, there are "types," and they are quite useful and necessary. But what I'm saying is that any attempt to come up with an absolute objective independent concept of type will fail. We make up most of the classifications we use. None of them are "true." But many of them are useful. So we use them.

So in spite of it being impossible to prove an absolute "truth" to types, we'll keep using them. Because they're useful. -- JeffGrigg

{A somewhat fuzzy yet UsefulLie. -t}

[I think I get what you were saying, Jeff: the boundaries of each race are what's arbitrary. But I think you are making a HastyGeneralization here. Just because some types are fuzzy doesn't make all types fuzzy. For example, "prime" is a type, and there is absolutely no fuzziness about whether or not a given integer is prime.]
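(Indeed, a crisp type like that can be written down as a membership predicate; a minimal sketch in C, using trial division:)

 #include <stdio.h>

 /* Membership test for the "prime" type: deterministic trial
    division, with no fuzziness about the answer. */
 static int is_prime(long n) {
     if (n < 2) return 0;
     for (long d = 2; d * d <= n; d++)
         if (n % d == 0) return 0;
     return 1;
 }

 int main(void) {
     printf("%d %d\n", is_prime(7), is_prime(9));  /* prints: 1 0 */
     return 0;
 }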

{If all categories are types, and almost everything can be viewed in terms of categories (which is true), then the definition of "type" is watered down too much to be of practical use. -t}

Maybe in some branch of philosophy, perhaps. Not in ComputerScience.

{If it's clear-cut, nobody's listed the rules/formula yet. That's what triggered this topic to begin with.}

It's a set of definitions (some of which appear on this page), and at a theoretical level, the essence of a branch of ComputerScience called TypeTheory. There is no set of rules, any more than there is a set of rules that will identify a cat. However, much as we can use accepted definitions to recognise a creature as a cat, we can use accepted definitions to recognise types. We can even go further and create TypeSystems that will allow us to manipulate types -- in a rigorous, predictable, and rule-driven way -- in computer languages. In that context, as programming language designers, we decide what "types" can be.

By the way, please stop conflating your personal non-acceptance of popular definitions with some -- emphatically, non-existent -- general debate about "what is a type" in ComputerScience. The problem here is yours, and yours alone.

We can indeed make rules that distinguish cats from non-cats, and then refine them when we find a flaw in our rule set. As for "making a type system": just because you call your model/machine a "type system" doesn't automatically make it all "types". I can call my dick a "type system", but that act doesn't necessarily make it so. (And please, no jokes about "smallInt".)

[OK; we'll talk about "tiny int." >;-> ]

I'd be interested to see how your rules to distinguish cats from non-cats differ, qualitatively, from the definition of "type" presented on this page or any recognised definition of "type".

There used to be a BASIC game floating around, with source, which asked various questions and then identified the animal you had in mind. The questions resembled, "Does it have pointy ears?", "Does it have whiskers?", etc. It's over-simplistic for industrial use, but it illustrates that classification can be encoded as an algorithm. A fancier version would probably use probabilities, since a given specimen may lack a feature due to an accident or birth defect.
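(A minimal sketch of such a question-tree classifier, in C; the questions and animals are hypothetical stand-ins, not the original game's:)

 #include <stdio.h>

 static int ask(const char *q) {
     char buf[8];
     printf("%s (y/n) ", q);
     if (!fgets(buf, sizeof buf, stdin)) return 0;
     return buf[0] == 'y' || buf[0] == 'Y';
 }

 int main(void) {
     /* Each question is a branch in a hard-coded decision tree. */
     if (ask("Does it have whiskers?"))
         puts(ask("Does it have pointy ears?") ? "I guess: cat"
                                               : "I guess: seal");
     else
         puts("I guess: snake");
     return 0;
 }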

You can play exactly the same game with types. Indeed, I have seen it used to teach children about different types of numbers!

That's categorizing existing and known domain nouns. That's not the challenge.

[There's a point to categorizing things that don't exist?]

You'd think that something like "sex -- male/female" would be simple and objective. But the history of the Olympic games has shown that even this apparently "two-state" piece of data is much more complex than you might think, when you really take it seriously. http://en.wikipedia.org/wiki/Gender_verification_in_sports

This kind of thing does lead me to think that much of what we do is UsefulLie and AllAbstractionsLie. But that the abstractions we use are useful -- in that they save money and get the job done -- turns out to be a good thing. And sufficient. As for deep philosophical discussions about The Ultimate Truth... Well, they're great and useful. But not as a driving force for how we should develop software. -- JeffGrigg

{It became an issue because certain people aggressively insisted they know the "right" definition of "types" and that other definitions and/or models are "wrong". -t}


See DedicatedOperatorMeaningless regarding associating operators with types.


Unfortunately, due to a DeleteTantrum by GrammarVandal, some content has been lost. I've undeleted this page and made a concerted effort to retrieve it, but PageHistory sometimes has holes. Alas.

Anyway, the following text is found above:

...

That's categorizing existing and known domain nouns. That's not the challenge.

...

I responded to the effect that I do know how to identify types, but that applying cluster analysis to source code would require considerable manual intervention -- primarily to put it in a form suitable for the various types of cluster analysis -- and that there is no automated way to identify types given just a language grammar and source code samples, because the semantics of the language need to be identified.

There were more responses, but they're gone now. My last bullet point suggested a more easily-solved, and perhaps less contentious challenge. However, the challenge is for Top: Top, since you created this page in response to my statement that "values, types and variables/attributes are categorically distinct", please give an example of a language where values, types and variables/attributes are not categorically distinct.

First, we'd have to define all three of those clearly and come to a mutual agreement on the definitions. Otherwise, different people will apply different names/categories to the same things.

And I have asked before: would it change the nature of the challenge if a description of what the language/compiler/interpreter "does" were also given? Such as a reference implementation?

I do not believe humans use rigorous/clear/consistent rules to determine what they call "types" versus something else. Instead, it's based on tradition and similarities (pattern recognition). If they did use such rules, they should be able to encode them in a clear format, especially if they insist their determination technique is the One True Way. -t


CategoryTypingDebate


FebruaryTwelve AprilTwelve

