Tops Type Determinator Challenge

From ThereAreNoTypes:

UnknownDonor: "Values, types and variables/attributes are categorically distinct."

Top: "Until you produce an algorithm or something comparable that can take as input a language grammar and output which elements are which [and apply it to a code sample also], and most agree with the results, I won't believe you. Your type "definition" depends on WhatIsIntent, which is either too fuzzy or not dissect-able in practice. You have been unable to take the damned human out of the equation. "Step 5, have an expert determine intent of X" is not acceptable."

Is there such an algorithm or formula; and if not, can it be made in your estimation; or does some Law of Logic "forbid" it?

I'll simplify the challenge: only the code sample has to be "marked up" as to which element is which. Thus:

Inputs:

   if (foo == 7) then delete universe(b)

Output:

   if <-- type ( foo <-- attribute == 7 <-- value ) then <-- attribute delete <-- type universe <-- etc ( b )
(Example not intended to be realistic for any existing language. Any resemblance to an existing language is due to either coincidence or mental laziness on the part of the author. If whitespace is significant, then use a place-holder symbol, such as "[tab]" or "[space]".)

--top


How about looking at the source code of a compiler with a type system? Languages with rich type systems, such as Delphi, obviously tell the difference between a variable, a type, and a value; otherwise the compiler wouldn't work, wouldn't compile programs, and wouldn't correct you with static typing errors. Since the Delphi source code is not available for dcc32, I'd recommend you look at the source code of some free compilers that have rich type systems. Likely you will be too lazy to look into the source code because it's too large, or you want a short 5-line example, which isn't possible. Or maybe you want to play word games, and say that types are actually just sets of things, or that types are just flagged down by something and not actually types, or some other DodgingTheIssue?.

This is not about humans stating what is what. Please reread the premise. Different humans have different answers. It's about codifying rules that determine types, not just determining types.

Do you need to codify rules to determine what an equal sign (operator) is in order to understand equality in math? Why do you understand the equality operator without needing any code to explain it? Why do children in math class understand number types without having codification available? Understanding the concept of a type is far more important than any particular implementation (detail) of a type system. In order to understand the concept of a type, just look at the number types available in math, which are well known and widely accepted:

http://www.purplemath.com/modules/numtypes.htm

Renaming or redefining number types in math to "side flags" or "pylons" is not going to help the cause. You are playing word games and having a silly SemanticWar? here.

That's precisely why I want to remove humans from the determination. It would then be open-and-shut logic. If you violate rule 38, then you violate 38. (I question some uses of the term "type" around here.) -t


Challenge accepted.

Top, the RelProject has a debugging facility to convert TutorialDee code into a textual representation of its corresponding parse tree. Like this:

Given the following TutorialDee code...

 VAR X INTEGER;
 VAR Y INIT(X);

 TYPE T POSSREP {A INTEGER, B INTEGER};

 VAR Q INIT(T(2, 3));
...assuming the above is in a file called demo.d, I can run it through Rel as follows:

 java -jar Rel.jar -v1 < demo.d
And get the following output:
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarTypeAndOptionalInit
     Type
      Identifier
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarInit
     Expression
      Dereference
 Statement
  TypeDef
   Identifier
   TypeDefInternal
    PossrepDef
     PossrepDefIdentifier
     PossrepDefComponentCommalist
      PossrepDefComponent
       Identifier
       Type
        Identifier
      PossrepDefComponent
       Identifier
       Type
        Identifier
     PossrepDefConstraintDef
    PossrepInitialiser
 Statement
  VarDef
   Identifier
   VarScalarOrTuple
    VarInit
     Expression
      FnInvoke
       Identifier
       ArgList
        Expression
         Integer
        Expression
         Integer
Note how the values, types and variables/attributes are categorically distinct, and can be resolved accurately and unambiguously by the parser given a grammar (follow the link at http://dbappbuilder.sourceforge.net/Rel.php) and a code specimen in TutorialDee.

No no no. YOU are the one who labelled parts of the particular language. You skipped a key step.

As the person who originally wrote, "values, types and variables/attributes are categorically distinct", I'm afraid I must be missing your point. My comment was confined to programming languages and in that context it is accurate. Your "key step" appears to suggest something -- and I've no idea what -- that neither parsing nor semantic analysis are intended to do. In strictly programming language terms, distinguishing values, types and variables is straightforward, common, and necessary to implement working languages. If you feel there is some philosophical blur between them (again, I've no idea what, because that blurring certainly doesn't exist in my mind) it doesn't affect compiler design.

Given a grammar for a programming language, use an algorithm that imparts the "correct way" to determine types etc. to determine what parts of the language, per code samples, are what. The algorithm will determine what's a "type", "attribute", etc. NOT the compiler writer. I don't give a damn what the compiler author thinks is what; we are not testing human minds here, but an objective "formula" for determining what are types versus attributes, etc. (The testing machine will not use internal variable names as part of the algorithm. It will be the nature of the parts, not their labels.) If the "rules" for such are truly clear, as some of the more aggressive WikiZens claim, then it should be possible to turn those rules into an algorithm. This is to weed out bullshit and hot-air. -t

Note that you can use pseudo-code and descriptive steps as a starting point rather than actual code. Just be ready to answer questions about it and flesh out the fuzzy or confusing spots.

The algorithm that determines types, etc. "to determine what parts of the language, per code samples, are what", is precisely what I illustrated. It is something a programmer specifies in a grammar, and is used to appropriately allocate and access variables, identify literal values, and perform type checking. That is precisely the "rules" to which your correspondents referred. I believe you may have misunderstood your correspondents, or perhaps their descriptions were not clear, or both. There is no inherent arrangement of keywords and tokens from which types, variables, and values can be implicitly derived, but no one has ever claimed that there is. In programming language terms, however, variables, types and values are categorically distinct. This distinction is fundamental -- and explicit -- in creating usable programming languages.

Again, the compiler builder and/or grammar builder (human) decided what to call an "expression", etc. They could have instead called it a "flitterblonk". It's named whatever the hell they feel like naming it. You claim they are "fundamentally distinct" but have not provided any objective way to determine that. "I know it when I see it" is not good enough. If the determination is clear-cut, then a machine can do it just by analyzing the grammar rules of a language. If you cannot produce such an algorithm, then you haven't given it enough thought, backing my claim it's not clearly defined. Now maybe you'll claim that grammar rules are not enough; that one also needs to know what the operators do. That's fine, as long as you objectively encode that info.

Uhm, he provided an algorithm, and you immediately reject it. If I were him, I would just give up, because explaining things to you is not rewarding. Indeed you are a very HostileStudent. If someone proves to you that the equal sign is an operator, you would immediately reject it and say "but it is not an operator, it's a FooBar because anyone can rename operator to some other term! Actually, the equal sign is just a FLAG. It's not an operator, it's a flag!"

He didn't follow the criteria. He needs to go back to Requirements School to learn what requirements are. And I am not his student. The bottom sentence about re-interpreting things has some truth to it. There may be some historical tendencies to see things as such and such, but that may not be the only way to see it. It's similar to how some functional expressions can be viewed as imperative statements and vice versa. Some of you are mistaking historical habits for universal unwavering truths.

I followed the criteria as they are described, using precisely the approach used in implementing programming languages. Any misunderstanding, and/or failure to explain, is entirely Top's.

If I rewrote the Rel parser in German (such that it still executed Rel properly), it would fail because most couldn't read its labels. In theory the "type determinator machine" should give the same answer regardless which language the parser/compiler was translated into.

Sorry, I'm not following you at all.

After the translation, the above output would look something like:

  Groff
    Shnicken
      Hiffsten
    Fluken
    Worstein
      Nuffen
      Ausdruck
       Fn-Invoken
        Arguff Liste
         Ausdruck
          ganze Zahl
         Ausdruck
          ganze Zahl
Not only did a human assign the "parts of speech" classifications to the grammar, but it is not general purpose. It only works on Rel.

Of course. That is what the grammar is for. "Typeness", if I can use that phrase for what you seek, is not an attribute of a grammar. Nor is it an attribute of the source code itself. Types, in terms of programming languages, are defined by humans. They describe (relatively invariant) sets of values and associated operations. They come from mathematics (either axiomatically or derived), logic, past experience, intuition, convenience, analogy, and scientific study. They do not come from structural attributes of source code or grammar. You might find clusters of characteristics that way, such as that "(" often follows "if", but that isn't a type.

Re: '"Typeness"...is not an attribute of a grammar. Nor is it an attribute of the source code itself.' - As I stated elsewhere, if they only exist in human heads, and each head can have a different model/view of the "essence" of the source code, then "types" fail to improve communication or understanding, at least until both sides arbitrary agree to pick one mental modelling choice over another. It's a flawed mental tool. That's fine if you know it's flawed and work within its known limitations, but some insist there's a "right way" to view programs, which is as arrogant as The Church persecuting Galileo. -t

"Typeness" in programming languages is often represented by programs. In particular, it is those programs, or parts of programs, that specify (relatively) invariant sets of values and associated operators. However, there is no inherent mechanism that can divine a type definition purely from a grammar and source code, because the mechanisms to define types do not obey some inherent pattern in symbols. They are, in short, higher order constructs. An equally difficult problem would be to discover loops and procedure definitions from patterns of symbols alone. An even harder problem, I suspect, would be to automatically distinguish -- given a grammar and source code samples -- games from business applications.

Also, you need to distinguish identifying the types themselves from type definitions. These are profoundly distinct.


I think that Top wants an algorithm which discovers the types instead of just outputting the types 'assigned' by a programmer. Much like humans learn types (categories, whatever you call these patterns). To satisfy Top the algorithm may not really use the input grammar (as the compiler internally does), because Top would probably immediately claim that the algorithm just printed the names of the rules of the grammar instead of the 'types'.

Actually there is now an algorithm which does exactly this:

This algorithm learns the types of elements of a language. Any language. Given enough example sentences, of course. I assume that somewhere in the human brain a process is at work which does the same (albeit probably more parallel, more efficient and more general). Nonetheless the end result is: each element in a sentence gets a 'type' - the selected rule to produce it in its context. The algorithm will not call it e.g. "noun" (how could it), but the description will be equivalent to "element at a position where a word from a learned list of nouns is allowed". -- GunnarZarncke
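For illustration only, here is a toy Python sketch of the distributional idea behind such induction algorithms (not the referenced algorithm itself): tokens that occur in exactly the same contexts get grouped into the same induced "type". Real induction uses similarity rather than exact matching, but the flavour is the same.

  # Toy type induction: cluster tokens by their (left, right) contexts.
  from collections import defaultdict

  def induce_types(sentences):
      contexts = defaultdict(set)
      for words in sentences:
          padded = ["<s>"] + words + ["</s>"]
          for i in range(1, len(padded) - 1):
              contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
      # Tokens with identical context sets fall into one induced "type".
      clusters = defaultdict(list)
      for token, ctx in contexts.items():
          clusters[frozenset(ctx)].append(token)
      return list(clusters.values())

  corpus = [f"{n} {v} {a}".split()
            for n in ("gunnar", "anna")
            for v in ("runs", "walks")
            for a in ("fast", "slowly")]
  print(induce_types(corpus))
  # [['gunnar', 'anna'], ['runs', 'walks'], ['fast', 'slowly']]

The induced clusters correspond to "noun-like", "verb-like", and "adverb-like" positions, without any label ever being supplied.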

Nice find!

I wonder if Top is conflating the relatively narrow definition of types as sets of values and associated operators, with types in a more general sense, i.e., with the very idea of types or with the various "kinds of things" in a programming language. The former clearly distinguishes types from variables, values, and operators. The latter admits different kinds of variables, different kinds of values, different kinds of operators, different kinds of statements, different kinds of any programming language element you care to mention, and different kinds of things in general. Perhaps there is some blurring or ambiguity in the latter case, but I don't know that it's particularly of concern when implementing languages.

When implementing languages, types, variables, values, and operators are unambiguous. That is self-evident. In implementing a language, what I code for handling type checking is very distinct from what I code to implement variable assignment, though the latter may make reference to the former to make sure the user doesn't miss a type mismatch error. There is no confusing a variable with an operator, or a type with a value. Of course, in some languages with (say) first-class types there are 'type' values that can be bound to 'type' variables, but that confuses neither the language implementer nor the successful user because the context makes the arrangement of these elements clear. A car, for example, can be both "vehicle" and "fast" without confusing "car", "vehicle", or "fast", and it doesn't cause some peculiar notion that a "vehicle" is a "fast" or lead to some difficulty identifying cars.

{Re: "Given enough example sentences of course." (Unsupervised learning) - I assume the sample (training) sentences have been pre-labelled by humans as far as parts of speech. This would then simply reflect a kind of voting approach. I suppose that would satisfy the requirements. However, it's still based on human "impressions" and "notions" rather than some clear-cut rule(s). You'd still have to admit there are no clear-cut rules in the traditional sense of printable formulas and algorithms. At least you seem to be closer to understanding what I am asking for. Kudos.}

No. The training set contains no labeling at all (the mentioned corpus is a very large set of real-world sentences). The performance of the algorithm is compared to manually labelled sentences, though. Actually the algorithm even works on sentences without word boundaries ("mynameisgunnar"), but then needs an even larger corpus of sentences (to learn what a 'word' is). The algorithm can also be used to find the structure of e.g. DNA sequences or web page click streams or any other sequence of tokens. -- .gz

{How is "manually labelled sentences" not a training set?}

So are you saying that types are just a labelling of sentences? Obviously a non-arbitrary labelling. Are you free to label the sentence "Gunnar runs fast" as "Noun Noun Noun"? Or as "1 2 3"? Probably not. The labels you want to use must have a minimum structure to be called types. Are you asking: What is this minimum structure?

If there is such a thing, sure. Are you saying types are a continuous concept? So that something is 40% typish and something else is 80%? As a mental model, I could say that the labeling of words or phrases is "adding side flags" to them.


Re: "The former clearly distinguishes types from variables, values, and operators [and attributes]. "

By what rule, God, Bible, algorithm, formula?

"I know it when I see it" is not good enough. I don't want vague notions, I want iron-clad rules."

By convention and/or definition, and informally:

 Type: a (relatively) invariant set of values and zero or more associated operators.

The RelProject/TutorialDee example above unambiguously identifies each of the above given a grammar and source code corpus.

You don't need operators to have types, by some accounts. And it gets into the issue of whether an operator that acts differently due to some feature of the value(s) is a different operator under the same name, or the same one. It's the borderline cases I'm interested in, not the clear-cut ones most agree with. We had a similar discussion somewhere regarding validation versus "types", and operator overloading.

I've updated the 'Type' definition. As for the "clear-cut ones" vs something else... Stop moving the goalposts! The definitions hold regardless of "feature of the value(s)" and operator overloading. A type doesn't become an operator because of some "feature of the value(s)", and if it did that would be an exceptional case.

I'm not moving the goalpost, I'm just making sure you don't over-focus on the simplistic cases.

For example, some dynamic languages will parse a variable with a string to see if it can be evaluated as a number if you use "+". If not, it does concatenation instead. Is that two different "types", or "changing behavior based on parsing"?

Automated type coercion. Values are still values, types are still types, variables are still variables. Nothing in the definition says a value can't be coerced from one type to another. Nothing in the definition says an operator can't perform type coercion.

How are you identifying that it's a "type" being coerced?

I'm not. It's a value being coerced from one type to another.

How does one know it's a "type" that's being coerced? How does one know that it's "changing its type"? (Further, I never stated the original was being changed, but that may not be a key issue regardless.)

There are various ways you can implement the infrastructure necessary to support type coercion. Are you asking for a description of them?

Using your "+" operator example, a common way is for the concatenation operator to attempt to coerce both operands to numbers. If the coercion succeeds, it uses addition and returns a number value. Otherwise, it coerces both operands to strings and concatenates them, returning a string value.

I'm asking how does one know "types" are involved besides "I know 'em when I see 'em".

Do you know apples when you see them? Are they different from oranges? Part of the great thing about being a human is that we can clearly distinguish between things using objectivity. It is so obvious that an equal sign is objectively (not subjectively) different from a number or type of number. You seem to be the type that uses human psychology rather than science, facts, or math. I'm trying to implant some psychology into your head since you don't seem to be able to use math, theory, or science to distinguish between variables, values, and types.

The following is a very similar process:

  function AdjustURL(string url) {
    if (contains(url, "foo")) 
       return(concatenate(url, "bar"));
    else
       return(url);
  }
Is this "type coercion" also? If not, why is it different than the "+" example?

Does an URL belong to a set of URLs? Is there a collection of associated operators? If so, it is an URL type. Indeed, many language implementations provide just such a thing. It's usually a subtype of 'string', too, because an URL is a string with special properties. However, your example isn't coercion, because I see no evidence that you're changing an URL to a string or vice versa, let alone doing it implicitly.

I didn't necessarily "change" the value in the "+" example either. As far as "belonging to a set of URLs", do you mean explicitly in the computer and/or in human heads? Where does this set have to exist to qualify? Note that the set of all possible "real" numbers cannot exist in the computer or in our heads because there's no room. We thus find ways to fake it: UsefulLies. One can define URLs as a set of all possible URLs, but that just means we can make theoretical models; and others may use a different model.

If 2 + "2" produces 4, you've changed the second value from a string to a number, i.e. you've coerced the string into a number. By "belonging to a set of URLs", I mean explicitly in the computer, but we don't need to explicitly specify every possible value if we can produce rules that determine whether a given representation of a value belongs to a type or not. For a string to be an URL, for example, it must exhibit a certain structure.

Re: 'If 2 + "2" produces 4, you've changed the second value from a string to a number...' - No, we don't necessarily have to. See "Computation and Mapping are Interchangable". We can take two strings as in input and return a string as an output. (And there are other ways besides mapping to do the same thing.) In languages that store/represent numbers as strings, such as Perl and ColdFusion, one doesn't really know what's happening under the hood, and it doesn't matter because it can be replaced by implimentations that don't have to convert the representation of the input parameters. In practice they probably convert in order to leverage the existing Intel hardware, but that's an implimentation detail that can be swapped for another.

As I stated above, there are various ways you can perform type coercion. However, the conversion from a string to a number, even if mechanically avoided, still occurs inevitably. Addition is a mathematical concept that applies to numbers, not character strings. Therefore, regardless of the mechanisms, if mathematical addition has been performed, it must -- by definition -- have been performed on numbers.

Re: "but we don't need to explicitly specify every possible value if we can produce rules that determine whether a given representation of a value belongs to a type [set] or not" - That's rather broad. Every IF statement is doing just that. Does that mean the results of every IF statement is a "type" (member of a type)?

No. What if the IF is used to exit the program when the red button is pushed?

It determines "type of things that cause program to exit".

Whilst the 'things that cause program to exit' may be plural, that doesn't necessarily mean it's a set. To be a type, it must be a set of values.

DataAndCodeAreTheSameThing, or at least can be viewed that way in one's head.

Indeed, in some languages that is explicitly true, and we can have (say) language statement values that are of a language statement type. However, in the majority of non-homoiconic languages, it is a meaningless equivalence because it is effectively inaccessible to source code in the language, except in the most awkward ways. In other words, the equivalence may be philosophically true, but not a language actuality. In C, the statements you write might be data to a C compiler, but they're not data as far as your program written in C is concerned unless you're prepared to write, at least, your own C parser and have access to the file containing the source code.

Okay, what if we take Bob the Programmer's JavaScript (JS) interpreter and replace it with one that allows block structures to be added and changed dynamically (similar to Lisp). However, we don't tell Bob that we made the change. He thus never changes the block structures, yet the capability is there. His existing JS apps otherwise run as they always did and he doesn't know the difference. So, does the IF statement now become a "type" in the absolute sense? It may be static in Bob's head because he assumes (models) it that way, but it's not static anymore. I see little use in making types depend on "mutability", other than temporary or convenient simplifications for communication or cognitive purposes. That's fine, but it's not an absolute trait. -t

Types do not depend on mutability, but on capability. Informally, if there exists an invariant set of block structures and associated operations, then there is a type. A simple example: C has an 'int' type, whether you use it or not.

That's because it's a side-flag. QED.

That doesn't make sense. An "int" isn't a side flag. It's a set of values (... -2, -1, 0, 1, 2 ...) and associated operators (+, -, /, etc.)

I thought we agreed that types don't need operators to be types. Further, the operators can operate on (use) both the value portion and the type-tag portion. Thus, they are not unique or dedicated. The language may make it awkward, but in some cases it appears we can even switch them such that the "type" is used as the value and vice versa.

Some types don't need operators to be types, but they're of narrow purpose. Typically, such empty types practically serve only as a form of indicator. E.g., given a type <x> with no operators, we might be able to say that it is a type, or if we ask whether a given variable contains a value of type <y> but it's really type <x>, the answer will be 'no'. Such a type may serve as a "bottom" type, or as a "void" type. All other types (typically) have operators. What use would an integer be that we can't add, subtract, multiply, divide, etc.? A C "int" has operators.

And "int" is NOT a set of values. It may be in your head, but not necessarily in the language.

Int is most certainly a set of values. In C, it is a finite subset of Z, the integers.

"Defined as conceptually equivalent to", perhaps, but no actual set is actually in there (or has to be in there). One can probably define or model the same behavior in a completely different way. The language standard may say it has to match a model X, but that does not mean it has to be implemented using the nor thought about in the head using that same model as long as the substitute is equivalent (or close enough for our intended usage).

It's not just conceptually equivalent to a finite subset of Z, it is a finite subset of Z. Imagining a set as a bucket that exhaustively contains every element is a helpful analogy or mental model, but it isn't reality. For example, the set of all even natural numbers, { n in N : n mod 2 = 0 }, would be infeasible as a container of every positive even integer. Only the expression that generates the set, i.e., its "builder", is necessary to fully and completely describe it. This notion is behind every non-enumerated type definition, and as it happens, programming languages are ideal for expressing complex set builders.
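As a sketch (Python, purely illustrative), the "builder" is just a predicate, and the infinite set is never materialised:

  # Membership rule ("builder") for E = { n in N : n mod 2 = 0 }.
  # The rule fully describes the set; no container is needed.
  def is_even_natural(n):
      return isinstance(n, int) and n >= 0 and n % 2 == 0

  print(is_even_natural(10))       # True
  print(is_even_natural(7))        # False
  print(is_even_natural(10**100))  # True -- far too big to ever enumerate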

Yes, but like you said, it's a "notion". It's not actually in the software; it's in the reader's head. The reader could in theory replace their head model with a different "int" model that ALSO reflects program/compiler behavior. It's relative. We can only empirically test what comes out of the machine as far as fitting a definition, not people's heads. -t

{Oh please, not the EverythingIsRelative cop out again. Since EverythingIsRelative the concept of an equal sign can't be true either (ThereIsNoEquality?) because ThereAreNoTypes and the computer can't place equal signs into the CPU, and it can't place types into the CPU. Do you understand what abstractions are? You realize that tables don't actually exist either because the computer is dealing in 1's and 0's and therefore ThereAreNoTables. Please, the cop out of EverythingIsRelative is so lame.}

There may well be mental models that could replace the "int" model, but that doesn't matter. We're talking about what is actually modelled, not what is perceived. Types, as discussed here, are a concrete aspect of language design and implementation, not philosophy of computer science.

So types are defined by what the builder of the compiler/language had intended??? That's loco. Right back to DefinitionsThatRelyOnIntent (WhatIsIntent) again. -t

No, types are those parts of programming languages that obey recognised and generally-accepted ComputerScience definitions for types, like the one I've given on this page, or they aren't types.

Good! Now codify that process so that the steps are clear enough to state as an algorithm or list of rules. Codify "recognize", for example. Do you mean "it looks like something I've seen before"? That's probably not a good foundation for a definition.

With pleasure: "Type = a (relatively) invariant set of values and zero or more associated operators." As a definition, that is both sufficiently general and as precise as need be for the majority of programming purposes. More detail would be redundant or inaccurate, and less would be vague. Of course, it doesn't tell why we might want types or what they might be for, but that's a different issue. I have used terms -- "set", "operators", "relatively", "invariant", "values", "more", "zero", and "associated" -- that are recognised and understood by the vast majority of programmers and computer scientists. It's not necessary to codify "recognise" (or "understood") because it's generally recognised what "recognise" means. If I need to be more specific about "recognise" -- say, for the purposes of working in the areas of machine learning, AI, machine vision, etc. -- then I shall leave it to an expert in those areas to hammer out the details. If you don't recognise "recognise", then I submit the problem is yours and probably yours alone.


Computation and Mapping are Interchangeable

Take the following example:

  x = multiply(1, 2)
Normally we think about this as a computation, but it could also be implemented as a lookup table similar to the times table we had in grade-school:
  1, 2, 2
  1, 3, 3
  ...
  2, 3, 6
  2, 4, 8
  2, 5, 10
  etc.
In practice this tends to be resource hungry, but in theory the look-up tables could handle infinite expressions, such that every computation can be defined as merely a lookup process, and our programming-language results as nothing more than re-mappings, with computation seen as a more practical way to implement them. Our spoken language and thinking are thus shaped by practical concerns instead of theoretical ones, but that doesn't have to be the case. -t
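A minimal Python sketch of the interchangeability claim, over a finite domain:

  # The same mapping, once as computation and once as a lookup table
  # (like a grade-school times table, here limited to 1..12).
  def multiply_computed(a, b):
      return a * b

  TIMES_TABLE = {(a, b): a * b
                 for a in range(1, 13) for b in range(1, 13)}

  def multiply_lookup(a, b):
      return TIMES_TABLE[(a, b)]

  assert multiply_computed(2, 3) == multiply_lookup(2, 3) == 6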

Congratulations, you've re-discovered the conceptual basis for functional programming: a mapping from input to output. Of course, some mappings are infinite but computable, and others are finite but infeasible, so simple lookup tables are impractical.

[And what are the types of things being stored in that table? In Math the teacher would describe to you the number types that 1, 2, 3, etc. are. Are they negative numbers? Are they decimals? wholes? fractions? Positive numbers? If you put strings into that table such as "foo" and "bar", the strings cannot be multiplied. Whereas numbers can be multiplied, precisely because strings are a different type than numbers. They are not a different flag than numbers, they are a different type]

It can be seen as validation: "the two things you supplied are not in the table". We could add "foo" and "bar" into the table. That may make it less useful to humans used to things being a certain way, but the program doesn't care; it's just blindly following rules.

Certainly. Foo and bar are now part of a set of values, i.e., a type. Your "validation" is a type constraint.

So "everything's a type". How useful.

Very. You're on the verge of rediscovering TypefulProgramming.

If everything is a type, then every language is "typeful". Why sell it if it's everywhere? "Gitch'r Fresh Air, just two payments of $99.95!" (Well, okay, it's not a commodity in L.A.)

Congratulations, you invented the first proper TopsTypeDeterminator?:

  while (t = getNextToken(program)) {
    print(t . " is a type");
  }
Not quite. A type is a collection of values and 0 or more associated operators. Not everything is a collection of values. "IF" isn't a collection of values. "System.exit()" isn't a collection of values. Even System probably isn't a collection of values, despite being a class.

You sure about that? Absolutely?

On average, yes. I'm absolutely sure, however, that exceptions can be found. In a language with first-class control structures, "IF" might well be a control structure belonging to a set of control structures, with 0 or more related operators. In that case, "IF" would belong to a type. In typical C-derived languages, it does not.

By "see above" I mean the Bob the Programmer example above.


Maybe another way to view Top's quest is to compare it to the problems with definitions of sets (which are definitely related to types). Set theory is a fundamental mathematical tool, needed not only to reason about other theories but also to specify them, even their axioms. The structure of a theory is often given as some sets of values with 'rules' (e.g. predicate logic) on them. Naive set theory uses natural language to specify what it means by a set and the operations on it. This means it is vague; it cannot be implemented as an algorithm as such. It just serves as a stepping stone to 'real' set theory. Now there is axiomatic set theory, of course. But that assumes an understanding of the concepts. Obviously you cannot use set theory to specify and reason about set theory. So where do you start? It's chicken and egg. Does this mean there are no sets? No. Only that the problem of what sets are cannot be answered by referring to sets. -- .gz

See http://en.wikipedia.org/wiki/Naive_set_theory

It's probably possible to view/model just about anything as sets if one works at it enough. But if that means everything IS A set, or can be a set, then it's probably meaningless to say things like "sets are good" or "you should use more sets" because everything already is a set in one way or another. If you cannot identify things that are not sets, then selling sets is moot because they are already "there". More than likely they mean a flavor of programming that resembles some existing notation or technique: a style of doing things. That's perfectly okay, but be careful what you label it. What people typically call "types" in Algol-influenced (C, Pascal, and BASIC-like) languages are modeled well as "type flags/tags" in my observation. But that irritates some because it's too narrow; but that narrowness is a good thing. I asked, "what is common about what people are calling "types" in these kinds of languages, and how can I describe or model it better?" I isolated a spoken language pattern and found a way to describe it via a model: type tags. -t

{To say "types = values with type tags" (that is your model, isn't it?) is the same as saying "bicycle = vehicle with bicycle wheels". "Vehicle with bicycle wheels" gives us one possible characteristic of what might be a bicycle, but it might be something else entirely. It definitely doesn't model a bicycle, because it fails to capture the abstract essence of a bicycle. Similarly, "types = type tags" hints at how a particular type system might be implemented in some programming languages, but it is inapplicable to other languages and certainly doesn't tell us what types actually are in programming language terms. It fails to capture the essence of a "type".}

If the "abstract essence" is in the head, and there are multiple ways to model/view things in one's head, then "abstract essence" has little utility, or at least is problematic when both sides are using a different mental model. My alternative is not perfect, but at least narrows it down more. Each variable, or scalar segment, has two "parts" (the value and the type), and operations may take both of those in consideration when producing results. That's the "mechanics" of types in C-style languages. As far as the "essence", that gets into what's in your head, and it may not match what's in another head.

{The abstract essence is much more concrete than that. Which is a more accurate description of an "int" in C: "'int' is a side flag" or "'int' is a set of integer values (... -2, -1, 0, 1, 2 ...) and related operators, such as +, -, =, *, /, etc?"}

The language does not have to use "is a set of" to perform integer operations. That's in your head. Internally it may merely be "if the side flag is 'int', then interpret the bytes in such a way....".

{The language does have to use "is a set of" to determine that some set of input tokens representing a value is an int, so that we can choose the correct +, -, *, etc., for that value.}

{In your approach, you'd have to use "is a set of" to determine the correct "side flag", then use the "side flag" to choose the correct +, -, *, etc., for that value. Do you see here how the notion of "side flag" is actually redundant?}

There are different ways to implement "determining the correct...". Regardless of the implementation approach, the output "depends on" (is affected by) both the side-flag and the value. It's as if two parameters are being passed instead of one (the flag and the "raw" value). They just happen to be munged together under one "variable".
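A sketch of that "two parameters munged into one variable" reading in Python (tag names are purely illustrative):

  # A "variable" as two munged-together parts: (side_flag, raw_value).
  # The operator receives both parts; its output depends on both.
  def plus(flag_a, raw_a, flag_b, raw_b):
      if flag_a == "int" and flag_b == "int":
          return ("int", raw_a + raw_b)
      return ("string", str(raw_a) + str(raw_b))

  print(plus("int", 2, "int", 3))       # ('int', 5)
  print(plus("string", "2", "int", 3))  # ('string', '23')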

{It doesn't matter how you implement "determining the correct..." Implementation is irrelevant. No matter how you implement it, or even how you look at it, you are performing "is a set of". That's what it is, regardless how you do it.}

{That a value has a type is trivial, and part of the definitions I provided above. A value has a type, by definition. Saying the output depends on both the side-flag and the value adds nothing to the self-evident notion that the output depends on the type of the value. In fact, at best it's merely an awkward (and somewhat implementation-flavoured and evasive) way of saying "a type has operators."}

What definition? Please use a PageAnchor or something searchable.

{Definitions as follows, from above:}

 Type: a (relatively) invariant set of values and zero or more associated operators.

Why say "and zero or more associated operators"? And why limit it to "invariant"? What field usage pattern does this come from?

{I say zero or more associated operators because it's true. Types without operators do exist, but their use is limited. E.g., "void" in C. "Invariant" refers to the fact that types represent categories or kinds of values, rather than transient storage. Admittedly, it probably does not significantly harm the definition to remove "(relatively) invariant".}

Everything in the universe has "zero or more operators". This hints at something you are thinking of, but cannot articulate right now.

{Sorry, I'm not following you. I'm not concerned with everything in the universe, just programming languages. That a type is a set of values and zero or more operators is intuitive. E.g., int: set of values (... -2, -1, 0, 1, 2 ...); operators +, -, *, /, etc. This pattern extends to every type.}

Why include it in the definition if it applies to everything, even in programmer-land?

{It doesn't apply to everything, just types, though at least one recognised definition informally describes types as "everything that can be determined about a program without running it." Under the definition I've used, "if" in C (for example) is not a type per se.}

If you get rid of "invariant" it may; but as already described in the Bob example, that probably won't change how most people think of or use the language.

{Sorry, not following you.}

Please review the Bob the Programmer example. "If" can be dynamic.

{If "if" is dynamic, belonging to a set of values and having operators that act upon those values, then it's a type. Otherwise not. This was discussed above.}

Doggonit! I thot "operator" was irrelevant.

{Where did I say "operator" was irrelevant? What I said was that there are types, of limited utility, that have no operators. E.g., "void" in some languages, the 'bottom' type in others. Most types have operators.}

"zero or more associated operators"

{Indeed. A "bottom" or "void" type (for example) may have no operators because it's never instantiated. Instantiating a value of a given type requires at least one operator to instantiate a value of that type. In C, it's implicitly invoked for the built-in types based on recognising input tokens. E.g., '123' is recognisable as an implicit request to invoke the 'int' type's operator that instantiates a value of type int, '12.45' is a request to instantiate a value of type float, and so on. In C, for user-defined types, it's whatever initialisation the programmer specifies. In C++/Java/C#, it's implicit for built-in types and explicit constructor(s) are specified for user-defined types. In TutorialDee, the value instantiation operators are called selectors, and are invoked implicitly as with C/C++/C#/Java/etc. and explicitly for user-defined types. And so on.}

{In short, every type for which you wish to instantiate a value needs at least one operator to instantiate a value of that type.}
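Here is a hedged Python sketch of that implicit selection: token shapes pick which type's instantiation operator (selector) runs. The rules are simplified placeholders, not any real language's lexer.

  import re

  # '123' implicitly requests the int selector; '12.45' the float one.
  LITERAL_RULES = [
      (re.compile(r"^-?\d+$"), int),
      (re.compile(r"^-?\d+\.\d+$"), float),
  ]

  def instantiate(token):
      for pattern, selector in LITERAL_RULES:
          if pattern.match(token):
              return selector(token)
      raise ValueError("no type's instantiation operator matches " + token)

  print(repr(instantiate("123")))    # 123
  print(repr(instantiate("12.45")))  # 12.45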

I hate to say it, but that depends on how you define "instantiation". VB Classic by default acted like every variable existed already with a nil variant.

{I'm not clear how that is relevant. Variables are neither values nor types. Do you mean the variables by default contained the value 'nil'? (I don't recall much VB Classic, I'm afraid.) If that's the case, there's an implicit operator associated with the 'variant' type that instantiates 'nil' so variables can be initialised.}

But using "implicit" you are starting to shoehorn it into your target model by having imaginary actions, such as initialization. True, all of programming is probably that way in our heads to some degree, but it just means there are multiple models for the same thing. In practice it probably does have an initialization action under the hood, but that is analyzing the implementation, not visible behavior. It's externally indistinguishable from "all variables pre-existed with a 'nil' value". Both sub-models "work".

See PlethoraOfUsefulHeadModels.

{Beware of confusing "implicit" with "non-existent". A variable that is not initialised is very distinct from a variable that is initialised, whether implicitly or explicitly. That is visible behaviour. In any model, if you use a variable before it is explicitly initialised, you must define its behaviour, i.e., you either must provide implicit initialisation, treat it as undefined and force all subsequent behaviours to be undefined, or throw an exception. There are no other options.}

I don't know much VB, but it is very important for a VB programmer to know whether the variable is initialized to Nil or whether it just points to garbage (uninitialized). In practice, initialization under the hood is something you must know about; otherwise checking the variable for nil wouldn't work (if it was just pointing to garbage).

VB-Classic did not result in garbage if uninitialized, just nil. (Most used the OPTION EXPLICIT directive to avoid accidentally using such, as a side note. The nil thing was a dumb default.)

What would be a good default?

I meant the problem is that it allows null/empty/blank variables without an assignment. In Php, for example, you can create a variable "out of the blue" if you assign something to it. However, VB's default behavior was that this could happen on the right side of an assignment statement. I'm okay with "left-side creation" of variables. - t


The "sets" definition actually kind of matches the side-tag model: the tag identifies which set the variable/object belongs to, and the value portion is a particular member of that set. Maybe we can reach a common ground. -t

The side tag "model" is about as useful as saying "Types are side types". The side type identifies which type the variable/object belongs to. You are just playing word games. Type signature, type tag, side flag, why not call it side label? How about renaming apples to "a fruit with an apple tag on it". One doesn't understand what types are by an implementation detail. If apples actually had tags inside their atomic structure which differentiated them from oranges, that would be an implementation detail of the universe.

"Sets" are also an implementation.

Set theory is the branch of mathematics that studies sets. Do you think mathematicians have an implementation nearby? They don't even use computers for christ sake. You do math on paper.

Your fruit analogy makes no sense to me. And "side tag" is not meant to be the full definition in itself, just a label. Don't mistake a title for the article. It could be called the "bicameral variable model", for example. A variable in a typed language carries both a type identifier component (the "side tag"), and the value component. I'll try to formalize it a bit more:

   ST-DEF-CANDIDATE-01
   Types are a secondary component of a variable
   which identifies the set to which the value
   portion of the variable belongs.
Constants can have types too, and literals. A literal string is not a variable, nor is a constant. A number in math has a type, and a number is not a variable.

{Indeed. A literal is a value. A number is a value.}

   ST-DEF-CANDIDATE-02
   Types are a secondary component of a programming feature/object
   which identifies the set to which the value
   portion of the feature/object belongs.

   ST-DEF-CANDIDATE-03
   Types are a conceptual or actual "tag" attached to
   programming features or objects, which is a list of
   set identifiers that are used for knowing how to
   interpret and check compatibility of interactions
   among the features or objects.
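One way to make ST-DEF-CANDIDATE-03 concrete is the following Python sketch (class and tag names are illustrative, not a claim about any real language):

  class Thing:
      # A feature/object per ST-DEF-CANDIDATE-03: a value portion plus
      # a "tag" listing the set identifiers it belongs to.
      def __init__(self, value, tags):
          self.value = value
          self.tags = list(tags)  # e.g. ["int"] or ["string", "url"]

  def compatible(a, b, required_tag):
      # Interaction check: both operands must carry the required tag.
      return required_tag in a.tags and required_tag in b.tags

  x = Thing(3, ["int"])
  y = Thing("http://example.com", ["string", "url"])
  print(compatible(x, x, "int"))  # True  -- interaction allowed
  print(compatible(x, y, "int"))  # False -- interaction rejected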

[Indeed I think the interpretation aspect is interesting and worth discussing. That is because interpreted languages do not do any type analysis beforehand ('without actually running it'), so there are no formal types in these languages. But obviously there are some 'types'. And I'd guess that Top means exactly these types which are used in interpreted languages and probably encoded as 'side flags' quite often.]

[So we are indeed speaking a bit loosely when we talk about types in the sense of 'properties of program (locations)' versus types as declared or inferred attributes ('side flags') of values. The former I will call formal types and the latter data types.]

[A type theorist would call the formal type of a variable which can hold values of different data types a 'union type'. The simplest union type is 'any' or 'top' (sorry Top, it's really called that), meaning the formal type which tells nothing about a variable because it could be any value of any data type.]

[But mostly it would be a more specific union type, namely the union of all possible (and this may depend on program analysis) formal types of all values which may be assigned to that variable. The formal types of all of these values are again determined from the variables which may contain them, and so on, until ultimately the formal types of literal values are reached. Here we have a one-to-one mapping between the data type (e.g. 'int' for 123 or 'String' for "abc") and the formal type (which will usually also be called 'int' or 'String').]

[Now, what about interpretation? Are there indeed no formal types? Even if the interpreter never does program analysis, we as programmers could reason about our interpreted program like this: 'This variable can hold only a number because it is only ever assigned a number.' And when we do this we infer the formal type of a variable to be 'int' instead of 'any'.]

[A smart compiler (JIT) for this interpreted language could do the same. Actually most faster dynamic languages out there do so, most prominently probably the Chrome JavaScript interpreter. They replace all the switch(value.type) { case int: intop(); ... } dispatching with specialized code which handles only the cases which can be proven to actually occur. -- GunnarZarncke ]
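Roughly, and only as a Python sketch, the generic tag-switching path versus the specialised path such a compiler might emit:

  # Generic interpreter path: every operation dispatches on the tag.
  def add_generic(a, b):
      if a["type"] == "int" and b["type"] == "int":
          return {"type": "int", "value": a["value"] + b["value"]}
      if a["type"] == "string" and b["type"] == "string":
          return {"type": "string", "value": a["value"] + b["value"]}
      raise TypeError("unsupported tag combination")

  # Specialised path: if analysis proves only ints reach this site,
  # the tag checks can be compiled away entirely.
  def add_int_specialised(a_value, b_value):
      return a_value + b_value

  x = {"type": "int", "value": 2}
  y = {"type": "int", "value": 3}
  assert add_generic(x, y)["value"] == add_int_specialised(2, 3) == 5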

See ApplyingTheSideTagTypingModel for some related notes about interpreters and compilers. -t


An observation... There is a lot of confusion on this page with respect to symbols, values and types. We use symbols in our languages to provide more or less agreed-upon references to things in the real world. These are COMPLETELY ARBITRARY. BUT how these symbols get strung together is NOT ARBITRARY. There are many differences in SURFACE SYNTAX, but every human language has a way to describe something like this:

NOUN VERB NOUN

A noun is a "TYPE" of thing as is a verb. Keep in mind that MY use of "NOUN" and "VERB" as symbols is equally arbitrary, but irrelevant to the discussion. Things that we label as NOUNS or alternately things that can be the DOER in this simple sentence or the recipient of an action have some set of traits that make them similar and suitable to BE the doer of an action or the recipient of an action. Plato called these forms. Rubyists call it "duck typing". It is generally informal, but it is necessary to be able to communicate a simple sentence like "Dog bit Man". A thing capable of biting bit a thing capable of being bitten. Abstracted slightly DOG op MAN, where operation is an operation that is possible for a dog to do and to be done to a man. Most of use do not spend a lot of time thinking about our linguistic type systems, but they do exist and at some level are required to allow communication to happen. This is true whether the communication is human to human or human to machine. You can't have a relational query (or any other language) without a type system, even if quite minimal.

Mathematics is an attempt to improve our ability to understand the world by reasoning about it. Natural languages are problematic for this (you can see this by looking at the arguments on this page). Hence mathematics is a more formal, constrained and precise way of describing and reasoning about things, so that we do not spend all of our time mired in the vagaries of natural languages. As observed, mathematics is ALSO a language (actually a family of languages sharing certain attributes) and also has types. While the definition of "=" is arbitrary, the CONCEPT of equality is NOT.

An intuitive notion of equality turns out to be a remarkably fragile thing. Consider how many times you have seen, heard or participated in arguments about whether two things are really the same. It turns out that a lot of stuff has to be specified in order to meaningfully determine equality. Part of that "stuff" is a type system that determines the kind of things for which equality can be determined and how to go about determining it. Does 12 = 12? If we understand 12 as symbols representing numbers, then yes. If we see the 12 literally as ink on a page (or pixels on a screen or symbols at a point in a stream of text) then the two instances of 12 might not be "equal". While there is no universal type system, we have to at least temporarily agree on one to give our utterances meaning.

I agree that "types" are generally associated with classifications, and that a classification system may determine what noun can be paired with what verb, etc. However, just about everything can be view as a classification(s) and just about every verb can be reworked to be a noun, etc. We need a way to distinguish patterns of symbols separately from how we may personally think about them in our heads, in part because there are multiple different head models that can be used to produce similar conclusions (predictions) about symbol patterns and code results. Some people seem to believe their head model is the One True model, and are rather insistent about it in an aggressive way. -t

{Yes, I've noticed TopMind appears to be rather "One True model"-insistent about his "side flag" or "side tag" notions. The classification system you're describing is commonly known as a TypeSystem.}

Where did I insist it should be the ONLY model? Your imagination is making stuff up again.

[The side flag model is not even a model. It's equivalent to an apple sticker. An apple sticker is a flag or tag on that apple. This apple sticker or tag is only one physical way of labeling or classifying the apple. The tag itself is one small PHYSICAL detail. A model is not one small physical detail. Type systems are not modeled by tags or flags. Tags or flags may be an implementation detail, a way to label a structure as a certain type. We can use apple stickers to classify apples, because stickers (or tags) are one way to implement a classification system. People who are learning how to program should never, ever, ever, ever be introduced to unimportant implementation details. Low-level details of how something is implemented are NOT theory, and not a model, and are very dangerous for education. It's like arguing that MySQL uses pointers internally for speed, so therefore the relational model must have something to do with pointers. The relational model has nothing to do with how some particular product chooses to implement it.]

As I've said before, the distinction between implementation and model can be blurry or interchangeable. And any communication of a model will require a physical representation because thoughts cannot be directly transferred from one head to another via a USB patch-cord or whatnot. And thoughts also ultimately have a physical representation. Your criticism is thus vague or incomplete. Without a clear definition of "types", it's hard to critique one model against another or distinguish between it and an "implementation". And if somebody wants to model relational using pointers, good for them. I may not wish to use their model myself, but will not say "it is objectively bad/wrong" without something more rigorous. You seem to mistake your personal preferences and thoughts for universal truths.

{How is stating that "a value/variable has a side tag" (where the "side tag" represents the type, presumably) any better than stating that "a value/variable has a type"? Why complicate a simple notion (e.g., that a value has a type) by introducing a "side tag"? I also asked this on the "DedicatedOperatorMeaningless" page, to which you responded enigmatically with "first things first", then nothing.}

"Has a type" doesn't say much about the relationship between the value and what we may call a "type indicator". A "side tag" implies something that is separate from, yet "tags along with" the value. A shirt tag is not really part of the shirt, but stays with the shirt. "Has a type" leaves the door open for bleeding into one another. Plus, side tag also suggests that the indicator is subservient and/or less visible than the value itself. It has a side roll. A "tag" provides a physical metaphor that better matches or mirrors how the tag model works and/or is commonly represented. If you draw the model on a chalk-board (now dry-erase-boards, iBoard in 7 years from now), drawing a tag is the best known way to represent the type indicator of the model. I know it's not necessarily precise or rigorous, but neither is the competition. Plus, it's the name of the model. We could call the model "Susan", but that makes the name less self-describing. If they called Lisp "Nestolist", it would be more self-describing than "Lisp", for example. -t

{You appear to be describing one implementation strategy rather than a general model, apparently justified on the basis that simply stating a value or variable has a type will leave "the door open for bleeding into one another" and that "the indicator is subservient and/or less visible than the value itself." I don't know what you mean by either of those statements. It all seems more complicated than, and yet effectively equivalent to, stating that a value or variable has a type. Likewise, it's simple and intuitive to say a 3 is an integer, or that a creature is a cat, or that I'm wearing a "T" shirt. Why complicate this trivial notion by introducing integer, cat, or shirt tags? Furthermore, it is conceptually more accurate to say that an object is a <insert type here>, rather than an object has a "type indicator."}

Please clarify the last sentence. Many languages make it fairly easy to "ignore" the type (or produce results resembling the ignoring of the type indicator). That tends to weaken the case for is-ness. Side-flag-theory/model can explain fairly "mechanically" how "ignoring" happens, and proposes tests to match the model or sub-model (predictive capability). Saying something "is a" doesn't, at least not in a consistent way. It's too ethereal a statement.

It simply states that everything has a type, even if the type is "anything" or variant, or the type is "an arbitrary collection of bytes", or the type is unknown.

And I disagree that it's an implementation, for reasons already recently given. What's actually going on under the hood may be very different from a chalkboard version of a side-tag model. But as long as our model can be tuned to explain what's actually happening, that's all that matters; and the tag model is usually flexible enough to do that. It offers a prediction framework without actually having to know the real mechanism at the chip/compiler level. That qualifies it as a model. Now you could argue it's too low-level a model to say anything broad or abstract or "general" about "types", but until I see something better come along, I will stand behind the side-tag model (STM). There are higher-level models, but so far they are too vague or open-ended. STM can be turned into a software model where we can x-ray the simulated objects to see what changes what and when, and it will mirror the inspectable output or behavior of the actual program running (if set up or tuned right). STM's competitors can't do that yet. A runnable model is usually far more useful and testable than a non-runnable one. -t
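(For concreteness, here is one way such a runnable tag model might be sketched. This is a minimal illustration in C -- all names are hypothetical, and it is not a claim about any particular language's internals:)

 #include <stdio.h>

 enum Tag { TAG_INT, TAG_STRING };   /* the "side tags" of the model */

 struct Cell {                       /* a value plus its side tag */
     enum Tag tag;                   /* travels alongside the value */
     union { long i; const char *s; } payload;   /* the value proper */
 };

 int main(void) {
     struct Cell c = { TAG_INT, { .i = 42 } };
     /* "x-raying" the simulated object: the tag is inspectable
        separately from the payload */
     printf("tag=%d value=%ld\n", (int)c.tag, c.payload.i);
     return 0;
 }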

Okay, assuming your "type tag" notion is not about implementation, I can see some limited value in it for explaining simplistic scenarios involving, say, differences between languages that have a "type tag" associated with both variables and values versus those having a "type tag" associated only with values. The former, for example, might prevent assigning a new value to a variable if the variable's and value's "type tags" differ, whilst the latter may permit any value to be assigned to any variable. However, I'm still not clear how this is any different from simply stating that the former language associates types with variables and values, while the latter only associates types with values.
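(That difference can likewise be expressed mechanically. A hedged sketch in C, with all names hypothetical: a "both tagged" language checks the variable's tag against the value's tag on assignment, while a "values only" language would simply have no variable-side tag to check:)

 #include <stdio.h>

 enum Tag { TAG_INT, TAG_STRING };

 struct Cell     { enum Tag tag; long payload; };           /* tagged value */
 struct TypedVar { enum Tag declared; struct Cell cell; };  /* tagged variable */

 /* Assignment in the "variables and values both tagged" style:
    a mismatch between the two tags is rejected. */
 int assign(struct TypedVar *v, struct Cell c) {
     if (v->declared != c.tag) return 0;
     v->cell = c;
     return 1;
 }

 int main(void) {
     struct TypedVar v = { TAG_INT, { TAG_INT, 0 } };
     struct Cell s = { TAG_STRING, 0 };
     printf("accepted? %d\n", assign(&v, s));   /* prints: accepted? 0 */
     return 0;
 }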

I think we'd have to clarify how "value", "type", and "variable" are being used here. I'd rather deal with specific instances of languages than try to nail these 3 down in a general sense. Otherwise, we'll get lost in messy debates over these other terms also (value and variable).

Is that relevant at this point? All I'm asking is why, in general, we should prefer statements like this...

...to statements like this? You can draw a tag on a whiteboard, and anything you put inside it is clearly distinct from what is outside the tag. Almost nobody will draw something half-inside the tag. "Associates" can be interpreted or represented in too many different ways, some of them potentially misleading or confusing. If multiple variables/objects referenced the same "type", and you changed that referenced thing, then all those objects/things would now be referencing a different "type". Most common languages don't work that way, and thus it's a poor model. (User-defined types/classes may work that way, but that gets into the distinction between a type and a class, which is probably another topic.)
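(The shared-reference scenario being objected to might be sketched like this -- hypothetically; the point is that one shared, mutable type descriptor would "retype" every referrer at once:)

 #include <stdio.h>

 struct TypeDesc { const char *name; };          /* a shared, mutable "type" */
 struct Obj      { struct TypeDesc *type; long value; };

 int main(void) {
     struct TypeDesc num = { "integer" };
     struct Obj a = { &num, 1 }, b = { &num, 2 };
     num.name = "float";   /* editing the shared descriptor silently
                              changes the "type" of both a and b */
     printf("%s %s\n", a.type->name, b.type->name);  /* prints: float float */
     return 0;
 }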

Sorry, not following you here.

This wiki needs a dry-erase board.


Perhaps an example would be useful. Suppose that you were wandering through a sequence of symbols, and you found this:

     DEADBEEF
Now, is this a VALUE, a TYPE or a VARIABLE? -- JeffGrigg

[None of the above. After all, you just said it was a symbol.]

{Without knowing how a particular language treats it, it's difficult to classify and difficult to model how the language treats/processes such. -t}

By "sequence of symbols" I mean characters. I'm familiar with lexers, parsers, and the traditional compiler-compiler tools used to parse ("recognize") modern programming languages, and the mathematical foundation behind them.

I believe that it's impossible to successfully meet Top's challenge as stated. That is, it's impossible to determine, from an arbitrary language's grammar (and lexical analysis expressions, if used), which things are values, types, and variables. The information isn't in the grammar. Grammars match a source program against the grammar rules, but the "meaning" of the rules is determined by the code that processes them, whether interpreting or compiling (generating code).

So, in general, if you see "<something, something, something> DEADBEEF <something, something, something>", you can't tell if "DEADBEEF" is a value, variable or type just by looking at it. It could be any of the three.
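(To make that concrete in one real language: in C alone, the same spelling can legally fill all three roles, depending entirely on surrounding context. A minimal sketch:)

 #include <stdio.h>

 typedef unsigned long DEADBEEF;     /* here, DEADBEEF names a type */

 static DEADBEEF f(void) {
     return 0xDEADBEEF;              /* here, 0xDEADBEEF is a value */
 }

 int main(void) {
     unsigned long DEADBEEF = f();   /* here, DEADBEEF is a variable,
                                        shadowing the typedef */
     printf("%lx\n", DEADBEEF);
     return 0;
 }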

I'm not saying that {value, variable, type} don't exist or are not useful. But I do contend that they are useful conventions that we have adopted, not that they are "universal truths" of anything. -- JeffGrigg

Although the term "convention" is perhaps open to interpretation, you more or less seem to be agreeing with my premise that "types" in "typical" programming languages we use are recognized based on terminology patterns of similar-looking languages. Essentially, humans are using pattern recognition to say what is a "type", and not necessarily an unwavering universal rule. -t

For example, we call a Koala bear's arm an "arm" because it generally resembles human arms in shape and use. There is no universal rule or equation that separates arms from non-arms. (Koalas are fairly distantly related to humans as far as mammals go. They are not even bears, by the way.) It's comparing and finding similarities, i.e., pattern recognition. I believe a similar thing is being done with "types" and programming languages when humans label parts or artifacts of the languages. -t

By the way, Jeff, would it change your assessment if the rules of execution of the language were also supplied? Not just the grammar rules. (I'm not sure how such rules would be encoded; it's not a subject I have much familiarity with. The compiler/interpreter itself may qualify.) -t

Identifying types, variables and values in a given source code corpus requires knowledge of both that language's grammar and its semantics. There isn't something inherent in text that can be universally and automatically used to identify types, variables and values without knowledge of a given language's grammar and its semantics. However, the concepts of types, variables and values are distinct and unambiguous, even though there are some languages that may seem to blur them. For example, in OO languages that support mutable class instances (e.g., Java, C++, Python, etc.) each possible state of a given class instance is a value; the instance itself is a variable. Or, for example, languages with first-class types may allow type definitions to be used as values that can be assigned to variables, but that does not mean a type is a value or a value is a type.
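(For example, the mutable-instance point translates into a minimal C sketch along these lines, with hypothetical names:)

 #include <stdio.h>

 struct Point { int x, y; };       /* "Point" is a type */

 int main(void) {
     struct Point p = { 1, 2 };    /* p is a variable; {1, 2} is the
                                      value it currently holds */
     p.x = 5;                      /* p is the same variable, but it
                                      now holds a different value: {5, 2} */
     printf("%d %d\n", p.x, p.y);
     return 0;
 }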

I do think that there are such things as types. But that they are determined by convention.

To a large extent, they're determined by the CPU hardware (and related circuits) -- manipulating data of certain "types", such as a 32-bit two's-complement signed integer, is efficient (fast) because it's supported directly by the underlying hardware. We find it useful to write software that makes good use of the hardware. And hardware vendors find it useful to build hardware that does a better job of executing the existing software. There's a feedback loop between the two.

Modern CPUs have stacks, and efficient ways of accessing data at offsets from the current stack frame, because most modern programming languages use such things extensively.

We do make "types" that go beyond what is directly supported by the hardware. But even those types retain a fair amount of hardware-related design limitations, such as limited string length. -- JeffGrigg

The canonical primitive types found in most programming languages are certainly reflective of underlying CPU architecture, which in turn was based on obvious computational needs and hardware manufacturing capability. User-defined types and TypefulProgramming, however, have far more to do with meeting user requirements via (among other things) TypeSafety than any limitations imposed by hardware.

I think that in order to automate processes effectively, we find that we need to impose structure upon them and standardize them. For example, to count the number of people living in a city, one has to define what one means by a "person" who is "living in" the city. And if you want meaningful numbers regarding their distribution by race, for example, you'll have to come up with some finite and usable definition of "race" to use.

We've found grouping representations of data (like name, birth date, and race) into objects/records/things to be quite helpful. And later, we've found it helpful to group procedures (code/methods) with the data they operate on. -- JeffGrigg

[I'm not sure what you are getting at here. All definitions are arbitrary, so of course what we mean by "person" and "living in" and "race" are governed by convention. But what does that have to do with the surrounding discussion?]

Because "person" is a type. And so is "race." We decide what "people are living in a place" somewhat arbitrarily. And we define "race" fairly arbitrarily, depending heavily on how we chose to look at things, and maybe what we are trying to accomplish. So yes, there are "types," and they are quite useful and necessary. But what I'm saying is that any attempt to come up with an absolute objective independent concept of type will fail. We make up most of the classifications we use. None of them are "true." But many of them are useful. So we use them.

So in spite of it being impossible to prove an absolute "truth" to types, we'll keep using them. Because they're useful. -- JeffGrigg

{A somewhat fuzzy yet UsefulLie. -t}

[I think I get what you were saying, Jeff: the boundaries of each race are what's arbitrary. But I think you are making a HastyGeneralization here. Just because some types are fuzzy doesn't make all types fuzzy. For example, "prime" is a type, and there is absolutely no fuzziness about whether or not a given integer is prime.]
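(Indeed, a crisp type like that can be written down as a membership predicate; a minimal sketch in C, using trial division:)

 #include <stdio.h>

 /* Membership test for the "prime" type: deterministic trial
    division, with no fuzziness about the answer. */
 static int is_prime(long n) {
     if (n < 2) return 0;
     for (long d = 2; d * d <= n; d++)
         if (n % d == 0) return 0;
     return 1;
 }

 int main(void) {
     printf("%d %d\n", is_prime(7), is_prime(9));  /* prints: 1 0 */
     return 0;
 }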

{If all categories are types, and almost everything can be viewed in terms of categories (which is true), then the definition of "type" is watered down too much to be of practical use. -t}

Maybe in some branch of philosophy, perhaps. Not in ComputerScience.

{If it's clear-cut, nobody's listed the rules/formula yet. That's what triggered this topic to begin with.}

It's a set of definitions (some of which appear on this page), and at a theoretical level, the essence of a branch of ComputerScience called TypeTheory. There is no set of rules, any more than there is a set of rules that will identify a cat. However, much as we can use accepted definitions to recognise a creature as a cat, we can use accepted definitions to recognise types. We can even go further and create TypeSystems that will allow us to manipulate types -- in a rigorous, predictable, and rule-driven way -- in computer languages. In that context, as programming language designers, we decide what "types" can be.

By the way, please stop conflating your personal non-acceptance of popular definitions with some -- emphatically, non-existent -- general debate about "what is a type" in ComputerScience. The problem here is yours, and yours alone.

We can indeed make rules that distinguish cats from non-cats, and then refine them when we find a flaw in our rule set. As for "making a type system": just because you call your model/machine a "type system" doesn't automatically make it all "types". I can call my dick a "type system", but that act doesn't necessarily make it so. (And please, no jokes about "smallInt".)

[OK; we'll talk about "tiny int." >;-> ]

I'd be interested to see how your rules to distinguish cats from non-cats differ, qualitatively, from the definition of "type" presented on this page or any recognised definition of "type".

There used to be a BASIC game floating around, with source, which asked various questions and then identified the animal you had in mind. The questions resembled, "Does it have pointy ears?", "Does it have whiskers?", etc. It's over-simplistic for industrial use, but it illustrates that classification can be encoded as an algorithm. A fancier version would probably use probabilities, since a given specimen may lack a feature due to an accident or birth defect.
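(A minimal sketch of such a question-tree classifier, in C; the questions and animals are hypothetical stand-ins, not the original game's:)

 #include <stdio.h>

 static int ask(const char *q) {
     char buf[8];
     printf("%s (y/n) ", q);
     if (!fgets(buf, sizeof buf, stdin)) return 0;
     return buf[0] == 'y' || buf[0] == 'Y';
 }

 int main(void) {
     /* Each question is a branch in a hard-coded decision tree. */
     if (ask("Does it have whiskers?"))
         puts(ask("Does it have pointy ears?") ? "I guess: cat"
                                               : "I guess: seal");
     else
         puts("I guess: snake");
     return 0;
 }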

You can play exactly the same game with types. Indeed, I have seen it used to teach children about different types of numbers!

That's categorizing existing and known domain nouns. That's not the challenge.

[There's a point to categorizing things that don't exist?]

You'd think that something like "sex -- male/female" would be simple and objective. But the history of the Olympic games has shown that even this apparently "two-state" piece of data is much more complex than you might think, when you really take it seriously. http://en.wikipedia.org/wiki/Gender_verification_in_sports

This kind of thing does lead me to think that much of what we do is UsefulLie and AllAbstractionsLie. But that the abstractions we use are useful -- in that they save money and get the job done -- turns out to be a good thing. And sufficient. As for deep philosophical discussions about The Ultimate Truth... Well, they're great and useful. But not as a driving force for how we should develop software. -- JeffGrigg

{It became an issue because certain people aggressively insisted they know the "right" definition of "types" and that other definitions and/or models are "wrong". -t}


See DedicatedOperatorMeaningless regarding associating operators with types.


Unfortunately, due to a DeleteTantrum by GrammarVandal, some content has been lost. I've undeleted this page and made a concerted effort to retrieve it, but PageHistory sometimes has holes. Alas.

Anyway, the following text is found above:

...

That's categorizing existing and known domain nouns. That's not the challenge.

...

I responded to the effect that I do know how to identify types, but that applying cluster analysis to source code would require considerable manual intervention -- primarily to put it in a form suitable for the various types of cluster analysis -- and that there is no automated way to identify types given just a language grammar and source code samples, because the semantics of the language need to be identified.

There were more responses, but they're gone now. My last bullet point suggested a more easily-solved, and perhaps less contentious challenge. However, the challenge is for Top: Top, since you created this page in response to my statement that "values, types and variables/attributes are categorically distinct", please give an example of a language where values, types and variables/attributes are not categorically distinct.

First, we'd have to define all three of those clearly and come to a mutual agreement on the definitions. Otherwise, different people will apply different names/categories to the same things.

And I have asked before: would it change the nature of the challenge if a description of what the language/compiler/interpreter "does" were also given? Such as a reference implementation?

I do not believe humans use rigorous/clear/consistent rules to determine what they call "types" versus something else. Instead, it's based on tradition and similarities (pattern recognition). If they did use such rules, they should be able to encode them in a clear format, especially if they insist their determination technique is the One True Way. -t


CategoryTypingDebate


FebruaryTwelve AprilTwelve

