Dedicated Operator Meaningless

Continued from TopsTypeDeterminatorChallenge

Some of the above talk implies a dedicated operator "per type". However there are at least two different ways to think about and/or implement association of operator to "type", and both are valid models, at least for "built-in" types. One is OOP-style polymorphism where each "type" has its own version of a given operator (same name, different method), such as "add" or "print". The other is the old-school case or IF statement where there is a single name (operator) and the behavior branches or changes depending on the item's "type".

 // Pseudo-code for internal implementation of PRINT operation on 
 // a single scalar (variable or constant).
 function printScalar(theVariable) {
   var theType = extractType(theVariable);  // get type tag
   var theValue = extractValue(theVariable);
   //
   if (theType=='int' || theType=='date') {
     byte1 = getByte1(theValue);
     byte2 = getByte2(theValue);
     intVal = getIntValue(byte1, byte2);
     if (theType=='int') {
       return (doSometingInt(intVal));
     }
     return(dateSeparate(intVal));  // split parts (date stored in RAM as an Int)
   } elseif (theType=='float') {
     byte1 = getByte1(theValue);
     byte2 = getByte2(theValue);
     byte3 = getByte3(theValue);
     byte4 = getByte4(theValue);
     return(doSomethingFloat(byte1, byte2, byte3, byte4));
   } elseif (theType=='string) {
     // etc...
   }
 }

Discarding for now which is "better" for future expansion, such as user defined types, there is no reason to say one model is more valid than the other to explain and/or define what is "happening with types". You cannot objectively say each type "has its own operators" because it doesn't have to be modeled that way (in one's head) nor implemented that way. They both explain what the program "does".

--top

Remember that the definition from TopsTypeDeterminatorChallenge is "Type = a (relatively) invariant set of values and zero or more associated operators." My choice of the word "associated" was deliberate, and it means something quite different from "dedicated".

Note the following code snippet from your printScalar() example above:

   } elseif (theType=='float') {
     byte1 = getByte1(theValue);
     byte2 = getByte2(theValue);
     byte3 = getByte3(theValue);
     byte4 = getByte4(theValue);
     return(doSomethingFloat(byte1, byte2, byte3, byte4));
   }

In the absence of a 'float' type, the block between curly braces would never be called. In the presence of a 'float' type, it might be called. That means printScalar() is associated with a 'float' type. That relationship is indubitable -- the code says so.

"printScalar" is something you imagined. It's not in the code. And the shown grouping is one of many possible implementations.
Call it whatever you like. I got the name from the first line after the comments. It could also be called "doSomethingFloat" or "foobulate". The grouping matters not. What matters is that there's a reference to 'float'.

The OOP tendency to syntactically group methods inside classes represents an explicit mechanism to implement the general notion that types have associated operators. The former is sufficient to implement the latter (for user-defined types) but it is not necessary. Any explicit source-code reference to a type is, by definition, associated with that type.

"Associated with" in a general sense, sure. But that doesn't say anything definitive about the relationship other than it exists. Applying any further details or limits seems to assume a specific implementation or language.

Yes! Exactly! That's why "associated" is appropriate, accurate, and correct.

Further, does the type "go away" if we delete a routine that used to change behavior/output based on it? For example, what if I have a "string" and a "label" type tag. The one routine that used to make a distinction between the two fell out of use and thus was deleted. "Label" may still suggest a usage to some users, but otherwise is now treated exactly like a "string" in the current software version. One could argue that peoples' heads are still making use of it and thus "operating" on it, but many other things that we don't normally call "types" can affect behavior also. "Affect behavior" is rather broad.

A type goes away if you delete every reference to it, i.e., if there are no associations with it. If there are no associations with it, it isn't used or usable, and is thus effectively non-existent.

I disagree that it's not a "type" but the others are. That appears to be your personal assessment.

Sorry, not following you. What "others" do you mean? If there are no associations with a type, it's not referenced. If it's not referenced, it's effectively not there because no code uses it.

Other "tags" that may be referenced in IF statements etc. Humans may still use it if it's say printed out. In the case given it may be used more like a comment. Whether somebody chooses to make the software behave differently based on such may change. It's information. Sometimes we use it and sometimes we don't. To pivot "typeness" on whether it happens to be used somewhere in software is rather arbitrary in my opinion.

If you're printing out a label associated with a type, then you have one implicit operator that returns the right "type" label for a value of your type, and you have a second operator that allows you to select values of that type (you must, or you wouldn't have been able to get a value to print out in the first place). Whilst until now I've avoided complicating the issue by pointing out that your "type-tag" is, in fact, an operator associated with a type that returns a label for a value of that type, that's exactly what it is. So, you have two operators associated with the type. If you have no operators associated with the type, then nothing references the type. That means it's effectively not there. You might argue that you still have some source code somewhere that defines the structure of values of that type. However, if that type is not referenced, the code is as good as a set of comments. The type is effectively not there. This is not arbitrary; it's simple logic and common sense.

I didn't say there had to be a lookup. Perhaps the type tag is printed out as the bytes it actually has: "label". Plus, why would a JOIN suddenly turn it into a "type"? And why is a tag an "operator"? You are pre-supposing a behavior-oriented paradigm.
I didn't say there had to be a lookup, either. Don't think about implementation. Look purely at the conceptual structure: You have a float type. Given any value of float type, you can obtain the string "float". Thus, by definition, you have an operator that given a value of float type returns the string "float". In the absence of such an operator, you would have to say this: Given a value of float type, you cannot obtain its type name as a string.
How is that different than extracting the value from a variable? Do we call that an "operator" also?
Of course. It is. In some languages, like Forth, it's always explicit -- it's the @ operator or "word" in Forth parlance.
Some langs may make getting/copying side-tag info easy, others hard. Languages may induce more rules and limits on the tag "value" than the "regular" value, but that's purely a language designer's choice. It doesn't say anything definitive about "types". In "loose" languages, it could potentially be used just like the value: variables could be viewed as bicameral, almost like an array with two cells.
Except, of course, in those languages with untyped variables. Beware of any definition of "types" that varies with the language. What's a "loose" language?
Do you mean languages without side-tags, that mostly ignore side-tags, or that allow empty or null tags? (Or behave equivalently to the stated) "Untyped" means you can detect lack of "type-ness". How?
I mean languages with variables that don't have a type. Languages in which I can do this without error:

       var x = 3.4;
       x = 2;
       x = 'p';
       x = new Spleen();
       x = "fish";

That's not necessarily "without a type", that could also be dynamic typing, with the type changing upon each assignment.
Indeed. I should have used the phrase "variables that don't have a specific type" and pointed out that "untyped variable" is often shorthand for "a variable of any type" or "a variable of variant type" or "a variable of a union type consisting of all available types" or "a variable of type 'top' or 'alpha'", etc.

Further, attributes, say database record columns with flags/tags, may have similar usage patterns, and the distinction between "type" and "attribute" appears to be fuzzy. In other words, when is a column a "type" indicator and when is it an "attribute"?

{A column in a database has a name (i.e. city) along with a type...why is that fuzzy? The name of the column is similar to named items in a C struct. A C struct has items that contain both a name, and a type. If a column has a type indicator (if that's what you want to call it) it doesn't mean that type indicator implements a proper type system. i.e. if you visually mark off the column as "boolean" but then allow integers and strings inside the cells, the "type indicator" is just a lying indicator and not a real type system. I believe SqLite has something like this which is considered a feature by some, and a bug by hardcore programmers and theorists. I don't see how it is fuzzy whether or not a column name is the same or similar to the column type: for example if the column name is "city" and has a type "string" with a maximum of 100 characters, how is that fuzzy? What is problematic is the computer industry and product market making things fuzzy when they need not be fuzzy, and purposely confusing matters for I don't know what gain. }

Indeed. An attribute, in database parlance, is simply a type and a label.

Sorry, I meant a "domain type" not a column's database type. For example, let's say we have "employee types" of "worker", "manager", and "executive". (This may be an oversimplification for illustration simplicity.) One could also call that an "attribute". One could also divide up employees by retirement plan, say Plan-A, Plan-B, and Plan-C.

A domain type is a type. Don't make the common mistake of thinking the only alternatives for, say, a RetirementPlanType are either an unpleasantly-fixed enumeration of retirement plans, or an overly-general string attribute with a lookup table. With a suitable TypeSystem, you can compose EmployeeType and RetirementPlanType from a string type, and obtain specific employee types or retirement plans from a lookup table. The assumption is that their "stringness" is invariant, but the list of employee types and retirement plans is (relatively) dynamic. Thus, we gain TypeSafety in that we can't accidentally join an employee type to a retirement plan, but we can still perform the usual string manipulations and we can easily add or remove employee types and retirement plans.

For example, in TutorialDee/RelProject:

 TYPE EmployeeType POSSREP {value CHAR};
 TYPE RetirementPlanType POSSREP {value CHAR};

 VAR EmployeeTypes REAL RELATION {etype EmployeeType} KEY {etype};
 EmployeeTypes := RELATION {
   TUPLE {etype EmployeeType('worker')},
   TUPLE {etype EmployeeType('manager')},
   TUPLE {etype EmployeeType('executive')}
 };

 VAR RetirementPlanTypes REAL RELATION {rtype RetirementPlanType} KEY {rtype};
 RetirementPlanTypes := RELATION {
   TUPLE {rtype RetirementPlanType('PLAN-A')},
   TUPLE {rtype RetirementPlanType('PLAN-B')},
   TUPLE {rtype RetirementPlanType('PLAN-C')}
 };

 VAR Employees REAL RELATION {id INTEGER, name CHAR, etype EmployeeType, rtype RetirementPlanType} KEY {id};
 CONSTRAINT employee_etype_fkey EmployeeTypes >= Employees;
 CONSTRAINT employee_rtype_fkey RetirementPlanTypes >= Employees;

 INSERT Employees RELATION {
    TUPLE {id 1, name 'Dave', etype EmployeeType('worker'), rtype RetirementPlanType('PLAN-C')}
 };

I'm not sure what your point is. I'm not asking how to make programming "safer" here. And, that's a specific implementation. We could make colors and car models into these kinds of things or just about anything. What does that tell us? It's just an implementation choice.

It shows how types can be implemented which do not devolve into a "type vs. attribute" dichotomy.

So, then you haven't shown that there is a hard line between them.

There is, though they may be used together. The set of values described by EmployeeType is precisely the same as the set of values of CHAR, but it has a different set of operators. The selection of specific EmployeeTypes, however, is constrained by the database, not the type.

One can view that as simply an implementation detail. A super-flexible database could let the character sets be defined in tables instead of hard-wired into the interpreter/compiler/engine. (I'm not commenting on the practicality of it here.)

Yes, you can do that.

So, what's the diff between attributes, look-up tables, and types? (And please be careful about "being" a look-up table versus using a table as a look-up table.) We can define the retirement plans as a language "type" or as an entity represented as table(s).

{Well a similar question can be asked: what is the difference between a C struct versus a table? A C struct is a small table with one row, but it is a limited table that you can't do much with compared to a proper table. You can't run queries on a C struct like you can a proper table. So yes there is overlap between the two, and you could have difficulty drawing the line. Probably there is some performance hits with using a full fledged table instead of using a simpler option like a struct. There are also overlaps between associative arrays and tables too. Isn't an associative array just a poor mans table without all the benefits and features of a real table? I guess it boils down to using the right tool for the right job. You can't really draw a clear line between a table and a struct because a struct can be represented as a table, and a table can be represented by an array of structs. It reminds me of the "TuringComplete" mind game. However one tool makes programming easier in certain situations than the other tool. People have been using C structs for years to emulate OOP, and people have been using arrays for years to emulate tables, and people have been using tables to emulate ranges, and vice versa. In a regex, the range "a..Z" is a terse way of expressing a character range. Instead of a range you could have populated a database table with all characters from a..Z, and replaced "a..Z" regex language with some kind of query language. The query however wouldn't be as terse as the range syntax.}

That still doesn't help answer the question of what is a "type". This discussion is not really about the practical merits of different techniques with similar properties. There are indeed multiple ways to achieve the same thing.

It's quite clearly established what a "type" is. There are various recognised, compatible, and established definitions. One was presented on the page that spawned this one, TopsTypeDeterminatorChallenge. You appear to reject recognised definitions in favour of a personal definition that is neither recognised nor established, and you haven't successfully deprecated the established definitions or effectively promoted your definition. Please, consider the possibility that your unwillingness to accept established definitions is a foible of your own, rather than an indication of problems with established definitions.

It's a crappy, vague def still. And I've seen no evidence that it has popular support.

{OK. I added a definition of what a type is to WhatAreTypes. The essence is: Properties of programs.}

Note that the first bullet is a slightly less-formal version of the "Type = a (relatively) invariant set of values and zero or more associated operators" definition presented on TopsTypeDeterminatorChallenge. I think Wikipedia counts as "popular support", and is perhaps the best test of popular support that we have.

Re: "A data type is a classification identifying one of various types of data [...], that determines the possible values for that type; the operations that can be done on values of that type;[...]"

That's fairly close to the 3rd proposed side-tag def at TopsTypeDeterminatorChallenge. However, this doesn't say "where" the classification is; it could be in a desk drawer in a cave in Timbuktu under that. It lacks association. "Side tag" better describes how it relates to the programming objects/features in that it is associated with the features, and brings up visions of shirt tags and mattress tags and/or motorcycle side-cars to imply an association with a primary or target object. Thus, I find it a more informative definition even if it has "embellished visual artifacts". But, we are getting closer to each other's. That's progress. --top

I'm not clear why you need to introduce a "side-tag" in order to specify association. Isn't it enough to say that, for example, a value has a type? That's as "where" as needed for any purpose, without unduly tainting the concept with flavours of implementation. It's simpler, too.

There's a problem with it, but I can't quite put my finger on it yet. And it's not just "values". Functions, constants, etc. can "have" a type.

Sure. Constants, variables, whatever, can "have" or be associated with types. Again, I'm not sure why "side-tags" need to be introduced into the simple notion that some <x> is of a given type.

Values can also affect "possible values and operations that can be done". We need a more formal definition of "value". I agree that there is a "tradition" in the way things are partitioned into "type" indicators/trackers/limiters and into "values", but I cannot state a clear-cut pattern without referencing tradition so far. One can do branching and computations using "types" also (depending only language), as if values and types are potentially interchangeable. I suspect tradition is clouding our ability to state clear rules that distinguish.

What makes you think values and types are interchangeable? "Value" is generally well-understood to be an immutable instance of a type, i.e., an individual constant from the set of constants described by a type. For example, 3 is a value of type integer.

An example of interchangeable is using using type "Integer" to mean True and type "String" to mean False. This is perfectly possible in many dynamic languages. (Recommended, probably not.) I bet many languages would pass TuringComplete muster using type tags/behavior alone. And where is this "immutability" happening? How can one objectively verify it exists in a programming language? I understand how it works in math, but that's just a head model, and again not necessarily the only head model one can use to make predictions about program behavior. I agree that the "type" is often used to modify a program's handling of the value portion, but the reverse can also be true in dynamic languages.

Could you provide an example of using type "Integer" to mean True and type "String" to mean False? What is a "head model"?

 if typeOf(x) = "integer" then
    print("The ship arrived")
 else
    print("The ship is not in yet")
 end if

Clearly, 'x' is a variable which may contain values of integer type (or, perhaps more accurately, the value in 'x' belongs to a type that is associated with an operator typeOf that returns the string "integer" when given a value of that type). I don't see any confusion between type, value, or variable here.

Because we recognize the vocabulary and style of Algo-influenced languages. In other words, tradition. But the same may not be true of an alien language. And I am using the "type" AS-A value here. It is serving the same purpose we normally associate with values. If the language is dynamic enough, we can define types that "look like" values and values like types and use them perhaps completely interchangeably. Do you agree?
You can certainly have a language where type definitions (but not types themselves) are values that can be assigned to variables, but that doesn't blur the distinction between types, variables, and values, nor does it confuse them. Type, variable, and value are still conceptually distinct. Your example is not using the type as a value, it's using an operator to obtain a string, given a value. Presumably, that string denotes the type, but it isn't the type itself. The string "integer" isn't a type, nor does typeOf(x) return a type; typeOf(x) returns a string, and a string is a value of 'string' type.
- The example was kept simple, but could have done much much more. What do you mean "use as a value"? How does one identify and codify such is happening other than "I know it when I see it"? I agree there are certain things we tend to do with values versus what we call "types", but "tend to" is generally not good enough for technical definitions.
- "I know it when I see it" is perhaps the essence of it, but that's not a casual or arbitrary assessment. The syntax of the language implies certain semantics based on similar-appearing languages. When I see "integer", I see a value of type string. When I see a 3, I see a value of type integer. What do you see? Perhaps if you could provide an example of a language where a type is a value, or a variable is a type, etc., then I could appreciate your point.
- Okay, but that's back to "tradition and common patterns", which is more "notiony" than we'd like. As far as a sample language like such, it would be an interesting MentalMasturbation. I'll think about drafting something up, but not today.
- I'm not clear how you can escape "tradition and common patterns" without some clear specification of how you are deviating from "tradition and common patterns". Otherwise, I calls 'em as I sees 'em. When I see "integer", I see a value of type string unless given a clear reason to do otherwise. It's unreasonable to deprecate "tradition and common patterns" without just cause. They're what allows us to recognise, for example, "2 + 2 = 4" as an indubitable truth about certain integers. Recognition of such a formula is based on "tradition and common patterns" surrounding simple arithmetic, numeric digits, and the '+' and '=' symbols. Rejection of such "tradition and common patterns" requires strong justification. Likewise for recognition of types, values, variables and constants.
- I am not rejecting arithmetic. This is about definitions and identification, not how to do work. Arithmetic has certain patterns, idioms, and vocabulary such that when we see those patterns again in a new language, we apply the same labels based on pattern recognition. But perhaps we should get away from that and assume a language only has user-defined types so that we don't pre-bias our analysis with vocab from a specific established domain (arithmetic). Or at least focus only on user-defined types for now.
- I intended arithmetic to be an analogy. I.e., programming languages also "have certain patterns, idioms, and vocabulary such that when we see those patterns again in a new language, we apply the same labels based on pattern recognition." That's why I recognise "integer" as a value of string type, 3 as a value of integer type, and so on. I'm not sure how user-defined types would be any different, beyond noting that to recognise <value> as a value of <user-defined> type, there must be a definition of <user-defined> type somewhere.
You still haven't made it clear why you feel "side-tags" need to be introduced into the simple notion that some <x> is of a given type.
- First things first.
- ?

A head model is the mental model one uses to grok and predict program and/or language behavior.

You mean a "mental model" or "concept"? Why the special terminology?

Immutability is a defining characteristic of a value. It distinguishes a value from a variable. A thing that is mutable is, by definition, a variable.

I'm not sure how this allows one to distinguish between values and types by looking at languages and/or source code.

Without any knowledge of semantics, merely looking at languages and/or source code is meaningless.

As a practical note, if you hard-wire the "lists" of "type instances" into your D code, then only programmers and DBA's can add new employee levels and retirement plans. You'd probably want it done via a set of CRUD screens.

Of course. The above is an example, not a ready-to-use application.

CategoryTypingDebate

FebruaryTwelve