Type Handling Grid

Based on a discussion in SignaturesAndSoftPolymorphism.

To document both hard and soft polymorphism, we can use what I shall call "type handling grids" (THG) to analyze dual-operand (binary) overloaded operators such as "+", as found in some popular dynamic languages.

Each axis in our examples uses these codes:

 T R   (T = type indicator, R = "representation"[1])
 - -
 S S = type indicator is string, representation is a string
 S N = type indicator is string, representation can be interpreted as (parsed as) a number
 N N = type indicator is number, representation is a number
Notes: "N S" typically is not permitted by syntax rules and the like. Also, more than two types can be considered, but we'll limit our examples to two for now to make digestion easier.

The left axis is the left operand and the top axis is the right operand.

Here's a hypothetical THG for a language with narrow rules for when it considers "+" to be numeric. This matches JavaScript in my tests.

 ** S S N
 ** S N N
 SS . . .
 SN . . .
 NN . . #
 // # = numeric result, period = string result, E = error thrown
 // (Asterisks are wiki-spacing place-holders, ignore them.)
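For illustration only (this sketch is mine, not part of the original notes), the narrow rule above might be modeled in Python, assuming each operand is a hypothetical (type_indicator, representation) pair:

 # Minimal sketch of the "narrow" rule: "+" is numeric only when BOTH
 # type indicators are numeric; every other combination concatenates.
 def plus_narrow(left, right):
     ltype, lrep = left              # e.g. ('N', 1) or ('S', "1")
     rtype, rrep = right
     if ltype == 'N' and rtype == 'N':
         return ('N', lrep + rrep)            # "#" cell: numeric addition
     return ('S', str(lrep) + str(rrep))      # "." cell: concatenation

 print(plus_narrow(('N', 1), ('N', 1)))       # ('N', 2)
 print(plus_narrow(('S', "1"), ('N', 1)))     # ('S', '11'), as JavaScript does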
Here's a hypothetical THG for a language that decides between concatenation and addition using only the type indicator of the first operand. When the first operand's type indicator is numeric, the second operand must also have a numeric type indicator, or an error is thrown.

 ** S S N
 ** S N N
 SS . . .
 SN . . .
 NN E E #
 // # = numeric result, period = string result, E = error thrown
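As another hypothetical sketch of mine (same assumed operand pairs as above), the first-operand rule might look like:

 # Sketch of the "first operand decides" rule: a string left operand
 # always concatenates; a numeric left operand demands a numeric right
 # operand, otherwise an error ("E" cell) is thrown.
 def plus_left_driven(left, right):
     ltype, lrep = left
     rtype, rrep = right
     if ltype == 'N':
         if rtype != 'N':
             raise TypeError("numeric '+' requires a numeric right operand")
         return ('N', lrep + rrep)            # "#" cell
     return ('S', str(lrep) + str(rrep))      # "." cell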
This hypothetical THG "tries hard" to interpret "+" as numeric. Essentially, if both operands can be interpreted as (converted to) numbers one way or another, then the "+" expression is treated as numeric.

 ** S S N
 ** S N N
 SS . . .
 SN . # #
 NN . # #
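Here is a hypothetical sketch of mine for the "tries hard" rule, again assuming (type_indicator, representation) pairs; the numeric test (Python's float()) is just a stand-in for whatever "can be interpreted as a number" means in a given language:

 # Sketch of the "tries hard" rule: if both operands can be read as
 # numbers (numeric type indicator, or a string that parses as one),
 # "+" is numeric; otherwise it concatenates.
 def looks_numeric(operand):
     type_ind, rep = operand
     if type_ind == 'N':
         return True
     try:
         float(rep)
         return True
     except ValueError:
         return False

 def plus_tries_hard(left, right):
     if looks_numeric(left) and looks_numeric(right):
         return ('N', float(left[1]) + float(right[1]))   # "#" cell
     return ('S', str(left[1]) + str(right[1]))           # "." cell

 print(plus_tries_hard(('S', "3"), ('S', "4")))    # ('N', 7.0)
 print(plus_tries_hard(('S', "abc"), ('N', 4)))    # ('S', 'abc4')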
My apologies for the cheap AsciiArt grids.

--top


Discussion

How would the above handle types other than 'S' and 'N'?

I know of no popular imperative programming language that uses this approach, nor can I imagine any reason why one would. It's not particularly illustrative, either. See the "Operator Invocation" section on TypeSystemCategoriesInImperativeLanguages for how most programming languages handle operator dispatch.

That doesn't directly cover the possibility of soft polymorphism, as discussed in SignaturesAndSoftPolymorphism. It may be fairly rare, but that doesn't mean we should discount it outright. Maybe future scripting languages will care less about machine speed and include more softness.

When there is a popular imperative programming language that implements "soft polymorphism", it will be trivial to add it. Until then, I shall leave it out -- along with all the other obscure and unpopular operator invocation mechanisms that I've chosen not to include.

We don't have to focus just on the popular languages. Like I said elsewhere, domain-specific tools sometimes come bundled with a scripting language such that one may not have the practical option of switching to a popular language. See PageAnchor mucked-381 in ValueExistenceProof. If you want to ONLY model the popular languages, be my guest. But that shouldn't stop somebody from modeling roads less traveled. The wiki town is big enough for the both of us, so put away your guns, hombre.

True, we don't have to focus just on the popular languages. However, how do we decide what "less traveled" roads to model or not to model? There are innumerable obscure and unpopular operator invocation mechanisms that could be modelled. Why model one over the other?

Over? I have no problem with somebody proposing a model of relatively obscure features, as long as the scope and frequency are identified. It may also spark ideas for future language designers (although, in my opinion, overloading in dynamic languages is usually a bad idea).

I have no objection to modelling relatively obscure features either. I'm not sure it's of benefit, however, to include relatively obscure features in with common features -- say, in TypeSystemCategoriesInImperativeLanguages's "Operator Invocation" section -- when the purpose is simply to explain common features.

I've encountered a fair number of relatively obscure dynamic languages. I'd estimate roughly 1/3 of all lines of code in my career were not in the top 50 languages. In the work world you often have to use what's available instead of what you want, so obscure languages will be encountered, and fairly often, because products are frequently bundled/integrated with app-specific scripting languages. Thus, the presumption that focusing on, say, the top 50 will cover a "vast majority" may be faulty. The distribution curve may be rather flat instead of bunched up toward the popular end; more like "A" than "B" below. -t

 Distribution Example A:
 *
 *
 **
 **
 ***
 ***
 ****
 ****
 *****
 *****
 ****** 
 *******
 .
 Distribution Example B:
 *
 *
 *
 *
 *
 **
 ****
 ******* 
 ************
How does that help decide which obscure features to include and which to exclude? I think if a document is going to describe features found in popular imperative programming languages, it makes sense to exclude features not found in popular imperative programming languages. Predicate dispatch, for example, is certainly interesting (and it effectively generalises operator dispatch), but until it's commonplace (though Clojure is bringing it pretty close), I'm not sure anything is gained by describing it in a document on common mechanisms for operator dispatch. The problem with including obscure languages is that they lack any consistent commonality, so a feature found in languages used by Programmer A is unlikely to be relevant to Programmer B, who is probably using some other, non-intersecting set of obscure languages.

If one encounters a new or "oddball" language, then a TypeHandlingGrid may simplify experiments to better document or model its behavior: run tests and fill in the grid. The resulting grid can then be compared with grids for other binary (dual-operand) operators to see whether the pattern is language-wide or merely operator-specific. Everybody has their favorite note-taking techniques, but a grid is often more compact and makes it easier to visually spot patterns than sequential notes. The grid is just one suggested tool, and nobody is forced to use it. (I won't demand royalties if you use it; see how nice I am? ;-) -t
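
As one hypothetical way to automate that note-taking (a sketch of mine, with made-up sample operands, using Python itself as the stand-in language under test), a small script can probe "+" over the operand combinations and print the grid:

 # Fill in a THG by experiment: try sample S-S, S-N and N-N operands and
 # record whether each pairing yields a number ('#'), a string ('.'),
 # or throws an error ('E').  The sample values are assumptions of mine.
 samples = {'SS': 'abc', 'SN': '12', 'NN': 12}

 def probe(op):
     print('   ' + ' '.join(samples))             # column headings
     for lname, lval in samples.items():
         row = []
         for rval in samples.values():
             try:
                 result = op(lval, rval)
                 row.append('#' if isinstance(result, (int, float)) else '.')
             except TypeError:
                 row.append('E')
         print(lname, ' '.join(row))

 probe(lambda a, b: a + b)    # probing Python's own "+" as a stand-in

The printed grid can then be compared against the hypothetical grids above, or against grids produced the same way for other operators in the language under test.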

Note that the TiobeIndex says the following:

   "The following list of languages denotes #51 to #100. Since the
   differences are relatively small, the programming languages are 
   only listed (in alphabetical order)." [11/29/2013]
This does suggest a relatively flat curve ("A" above) for languages in the mid-range of popularity. For flat curves, the probability of actually using a semi-obscure language is fairly high. If you picture the length of the graph bars as reflecting the probability of landing on them, then it should be clear that in the work world one is fairly likely to end up using a semi-obscure language. I am assuming that the language is usually not the programmer's choice, which reflects my experience. It is true that one can get certificates in a favorite language to increase the chance of being hired for it, but for one, that's usually no guarantee you'll be hired for that language; and for another, one often ends up using multiple languages in a given shop, such that even if your education/certificates steer you toward a particular language, you'll often be asked to babysit existing programs or product-bundled languages outside your target language. -t

(My diagrams are upside-down per Tiobe. ToDo: fix the puppies.)

One is fairly likely to encounter scripting languages embedded in various specialized tools that would otherwise count as "obscure" languages. Thus, focusing on the top 50 or so is not guaranteed to cover a sufficient proportion of the languages one will encounter in the field. For example, I've used a communications tool that had such a language (in the dial-up modem days), and a telephony system in another case, as in "Press 1 to go [bleep] yourself in English, press 2 to go [bleep] yourself in Spanish, or stay on the line to be [bleep] over by a live operator."

Is your irrelevant cursing actually necessary? [I've since cleaned up slightly. -t]

As noted above, the variety of obscure scripting languages -- though that variety is rapidly diminishing in favour of embedded scripting typically based on Lisp, Lua, Python or Javascript -- means a "TypeHandlingGrid" is unlikely to deliver any value that the language manual (and the more obscure the language, the more likely there is to be one) doesn't.

As described elsewhere, I find most language manuals poorly written when it comes to dynamic types. I know you disagree. Let's not reinvent that debate here.


(From ValueExistenceProofFour)

If you're going to include unpopular languages, why not incorporate (say) predicate dispatch, instead of something obscure like "soft polymorphism"?

Predicate dispatch is not typically what the above-described product-embedded scripting languages use (outside of SQL). Product-embedded scripting languages tend to stay close to the AlgolFamily conventions to reduce the learning curve of product users. Plus, we are limiting these topics to AlgolFamily-like dynamic languages to (attempt to) keep the discussion more focused. If you wish to discuss non-AF languages in new topics, that's fine, but I may not be personally interested in participating.

Ok, that's a good reason to exclude PredicateDispatch. Why don't we include MultipleDispatch, then? It's found in AlgolFamily languages like DylanLanguage and TutorialDee.

The issue was modelling types, scalars in particular (as a start), not OOP. There are a lot of potential directions to expand into if one wants to "branch out" their model. We have enough problems agreeing on the simple stuff for now, why pour gasoline on the propane fire?

MultipleDispatch is not limited to OOP.

It's a vague term.

Which one? OOP, MultipleDispatch, or "limited"? If it's OOP, that's fine, MultipleDispatch isn't limited to OOP per se. If it's MultipleDispatch, you're wrong, it's not a vague term. If it's "limited" (or "is", "not", or "to"), I can't help you. I simply see no reason to include "soft polymorphism" but exclude MultipleDispatch, or any of a dozen equally valid but obscure type-related and/or dispatch-related mechanisms.

"Equally obscure" is purely a guess on your part. Granted, its frequency is also a guess on my part but I'll stick with my approach because it's more general purpose than type-signature polymorphism. The IF statements for a given modeled operator can have anything they want in them. (The model user can choose also other model dispatching code techniques as they see fit.) As far as the meaning of "multiple dispatch" as a phrase, I don't want to get into another definition fight today and will skip it for now. You read English and "see" far far different things in it than I do.

I didn't write "equally obscure", I wrote "equally valid but obscure". Considering the fact that the only references anywhere to "soft polymorphism" are this page and SignaturesAndSoftPolymorphism, it's considerably more obscure than MultipleDispatch. That means there's even less reason to include "soft polymorphism" in a reference on operator dispatch, types, or anything else.

My apologies regarding "equally obscure". In that case, I don't know what your point is regarding the usage of "obscure". And "soft polymorphism" is an internal working name; I'm pretty sure that was pointed out somewhere. Searching globally (publicly) for an internal working term is of course likely to be futile. And I don't know whether it's obscure. Nobody's done a (known) reliable survey. Not you, not me, not Mike. Your guess is based on a very limited sample size. Why are you so full of certainty using such a small sample size? That's a typical symptom of stubbornness. -t

You have argued that we should explicitly include "soft polymorphism" in language models because some non-popular programming languages might use it. I argue that until you identify a language that uses it, there's no point in including it in language models over and above other relatively-obscure dispatch mechanisms like MultipleDispatch, which is demonstrably used in some fairly well-known non-popular programming languages.

Your model is based on your best guess, and my model is based on my best guess. Unless you have solid evidence to replace the guess, why complain about guesses different than yours? It makes no sense to me why you insist your guess is more accurate. You are not special. I say LetTheReaderDecide which guesses they prefer (based on their own circumstances and experience). What the hell is wrong with that? Why quibble over guess differences? They are just guesses. It's not like there is "guess math" to apply rigor to pick one over the other. (At least you haven't used any.)

My model is not based on "my best guess", but on formal study of language semantics and empirical study of language implementation internals.

You just claim that; you haven't shown a paper trail, such as quotations and citations. I'm not going to "just trust" you, especially given your odd way of interpreting English.

I've given numerous quotations and citations showing references that accord with my interpretations. That you don't agree with them is not my problem. My "odd way of interpreting English" seems to accord with that of both SoftwareEngineers and ComputerScientists. If it doesn't accord with language used by laypersons, that's not my concern.

I disagree that your references given so far uniquely support your interpretation. I honestly believe you inject your personal mind models into your interpretation of English descriptions to "see what you want to see". Your "translations" appear highly forced to me. Your preconceived notions are funneling any ambiguity toward your personal interpretation preference. I know you will disagree; no need to state that. I call it as I see it. We'll just have to LetTheReaderDecide whose interpretation of the English they prefer. Also note that I'm not claiming such descriptions are "not in accord" with your model. They often seem to "accord" with both because the English actually used is vague/ambiguous. They don't necessarily contradict your model; they just don't uniquely support it.

In particular, they don't clearly constrain the design of the data structures used to implement/model "variables", "types", and "values" other than perhaps dictating that some computable relationship between them exist for some language features. And that "some" in "some computable relationship" is very general. --top

Overloading a single data structure -- as you do for both values and variables -- is invariably awkward. There's no tenet of programming that would recommend such an approach, and plenty of reasons not to do it. It represents StampCoupling, for example.

We already discussed StampCoupling and "awkward"; I disagreed with your assessments. Anyhow, even if, for the sake of argument, it were "awkward" design in terms of software engineering, that's a different issue. The documents you cited were not about "designing software well" in terms of maintenance, grokking, etc. The issue is the "required" properties of said structures. Show me the interpreter-related documents that clearly dictate that such structures "must have property X and must not have property Y" in a general sense. I don't want personal interpretations; I want clear rules, or at least a clear chain that leads to such rules unambiguously. Design choices from a software-engineering perspective are a secondary concern.


Footnotes

[1] I used "representation" to avoid triggering a term debate here, not because I agree with its usage. -t


CategoryLanguageTyping, CategoryTypingDebate, CategoryEvidence


