Type Handling Grid

Based on a discussion in SignaturesAndSoftPolymorphism.

To document both hard and soft polymorphism, we can use what I shall call "type handling grids" (THG) to analyze dual-operand (binary) overloaded operators such as "+", as found in some popular dynamic languages.

Each axis in our examples uses these codes:

 T R   (T = type indicator, R = "representation"[1])
 - -
 S S = type indicator is string, representation is a string
 S N = type indicator is string, representation can be interpreted as (parsed as) a number
 N N = type indicator is number, representation is a number
Notes: "N S" typically is not permitted by syntax rules and the like. Also, more than two types can be considered, but we'll limit our examples to two for now to make digestion easier.

The left axis is the left operand and the top axis is the right operand.

Here's a hypothetical THG for a language with narrow rules for when it considers "+" to be numeric. This matches JavaScript in my tests.

 ** S S N
 ** S N N
 SS . . .
 SN . . .
 NN . . #
 // # = numeric result, period = string result, E = error thrown
 // (Asterisks are wiki-spacing place-holders, ignore them.)
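For illustration only (this sketch is mine, not part of the original notes), the narrow rule above might be modeled in Python, assuming each operand is a hypothetical (type_indicator, representation) pair:

 # Minimal sketch of the "narrow" rule: "+" is numeric only when BOTH
 # type indicators are numeric; every other combination concatenates.
 def plus_narrow(left, right):
     ltype, lrep = left              # e.g. ('N', 1) or ('S', "1")
     rtype, rrep = right
     if ltype == 'N' and rtype == 'N':
         return ('N', lrep + rrep)            # "#" cell: numeric addition
     return ('S', str(lrep) + str(rrep))      # "." cell: concatenation

 print(plus_narrow(('N', 1), ('N', 1)))       # ('N', 2)
 print(plus_narrow(('S', "1"), ('N', 1)))     # ('S', '11'), as JavaScript does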
Here's a hypothetical THG for a language that decides between concatenation and addition using only the type indicator of the first operand. When the first operand's type indicator is numeric, the second operand must also have a numeric type indicator, or an error is thrown.

 ** S S N
 ** S N N
 SS . . .
 SN . . .
 NN E E #
 // # = numeric result, period = string result, E = error thrown
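As another hypothetical sketch of mine (same assumed operand pairs as above), the first-operand rule might look like:

 # Sketch of the "first operand decides" rule: a string left operand
 # always concatenates; a numeric left operand demands a numeric right
 # operand, otherwise an error ("E" cell) is thrown.
 def plus_left_driven(left, right):
     ltype, lrep = left
     rtype, rrep = right
     if ltype == 'N':
         if rtype != 'N':
             raise TypeError("numeric '+' requires a numeric right operand")
         return ('N', lrep + rrep)            # "#" cell
     return ('S', str(lrep) + str(rrep))      # "." cell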
This hypothetical THG "tries hard" to interpret "+" as numeric. Essentially, if both operands can be interpreted as (converted to) numbers one way or another, then the "+" expression is treated as numeric.

 ** S S N
 ** S N N
 SS . . .
 SN . # #
 NN . # #
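Here is a hypothetical sketch of mine for the "tries hard" rule, again assuming (type_indicator, representation) pairs; the numeric test (Python's float()) is just a stand-in for whatever "can be interpreted as a number" means in a given language:

 # Sketch of the "tries hard" rule: if both operands can be read as
 # numbers (numeric type indicator, or a string that parses as one),
 # "+" is numeric; otherwise it concatenates.
 def looks_numeric(operand):
     type_ind, rep = operand
     if type_ind == 'N':
         return True
     try:
         float(rep)
         return True
     except ValueError:
         return False

 def plus_tries_hard(left, right):
     if looks_numeric(left) and looks_numeric(right):
         return ('N', float(left[1]) + float(right[1]))   # "#" cell
     return ('S', str(left[1]) + str(right[1]))           # "." cell

 print(plus_tries_hard(('S', "3"), ('S', "4")))    # ('N', 7.0)
 print(plus_tries_hard(('S', "abc"), ('N', 4)))    # ('S', 'abc4')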
My apologies for the cheap AsciiArt grids.

--top


Discussion

How would the above handle types other than 'S' and 'N'?

I know of no popular imperative programming language that uses this approach, nor can I imagine any reason why one would. It's not particularly illustrative, either. See the "Operator Invocation" section on TypeSystemCategoriesInImperativeLanguages for how most programming languages handle operator dispatch.

That doesn't directly cover the possibility of soft polymorphism, as discussed in SignaturesAndSoftPolymorphism. It may be fairly rare, but that doesn't mean we should discount it outright. Maybe future scripting languages will care less about machine speed and include more softness.

When there is a popular imperative programming language that implements "soft polymorphism", it will be trivial to add it. Until then, I shall leave it out -- along with all the other obscure and unpopular operator invocation mechanisms that I've chosen not to include.

We don't have to focus just on the popular languages. Like I said elsewhere, domain-specific tools sometimes come bundled with a scripting language such that one may not have the practical option of switching to a popular language. See PageAnchor mucked-381 in ValueExistenceProof. If you want to ONLY model the popular languages, be my guest. But that shouldn't stop somebody from modeling roads less traveled. The wiki town is big enough for the both of us, so put away your guns, hombre.

True, we don't have to focus just on the popular languages. However, how do we decide what "less traveled" roads to model or not to model? There are innumerable obscure and unpopular operator invocation mechanisms that could be modelled. Why model one over the other?

Over? I have no problem with somebody proposing a model of relatively obscure features, as long as the scope and frequency are identified. It may also spark ideas for future language designers (although, in my opinion, overloading in dynamic languages is usually a bad idea).

I have no objection to modelling relatively obscure features either. I'm not sure it's of benefit, however, to include relatively obscure features in with common features -- say, in TypeSystemCategoriesInImperativeLanguages's "Operator Invocation" section -- when the purpose is simply to explain common features.

I've encountered a fair number of relatively obscure dynamic languages. I'd estimate roughly 1/3 of all lines of code in my career were not in the top 50 languages. In the work world you often have to use what's available instead of what you want, so obscure languages will be encountered, and fairly often, because products are frequently bundled/integrated with app-specific scripting languages. Thus, the presumption that focusing on, say, the top 50 will cover a "vast majority" may be faulty. The distribution curve may be rather flat instead of bunched up toward the popular end; more like "A" than "B" below. -t

 Distribution Example A:
 *
 *
 **
 **
 ***
 ***
 ****
 ****
 *****
 *****
 ****** 
 *******
 .
 Distribution Example B:
 *
 *
 *
 *
 *
 **
 ****
 ******* 
 ************
How does that help decide which obscure features to include and which to exclude? I think if a document is going to describe features found in popular imperative programming languages, it makes sense to exclude features not found in popular imperative programming languages. Predicate dispatch, for example, is certainly interesting (and it effectively generalises operator dispatch), but until it's commonplace (though Clojure is bringing it pretty close), I'm not sure anything is gained by describing it in a document on common mechanisms for operator dispatch. The problem with including obscure languages is that they lack any consistent commonality, so a feature found in languages used by Programmer A is unlikely to be relevant to Programmer B, who is probably using some other, non-intersecting set of obscure languages.

If one encounters a new or "oddball" language, then a TypeHandlingGrid may simplify experiments to better document or model its behavior: run tests and fill in the grid. The resulting grid can then be compared with grids for other binary (dual-operand) operators to see whether the pattern is language-wide or merely operator-specific. Everybody has their favorite note-taking techniques, but a grid is often more compact and makes it easier to visually spot patterns than sequential notes. The grid is just one suggested tool, and nobody is forced to use it. (I won't demand royalties if you use it; see how nice I am? ;-) -t
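
As one hypothetical way to automate that note-taking (a sketch of mine, with made-up sample operands, using Python itself as the stand-in language under test), a small script can probe "+" over the operand combinations and print the grid:

 # Fill in a THG by experiment: try sample S-S, S-N and N-N operands and
 # record whether each pairing yields a number ('#'), a string ('.'),
 # or throws an error ('E').  The sample values are assumptions of mine.
 samples = {'SS': 'abc', 'SN': '12', 'NN': 12}

 def probe(op):
     print('   ' + ' '.join(samples))             # column headings
     for lname, lval in samples.items():
         row = []
         for rval in samples.values():
             try:
                 result = op(lval, rval)
                 row.append('#' if isinstance(result, (int, float)) else '.')
             except TypeError:
                 row.append('E')
         print(lname, ' '.join(row))

 probe(lambda a, b: a + b)    # probing Python's own "+" as a stand-in

The printed grid can then be compared against the hypothetical grids above, or against grids produced the same way for other operators in the language under test.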

Note that the TiobeIndex says the following:

   "The following list of languages denotes #51 to #100. Since the
   differences are relatively small, the programming languages are 
   only listed (in alphabetical order)." [11/29/2013]
This does suggest a relatively flat curve ("A" above) for languages in the mid-range of popularity. For flat curves, the probability of actually using a semi-obscure language is fairly high. If you picture the length of the graph bars as reflecting the probability of landing on them, then it should be clear that in the work world one is fairly likely to end up using a semi-obscure language. I am assuming that the language is usually not the programmer's choice, which reflects my experience. It is true that one can get certificates in a favorite language to increase the chance of being hired for it, but for one, that's usually no guarantee you'll be hired for that language; and for another, one often ends up using multiple languages in a given shop, such that even if your education/certificates steer you toward a particular language, you'll often be asked to babysit existing programs or product-bundled languages outside your target language. -t

(My diagrams are upside-down per Tiobe. ToDo: fix the puppies.)

One is fairly likely to encounter scripting languages embedded in various specialized tools that would otherwise count as "obscure" languages. Thus, focusing on the top 50 or so is not guaranteed to cover a sufficient proportion of the languages one will encounter in the field. For example, I've used a communications tool that had such a language (in the dial-up modem days), and a telephony system in another case, as in "Press 1 to go [bleep] yourself in English, press 2 to go [bleep] yourself in Spanish, or stay on the line to be [bleep] over by a live operator."

Is your irrelevant cursing actually necessary? [I've since cleaned up slightly. -t]

As noted above, the variety of obscure scripting languages -- though that variety is rapidly diminishing in favour of embedded scripting typically based on Lisp, Lua, Python or Javascript -- means a "TypeHandlingGrid" is unlikely to deliver any value that the language manual (and the more obscure the language, the more likely there is to be one) doesn't.

As described elsewhere, I find most language manuals poorly written when it comes to dynamic types. I know you disagree. Let's not reinvent that debate here.


(From ValueExistenceProofFour)

If you're going to include unpopular languages, why not incorporate (say) predicate dispatch, instead of something obscure like "soft polymorphism"?

Predicate dispatch is not typically what the above-described product-embedded scripting languages use (outside of SQL). Product-embedded scripting languages tend to stay close to the AlgolFamily conventions to reduce the learning curve of product users. Plus, we are limiting these topics to AlgolFamily-like dynamic languages to (attempt to) keep the discussion more focused. If you wish to discuss non-AF languages in new topics, that's fine, but I may not be personally interested in participating.

Ok, that's a good reason to exclude PredicateDispatch. Why don't we include MultipleDispatch, then? It's found in AlgolFamily languages like DylanLanguage and TutorialDee.

The issue was modelling types, scalars in particular (as a start), not OOP. There are a lot of potential directions to expand into if one wants to "branch out" their model. We have enough problems agreeing on the simple stuff for now, why pour gasoline on the propane fire?

MultipleDispatch is not limited to OOP.

It's a vague term.

Which one? OOP, MultipleDispatch, or "limited"? If it's OOP, that's fine, MultipleDispatch isn't limited to OOP per se. If it's MultipleDispatch, you're wrong, it's not a vague term. If it's "limited" (or "is", "not", or "to"), I can't help you. I simply see no reason to include "soft polymorphism" but exclude MultipleDispatch, or any of a dozen equally valid but obscure type-related and/or dispatch-related mechanisms.

"Equally obscure" is purely a guess on your part. Granted, its frequency is also a guess on my part but I'll stick with my approach because it's more general purpose than type-signature polymorphism. The IF statements for a given modeled operator can have anything they want in them. (The model user can choose also other model dispatching code techniques as they see fit.) As far as the meaning of "multiple dispatch" as a phrase, I don't want to get into another definition fight today and will skip it for now. You read English and "see" far far different things in it than I do.

I didn't write "equally obscure", I wrote "equally valid but obscure". Considering the fact that the only references anywhere to "soft polymorphism" are this page and SignaturesAndSoftPolymorphism, it's considerably more obscure than MultipleDispatch. That means there's even less reason to include "soft polymorphism" in a reference on operator dispatch, types, or anything else.

My apologies regarding "equally obscure". In that case, I don't know what your point is regarding the usage of "obscure". And "soft polymorphism" is an internal working name; I'm pretty sure that was pointed out somewhere. Searching globally (publicly) for an internal working term is of course likely to be futile. And I don't know whether it's obscure. Nobody's done a (known) reliable survey. Not you, not me, not Mike. Your guess is based on a very limited sample size. Why are you so full of certainty using such a small sample size? That's a typical symptom of stubbornness. -t

You have argued that we should explicitly include "soft polymorphism" in language models because some non-popular programming languages might use it. I argue that until you identify a language that uses it, there's no point in including it in language models over and above other relatively-obscure dispatch mechanisms like MultipleDispatch, which is demonstrably used in some fairly well-known non-popular programming languages.

Your model is based on your best guess, and my model is based on my best guess. Unless you have solid evidence to replace the guess, why complain about guesses different than yours? It makes no sense to me why you insist your guess is more accurate. You are not special. I say LetTheReaderDecide which guesses they prefer (based on their own circumstances and experience). What the hell is wrong with that? Why quibble over guess differences? They are just guesses. It's not like there is "guess math" to apply rigor to pick one over the other. (At least you haven't used any.)

My model is not based on "my best guess", but on formal study of language semantics and empirical study of language implementation internals.

You just claim that; you haven't shown a paper trail, such as quotations and citations. I'm not going to "just trust" you, especially given your odd way of interpreting English.

I've given numerous quotations and citations showing references that accord with my interpretations. That you don't agree with them is not my problem. My "odd way of interpreting English" seems to accord with that of both SoftwareEngineers and ComputerScientists. If it doesn't accord with language used by laypersons, that's not my concern.

I disagree that your references given so far uniquely support your interpretation. I honestly believe you inject your personal mind models into your interpretation of English descriptions to "see what you want to see". Your "translations" appear highly forced to me. Your preconceived notions are funneling any ambiguity toward your personal interpretation preference. I know you will disagree; no need to state that. I call it as I see it. We'll just have to LetTheReaderDecide whose interpretation of the English they prefer. Also note that I'm not claiming such descriptions are "not in accord" with your model. They often seem to "accord" with both because the English actually used is vague/ambiguous. They don't necessarily contradict your model; they just don't uniquely support it.

In particular, they don't clearly constrain the design of the data structures used to implement/model "variables", "types", and "values" other than perhaps dictating that some computable relationship between them exist for some language features. And that "some" in "some computable relationship" is very general. --top

Overloading a single data structure -- as you do for both values and variables -- is invariably awkward. There's no tenet of programming that would recommend such an approach, and plenty of reasons not to do it. It represents StampCoupling, for example.

We already discussed StampCoupling and "awkward"; I disagreed with your assessments. Anyhow, even if, for the sake of argument, it were "awkward" design in terms of software engineering, that's a different issue. The documents you cited were not about "designing software well" in terms of maintenance, grokking, etc. The issue is the "required" properties of said structures. Show me the interpreter-related documents that clearly dictate that such structures "must have property X and must not have property Y" in a general sense. I don't want personal interpretations; I want clear rules, or at least a clear chain that leads to such rules unambiguously. Design choices from a software-engineering perspective are a secondary concern.


Footnotes

[1] I used "representation" to avoid triggering a term debate here, not because I agree with its usage. -t


CategoryLanguageTyping, CategoryTypingDebate, CategoryEvidence


