Comparing Dynamic Variables

In many strongly typed languages, type polymorphism or internal "type tags" can be used to determine the type used to compare two or more variables. However, weakly typed or dynamic languages don't have or don't rely on such mechanisms, and thus often require explicitly "typed" comparison operators. (Whether this is a "down side" of dynamic languages or not is probably a contentious issue.)

Issues Explored:

(Arguably this topic could be "comparing weakly-typed variables".)


(Moved from DynamicRelational)

Comparisons with regard to typing for dynamic languages seem to have different needs such that SQL syntax may not work as well. For example, to make sure you are comparing as a number instead of string, one has to do something like:

  WHERE toNumber(columnA) > toNumber(columnB)
This is bad repetition factoring. I have proposed comparison functions that allow letter codes:

  WHERE compare(columnA, "n>", columnB)
Here, the "n" indicates it is a number. Or perhaps:

  WHERE numCompare(columnA, ">", columnB)
Some find both of these awkward. However, one advantage is that other comparison features can be added such as:

  WHERE compare(columnA, "nt>", columnB)
Here, the "t" means "trim". It removes white spaces from before and after. This is a common need in my experience. Capitalization management is also something that can take advantage of such. This would reduce monsters such as:

  WHERE ucase(trim(toChar(columnA))) > ucase(trim(toChar(columnB)))
To:

  WHERE compare(columnA, "utc>", columnB)
Thus, it kills multiple birds with 1.2 stones. It is better OnceAndOnlyOnce. Some may find these variations more readable:

  WHERE compare(columnA, ">", columnB, "utc")

  WHERE compare(columnA, ">", columnB, "ucase,trim,toChar")

--top
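To make the letter-code proposal above concrete, here is a minimal sketch in Python (chosen only for illustration; the proposal itself targets a SQL-like query language). The tables TRANSFORMS and OPERATORS, and the decision to apply codes right to left so that "utc" mirrors ucase(trim(toChar(x))), are assumptions of this sketch, not part of the original proposal.

```python
import operator

# Hypothetical letter codes, following the "u", "t", "c" and "n" examples above.
TRANSFORMS = {
    "c": str,                   # toChar: coerce to string
    "t": lambda s: s.strip(),   # trim leading and trailing white space
    "u": lambda s: s.upper(),   # ucase
    "n": float,                 # toNumber
}

OPERATORS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "<>": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def compare(a, op, b, options=""):
    """Apply the same option codes to both operands, then compare them."""
    # Codes are applied right to left, so "utc" behaves like
    # ucase(trim(toChar(x))) on each side.
    for code in reversed(options):
        transform = TRANSFORMS[code]
        a, b = transform(a), transform(b)
    return OPERATORS[op](a, b)
```

With this sketch, compare(" abc ", "=", "ABC", "utc") is true, and compare("10", ">", "9", "n") is true while the same call without "n" compares as strings and is false. Each transformation is named once and applied to both operands, which is the repetition the proposal is trying to remove.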

What you are doing here is implementing types using casting and tricky obfuscated syntax. In other words, your typeless database has types (or has them poorly emulated, with dangerous decisions now taken in control of the programmer - who is a human and makes errors).

Everything is a string..! But, when you want to convert that string to an integer type, in a typeless language.... you can do so. Makes sense? I thought not.

I'm not sure what your point is. Perhaps you are saying that "typeless" is an inaccurate description of what is taking place. That may be (assuming "types" has a consensus def). I'd prefer "flag-free typing", but that tends to result in arguments. Note that the above tells how to compare the items, not how to store them.

You implemented flags yourself, silly. Every time you make a cast, you are flagging that data temporarily with your own type system. A type is a classification, and you are classifying the data with casts. This is why a layman's definition of type helps and why it is so important. Do not think of types as in type theory; just think of them as classifications. If we classify data as a string, it is a string type. Now whether it is a poor type system or a good one is another story. In your case, it is somewhat like PHP. It bloats up the code with type-casting line noise that we don't need to see each and every time. It should be in the schema ONCE and ONLY once. You are violating this and creating silly workarounds to reinvent a type system for the sake of it.

Sure in the other "typeful" databases, once in a while one has to convert types or make casts... but this is only if one explicitly needs to control the system. In your case, you implicitly deal with unsafe binary blobs all the time by default. This is absolutely ludicrous and a step backwards in engineering, computing science, and math. In your case you have to cast all the time to ensure integrity, and humans make integral mistakes. You can't ask a programmer to cast the type all the time - this is like requiring someone program in assembly code each time and expecting them to get it right. This is year 2008 and a high level programming language should not require the programmer manually intervene with his own dangerous error prone binary blob casts.

Your violent opposition to a proper type system is all in your mind and most likely stems from the products you've invested time into, which heavily promote no types. After using these products for several years now, you just couldn't possibly see the value in automating this ludicrous binary blob casting, because all the product brochures you've read have convinced you that type systems are useless (even though you are hypocritically reinventing one yourself without fricking realizing it).

If you are complaining about dynamic/loose typing in general, there are already topics on that and we don't need to repeat that debate here. The assumption here is that one buys into the concept of dynamic/loose typing and wants a database engine that supports that philosophy. This is not an attempt to do the equivalent of selling Perl or SmallTalk to Ada or Eiffel fans. I don't expect strong/heavy typing proponents to accept the idea of DynamicRelational any more than they accept dynamism for application languages.

--top

[I think this a fine example of rejecting a feature in favor of a buggy, slow, more complicated, 80% implementation thereof.]

Your opinion is noted.

You mention Ada and Eiffel: one (many) of the people here that are calling bull shit on you do not use Ada or Eiffel - so please stop making stereotypes and generalizations (next thing you know, we'll hear anal language from Top). Even if the people here did use Ada and Eiffel daily, your wording was inflammatory (diverting the topic to language wars instead of staying on topic) you hypocritical piece of shit. Irony intentional: yes this is inflammatory.. I'm fed up with Top - no more arguing - complete waste of time, energy draining, it is pointless to argue with someone like this.

I have no idea why you find what I said inflammatory. If you are Lars, you are heavily sensitive to my wording for some reason and I don't want to bother this time to try to understand your unusual, involved psychology that produces a state of offense. As far as mentioning specific languages, it was an analogy to help people relate to the type philosophies, NOT an intention to turn this into an app language war. You yourself used PHP as an example to illustrate a point. Thus, "diverting the topic to language wars" makes no sense as an accusation unless you are making the same mistake as me. You are too eager to find ill intent. Please, not another ThreadMess about how sinister I am. I did not intend offense, but I doubt I can ever convince you of that. I'll just have to learn to man-up and live with the retaliation storms without taking it personally. --top

[EditHint: the above appears to be a standard/typical "type fight", which perhaps can be moved to a type-related HolyWar topic.]

The above may be too unconventional for acceptance. Perl's comparison techniques may suggest some ideas. One down-side of the Perl approach is that one tends to accidentally use the common forms when it's not the only language one works with. This accidentally makes the comparison numeric. Perhaps require a special symbol to indicate the compare type. Examples:

  // a less-than b example clauses:
  WHERE a #< b    // numeric
  WHERE a @< b    // date ("at")
  WHERE a $< b    // string ($ looks kind of like an "S")    
  WHERE a  < b    // syntax error 
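
The prefixed operators above can be read as "pick a coercion, then pick a relation". The Python sketch below illustrates that reading. The function name typed_compare, the lookup tables, and the assumption that dates arrive as ISO-formatted strings are all hypothetical, added only for illustration.

```python
import operator
from datetime import date

# Type prefix -> coercion applied to both operands (assumed mapping).
COERCIONS = {
    "#": float,               # numeric
    "@": date.fromisoformat,  # date ("at"), assuming ISO strings like "2009-12-01"
    "$": str,                 # string
}

RELATIONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def typed_compare(a, symbol, b):
    """Compare a and b using a two-character operator such as '#<'."""
    if len(symbol) != 2 or symbol[0] not in COERCIONS or symbol[1] not in RELATIONS:
        # A bare "<" is rejected, mirroring the "syntax error" case above.
        raise SyntaxError(f"comparison operator needs a type prefix: {symbol!r}")
    coerce = COERCIONS[symbol[0]]
    return RELATIONS[symbol[1]](coerce(a), coerce(b))
```

Here typed_compare("9", "#<", "10") is true, typed_compare("9", "$<", "10") is false because the operands are compared as strings, and typed_compare("9", "<", "10") raises an error rather than silently guessing a type.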

Somewhere on this wiki I've proposed a fancier comparing system that extends this idea beyond just "types" because the problems are similar. Comparing often needs to include items such as:

Thus, comparing is not simple. However, few want to use a bloated API or repetitious functions (mirrored on each side) to do comparing. If the language supports key-word parameters, then perhaps we can have something like:

  compare(a, "<", b, type="string", trimright=true, casesense=true)
What I find ugly is syntax such as:

  a.stringLessThan(b).trimRight.caseSensitive
That's about as intuitive as poo, but for some reason it seems popular.

--top

Popular? I've never seen syntax quite like that. It doesn't even make sense. More commonly, it's something like:

 a.trim().isLessThan(b.trim())
This I find elegant; it reads like an English sentence composed of words. It is vastly more readable and expressive than the eye-watering parameter-assignment conglomeration of:
  compare(a, "<", b, type="string", trimright=true, casesense=true)
More importantly, my example implies composable components. Without changing existing components, assuming isLessThan() is case-sensitive, we could support case insensitivity using the following, and gain a handy general-purpose to-uppercase conversion method in the bargain:
  a.trim().toUppercase().isLessThan(b.trim().toUppercase())
Your all-in-one 'compare' procedure is not composable. It presumes all appropriate options (type, trimright, casesense, etc.) have been built into 'compare', and requires that 'compare' be re-written in order to add new ones. I know you probably consider the use of .trim() twice to be repetition, but it is no more "repetition" than using the letter 't' in this sentence twenty three times is repetition. What you call repetition is, in fact, a clear expression of semantics using composable elements.

{TopMind [were he the sort to study and learn from ideas rather than reinvent them from scratch] would probably favor PredicateDispatching. This would allow him to write different 'compare' methods with entirely new operators or features, and automatically select the desired variation. Of course, PredicateDispatching is not particularly composable, either... that is, predicates can overlap, and in the general case it is impossible to identify when one predicate is more specialized than another. But heuristic PredicateDispatching may be GoodEnough for most of the code TopMind writes, which seems to be shallow glue-code between other systems.}

As far as the "dot-path" approach being "more readable", that's subjective. In my approach, the general is on the left and details on the right. In yours, the details and generalness are all mixed together, creating importance-level intermixing. You mention being English-like; however, I find that English is often annoying. That's partly why COBOL is not popular. But if you like dot-paths, then so be it. You know what your eyes and brains like more than I do.

Use of DotDispatch? was intended to be illustrative rather than definitive. The important factor is that your 'compare' procedure is a monolithic brick of code, indivisible, un-extendable, and overloaded with functionality to address a single category of limited purposes. I wouldn't want to see, let alone maintain, what goes on inside it. The use of primitive composable operators permits a degree of extensibility, modifiability, re-use, and elegance that is not possible with monolithic procedures.

{As far as doing the same thing to both sides, that's someplace where supporting first-class functions would often shine... and laziness could even keep performance peaked. That is, one could use:
 binOpAfter(lessthan, trim o ucase o midspaceredux, A, B) => ;; evaluates to
 lessthan( (trim o ucase o midspaceredux)(A), (trim o ucase o midspaceredux)(B)) => ;; evaluates to
 lessThan(trim(ucase(midspaceredux(A))),trim(ucase(midspaceredux(B))))
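
For readers who want to run the idea, the following is a rough Python rendering of the binOpAfter expansion above. The names compose, bin_op_after and mid_space_redux are transliterations, and the reading of "midspaceredux" as "collapse runs of white space into one space" is an assumption of this sketch.

```python
import operator
import re
from functools import reduce

def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))), like 'f o g o h' above."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)

def mid_space_redux(s):
    # Assumed meaning: collapse each run of white space into a single space.
    return re.sub(r"\s+", " ", s)

def bin_op_after(op, transform, a, b):
    """Apply the same transform to both operands, then the binary operator."""
    return op(transform(a), transform(b))

# trim o ucase o midspaceredux, built once and reusable elsewhere
normalize = compose(str.strip, str.upper, mid_space_redux)
```

With these definitions, bin_op_after(operator.eq, normalize, " a  b ", "A B") is true. Changing the order of operations or adding a new step means building a different composition; bin_op_after itself is not edited.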

Agreed, but it may also be verbose.

{How so? I'm not seeing a significant verbosity difference, TopMind.}

  compare(A, "<", B, trim=true, casesense=false, midspaceredux=true)
  binOpAfter(lessthan, trim o ucase o midspaceredux, A, B)

And how do we manage multiple operations, their order, etc.? It's not worth it to build a convoluted contraption for such. New compare options may be needed once or twice a year in my experience.

{The above was multiple operations. If I wanted to change their order, I could: (trim o ucase o midspaceredux) vs. (ucase o trim o midspaceredux) might make a performance difference but happen to be commutative. How do you manage the order of operations in your approach? If you need to extend the 'compare' module even once or twice a year with new options, each time growing it into a larger, combinatorial mess, exactly how is your approach "worth it" compared to the relatively simple functional composition?}

It depends on the language being used. For the sake of argument, I'll agree with you for now. I find it conceptually more palatable than the OO "dot-path" approach, but again the frequency and risk from changing the function(s) to add new compare options is insignificant. Plus, the follow-on programmer likely may not know functional techniques. -t

Re "the follow-on programmer likely may not know functional techniques". I remember when precisely the same argument was regularly applied to object oriented techniques (replace "functional" with "OO"), and before that, to structured programming (replace "functional" with "structured programming"). Many current programming students are exposed to functional programming; it won't be long (for an unspecified value of 'long') until it is considered ubiquitous knowledge.

{I consider modifying code deployed for other users even once a year to be very significant.}

Note that sometimes we want the default to be some activity, and only specify when we don't want it. For example, most of the time I would rather have it ignore case as the default, not the other way around. Perhaps the same for trimming and removing duplicate spaces. -t

{This is a fair point. There is no obvious tweak for the functional approach to have actions enabled by default and disabled by flag. Perhaps something like:

 binOpAfter(lessThan, strcops([notrim,keep_middle_spaces]), A, B) => ;; evaluates to
 binOpAfter(lessThan, ucase, A, B) => 
 lessThan( ucase(A), ucase(B) )
{- where 'strcops' stands for 'string comparison options'. This approach, at least, still provides extensibility to everything but 'strcops' (which may need some sort of central map to extend with new features, unless the language provides some extra features). }
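
A possible shape for 'strcops', sketched in Python under assumptions: the defaults (collapse middle spaces, uppercase, trim) and the flag names "notrim" and "keep_middle_spaces" follow the example above, while "keep_case" and the DEFAULT_STEPS table are hypothetical additions.

```python
import operator
import re

# Each default step is paired with the flag that switches it off.
# "keep_case" is a hypothetical flag name, added for completeness.
DEFAULT_STEPS = [
    ("keep_middle_spaces", lambda s: re.sub(r"\s+", " ", s)),
    ("keep_case", str.upper),
    ("notrim", str.strip),
]

def strcops(disabled=()):
    """Build one normalizer from every default step that was not disabled."""
    steps = [step for flag, step in DEFAULT_STEPS if flag not in disabled]

    def normalize(s):
        for step in steps:
            s = step(s)
        return s

    return normalize

def bin_op_after(op, transform, a, b):
    """Apply the same transform to both operands, then the binary operator."""
    return op(transform(a), transform(b))
```

Calling strcops() with no flags yields the full default normalizer, so bin_op_after(operator.eq, strcops(), " a  b ", "A B") is true, while strcops(["notrim", "keep_middle_spaces"]) only uppercases. The central DEFAULT_STEPS table is the "central map" the comment above worries about: new defaults still have to be registered in one place.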

Note also that my approach does not preclude something similar. An optional named parameter could be a list of function names. It could then do an "Eval" on a list loop of those names.

{It is unclear, but are you trying to say something similar to the functional design I indicated earlier? Where would this list of function names fit in?}

Something like this:

 if (strCompare(x,">",y, "trim, caps, foo, zerp"))...
 ...
 func strCompare(a, op, b, optionList) {
   var work_a = a;  //internal altered copy
   var work_b = b;
   ...
   if (! isBlank(optionList)) {
     while (i = listForEach(optionList,",")) {  // iterate list items (assignment, not equality test)
        i = trim(i);
        if (inList(i, recognizedList, ",")) {
           // process recognized options, such as "caps"
        } else {
          // make function call based on name with "cmp_" prefix
          work_a = eval("cmp_"+i+"("+escQuote(work_a)+")");
          work_b = eval("cmp_"+i+"("+escQuote(work_b)+")");
        }
     }  // end-while
   }
   ...
 }

The unrecognized "foo" option would call "cmp_foo(...)" for each operand. (Perhaps it should also pass the operation, and maybe even the other parameters to be thorough.)
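
As a runnable variation on the pseudocode above, here is a Python sketch. Note that it swaps the eval-by-name step for an explicit registry of functions, which is a different technique from the one described, chosen because it avoids quoting and escaping the operands. The names register, BUILT_IN, EXTENSIONS and the sample cmp_foo behavior are all hypothetical.

```python
import operator

OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

# Options the compare function recognizes on its own.
BUILT_IN = {"trim": str.strip, "caps": str.upper}

# Options added later by callers, without editing str_compare.
EXTENSIONS = {}

def register(name):
    """Decorator that makes a one-argument function available as an option."""
    def decorator(func):
        EXTENSIONS[name] = func
        return func
    return decorator

def str_compare(a, op, b, option_list=""):
    work_a, work_b = str(a), str(b)  # internal altered copies
    for name in option_list.split(","):
        name = name.strip()
        if not name:
            continue
        step = BUILT_IN.get(name) or EXTENSIONS.get(name)
        if step is None:
            raise ValueError(f"unknown compare option: {name!r}")
        work_a, work_b = step(work_a), step(work_b)
    return OPERATORS[op](work_a, work_b)

@register("foo")
def cmp_foo(s):
    # Placeholder behavior for the unrecognized "foo" option: drop hyphens.
    return s.replace("-", "")
```

With this in place, str_compare(" ab-c ", "=", "ABC", "trim, caps, foo") is true, and an option that was never registered, such as "zerp", raises an error instead of being evaluated as code.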

A downside compared to the UniversalStatement-based keyword approach is that additional parameters cannot be specified. The sub-functions (for lack of a better name) cannot easily have parameters of their own. I cannot think of any useful sub-parameters right now, but couldn't rule it out as a possibility down the road. Maybe rounding level (decimal resolution) for numeric compares? -t


(The context of below seems to have been lost. May need to re-string)

There is nothing, of course, that precludes wrapping an expression composed of primitives in something you find more manageable. For example, trim().toUppercase() could be wrapped in trimToUppercase().

As far as PredicateDispatching, I tend to focus on the "interface" first and then worry about how it's done second. Thus, I won't classify the "how" just yet because that's under the hood. The compare approach I favor is mostly optimized for how I like to work with compare expressions as expressions as a compare-library user. If that disfavors the library builder, it may be worth the trade-off. That also applies to the duplication I want to get rid of.

As far as "composable", I'd need to look at realistic need scenarios for it to comment. It is expandable, but there are different ways and different trade-offs for each approach such that I'd have to see the details to suggest the best way to extend. "I want to add feature X. What are the choices and what are the effort levels and change impact of them?"

"Shallow glue-code"? Sounds like flame-bait. I won't bite this time.

--top

{RE: "I'd have to see the details to suggest the best way to extend" -- that seems very much like saying, "there is no standard, clean way to extend" without being particularly obvious about it. }

{As far as your "realistic needs" scenarios, I never understand why you can't just use the obvious ones. If you are reasonable, you must assume you failed to support all the types and comparisons users care about (which might include addresses, coordinates, colors and hues, etc.). But you do know which features you care about, so the obvious choice is to use those: assume the author of the language/DBMS/etc. forgot to support dates, or forgot to support 'trimming', or forgot to support 'case insensitivity'. You would then show how programmers go about adding that feature so that 'compare' works with the new feature. If you cannot do so without violating code-ownership boundaries, or if you must re-implement the features and types that already exist, then to say "it is expandable" is wrong.}

{Regarding your other comments: "Shallow glue-code between other systems" is a fairly wide domain, the sort of thing at which scripting languages were initially aimed; it excludes SystemsSoftware, but it is hardly flame-bait (unless you're paranoid ;). And your comment on PredicateDispatching is a bit irritating: Dispatching is "interface", not "implementation", especially in context - that being a reply to a comment on composition. I understand what you mean is that you're focusing on the 'compare' interface (i.e. how programmers express a comparison between dates) as opposed to the development interface (i.e. how programmers extend 'compare' to work with dates or add a 'trim' feature), but your utter failure to acknowledge the context is still irritating and strikes me as disrespectful (whether it be through intention or negligence).}

I assure you that I intended no malice. I suspect part of the problem is that we view "types" differently. You and I just think differently. -t

{For purpose of this discussion, I have aimed to be consistent in using 'types' the way you used it above when you said: "Type - Number, integer, date, etc.". With 'date' (a temporal coordinate) as an example, there is no reason 'etc.' should not include spatial coordinates. Since date is on an arbitrary scale, it also serves as precedent for colors and hues and so on. If you're thinking something different now than you said some months ago, I suspect that's inconsistency on your part rather than my own.}

If I control an application's library, I can make the compare function(s) take any "type" I want it to. I customize libraries for specific domains & apps all the time (see HelpersInsteadOfWrappers). However, non-linear "types" perhaps do not belong sharing the same comparison function since many if not most idioms are not sharable across them. If the differences are too great, then don't force-fit sharing. (I suppose one could make a general-purpose multi-dimensional "distance" thingamabob such that 1D position analysis is one of many of the combinations it accepts. A GodComparitor? would make for interesting MentalMasturbation.) -t

{GodComparator?? as in "Zeus > Hera"?}


No-Change Optimization Via Functions

If the goal is to avoid changing existing compare functions at all costs but still be allowed to create new "type" comparison functions, then one could start by making many small functions intended for re-use. For example, the string comparer may farm off capitalization normalization, space normalization/removal, and actual comparing to smaller functions. If new types come along, they would be able to use any of the existing small functions, hopefully without change. For the apps I work on, such is usually overkill, but it's a design option if you really need it. And you don't have to sacrifice syntax to get it.

In general, expressiveness, adaptability, and readability of my utility functions override concerns over "opening existing code" for maintenance, but there are ways to reduce opening without having to entirely switch to ugly interfaces.

--top


This is one of the more interesting debates I've been involved with on this wiki. While a consensus probably has not been reached, at least good cases were made by the parties involved and the examples used were fairly representative of real-world issues without being too "nichy". There is enough info to allow one to compare and ponder. It may make a good student project for code, change, and structural analysis. --top


Foot Notes

[1] Percents meant only for illustrative purposes and are not intended to be official statistics or a peer-reviewed study.


See Also: FeatureBuffetModel


CategoryLanguageTyping, CategoryConditionalsAndDispatching, CategoryFunctionalProgramming


DecemberZeroNine


(Last edited October 5, 2012)