In many strongly typed languages, type polymorphism or internal "type tags" can be used to determine the type used to compare two or more variables. However, weakly typed or dynamic languages don't have, or don't rely on, such mechanisms, and thus often require explicitly "typed" comparison operators. (Whether this is a "downside" of dynamic languages is probably a contentious issue.)
Issues Explored:
- How is the compare-type indicated? For example, do we compare as a string or as a number? "007" compared to "7" gives different answers under each.
- How are other comparison-related issues, such as spaces and capitalization, dealt with and integrated into the comparing system?
- How are new comparison types added and managed?
- Does improving implementation maintenance conflict with "readability" of the usage and/or duplication of expression?
(Arguably this topic could be "comparing weakly-typed variables".)
(Moved from DynamicRelational)
Comparisons with regard to typing for dynamic languages seem to have different needs, such that SQL syntax may not work as well. For example, to make sure you are comparing as numbers instead of strings, one has to do something like:
WHERE toNumber(columnA) > toNumber(columnB)
This is bad repetition factoring. I have proposed comparison functions that allow letter codes:
WHERE compare(columnA, "n>", columnB)
Here, the "n" indicates it is a number. Or perhaps:
WHERE numCompare(columnA, ">", columnB)
Some find both of these awkward. However, one advantage is that other comparison features can be added such as:
WHERE compare(columnA, "nt>", columnB)
Here, the "t" means "trim": it removes white space from the beginning and end. This is a common need in my experience. Capitalization management can also take advantage of such codes. This would reduce monsters such as:
WHERE ucase(trim(toChar(columnA))) > ucase(trim(toChar(columnB)))
To:
WHERE compare(columnA, "utc>", columnB)
Thus, it kills multiple birds with 1.2 stones, and it better follows OnceAndOnlyOnce. Some may find these variations more readable:
WHERE compare(columnA, ">", columnB, "utc")
WHERE compare(columnA, ">", columnB, "ucase,trim,toChar")
--top
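For readers who want to see how the letter-code idea might behave, here is a minimal Python sketch. The name compare, the flag letters (n, u, t, c), and the operator spellings come from the examples above; the parsing rule (flags first, operator last) and the order in which the steps run are assumptions, not a spec.

```python
import operator

# Operator spellings are assumed; lookup is by exact text.
OPERATORS = {
    "<=": operator.le, ">=": operator.ge, "<>": operator.ne,
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
}


def compare(a, code, b):
    """Hypothetical compare(a, "utc>", b): flag letters, then an operator."""
    flags = code.rstrip("<>=")          # e.g. "utc"
    op = OPERATORS[code[len(flags):]]   # e.g. ">"
    unknown = set(flags) - set("nuct")
    if unknown:
        raise ValueError(f"unknown flag(s): {''.join(sorted(unknown))}")

    def prepare(value):
        # Same steps, same order, on both sides: toChar, trim, ucase, number.
        if "c" in flags:
            value = str(value)
        if "t" in flags:
            value = value.strip()
        if "u" in flags:
            value = value.upper()
        if "n" in flags:
            value = float(value)
        return value

    return op(prepare(a), prepare(b))
```

With this sketch, compare("007", "n=", "7") is true while compare("007", "=", "7") is false, which is the "007" vs "7" case from the top of the page. The variant that puts the flags in a fourth argument would only change how the code string is split.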
What you are doing here is implementing types using casting and tricky, obfuscated syntax. In other words, your typeless database has types (or has them poorly emulated, with dangerous decisions now placed in the hands of the programmer, who is a human and makes errors).
Everything is a string..! But, when you want to convert that string to an integer type, in a typeless language.... you can do so. Makes sense? I thought not.
I'm not sure what your point is. Perhaps you are saying that "typeless" is an inaccurate description of what is taking place. That may be (assuming "types" has a consensus def). I'd prefer "flag-free typing", but that tends to result in arguments. Note that the above tells how to compare the items, not how to store them.
You implemented flags yourself, silly. Every time you make a cast, you are flagging that data temporarily with your own type system. A type is a classification, and you are classifying the data with casts. This is why a layman's definition of type helps and why it is so important. Do not think of types as in type theory; just think of them as classifications. If we classify data as a string, it is a string type. Now whether it is a poor type system or a good one is another story. In your case, it is somewhat like PHP. It bloats up the code with type-casting line noise that we don't need to see each and every time. It should be in the schema ONCE and ONLY once. You are violating this and creating silly workarounds to reinvent a type system for the sake of it.
- I don't understand the comparison to PHP. PHP has side-type flags whether one uses them or not. What goes on behind the implementation of the "compare" operation may or may not use internal flags. It can be done without a flag variable. (Generally I don't like PHP's brand of dynamism. Flags are unnecessary extra dimensions to deal with and anti-WYSIWYG.) -t
Sure, in the other "typeful" databases, once in a while one has to convert types or make casts... but this is only if one explicitly needs to control the system. In your case, you implicitly deal with unsafe binary blobs all the time by default. This is absolutely ludicrous and a step backwards in engineering, computing science, and math. In your case you have to cast all the time to ensure integrity, and humans make integral mistakes. You can't ask a programmer to cast the type all the time; this is like requiring someone to program in assembly code each time and expecting them to get it right. This is the year 2008, and a high-level programming language should not require the programmer to manually intervene with his own dangerous, error-prone binary blob casts.
Your violent opposition to a proper type system is all in your mind and most likely stems from the products you've invested time into, which heavily promote no types. After using these products for several years now, you just couldn't possibly see the value in automating this ludicrous binary blob casting, because all the product brochures you've read have convinced you that type systems are useless (even though you are hypocritically reinventing one yourself without fricking realizing it).
If you are complaining about dynamic/loose typing in general, there are already topics on that and we don't need to repeat that debate here. The assumption here is that one buys into the concept of dynamic/loose typing and wants a database engine that supports that philosophy. This is not an attempt to do the equivalent of selling Perl or SmallTalk to Ada or Eiffel fans. I don't expect strong/heavy typing proponents to accept the idea of DynamicRelational any more than they accept dynamism for application languages.
--top
[I think this is a fine example of rejecting a feature in favor of a buggy, slow, more complicated, 80% implementation thereof.]
Your opinion is noted.
You mention Ada and Eiffel: one (many) of the people here that are calling bull shit on you do not use Ada or Eiffel, so please stop making stereotypes and generalizations (next thing you know, we'll hear anal language from Top). Even if the people here did use Ada and Eiffel daily, your wording was inflammatory (diverting the topic to language wars instead of staying on topic), you hypocritical piece of shit. Irony intentional: yes, this is inflammatory. I'm fed up with Top. No more arguing: it is a complete waste of time, energy draining, and pointless to argue with someone like this.
I have no idea why you find what I said inflammatory. If you are Lars, you are heavily sensitive to my wording for some reason, and I don't want to bother this time to try to understand your unusual, involved psychology that produces a state of offense. As far as mentioning specific languages, it was an analogy to help people relate to the type philosophies, NOT an intention to turn this into an app language war. You yourself used PHP as an example to illustrate a point. Thus, "diverting the topic to language wars" makes no sense as an accusation unless you are making the same mistake as me. You are too eager to find ill intent. Please, not another ThreadMess about how sinister I am. I did not intend offense, but I doubt I can ever convince you of that. I'll just have to learn to man up and live with the retaliation storms without taking it personally. --top
[EditHint: the above appears to be a standard/typical "type fight", which perhaps can be moved to a type-related HolyWar topic.]
The above may be too unconventional for acceptance. Perl's comparison techniques may suggest some ideas. One downside of the Perl approach is that people who also work in other languages tend to use the common operator forms by accident, which unintentionally makes the comparison numeric. Perhaps require a special symbol to indicate the compare type. Examples:
// a less-than b example clauses:
WHERE a #< b // numeric
WHERE a @< b // date ("at")
WHERE a $< b // string ($ looks kind of like an "S")
WHERE a < b // syntax error
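A rough Python sketch of the sigil idea. Python cannot define new infix operators, so the hypothetical function sigil_compare takes the operator as a string; the sigil table and the ISO "YYYY-MM-DD" date format are assumptions. For contrast, Perl itself uses < and == for numbers and lt and eq for strings, so an unmarked < silently means "numeric" there.

```python
import operator
from datetime import date

# Sigil -> conversion applied to both operands before comparing.
CONVERTERS = {
    "#": float,               # numeric
    "@": date.fromisoformat,  # date; assumes "YYYY-MM-DD" text
    "$": str,                 # string
}
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}


def sigil_compare(a, op_text, b):
    """Evaluate a clause such as  a #< b  as  sigil_compare(a, "#<", b)."""
    sigil, rest = op_text[:1], op_text[1:]
    if sigil not in CONVERTERS:
        # Stands in for the "syntax error": a bare "<" has no compare type.
        raise ValueError(f"no comparison type in {op_text!r}")
    convert = CONVERTERS[sigil]
    return OPS[rest](convert(a), convert(b))
```

Here sigil_compare("10", "$<", "9") is true (string order) while sigil_compare("10", "#<", "9") is false (numeric order), and sigil_compare("1", "<", "2") is rejected.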
Somewhere on this wiki I've proposed a fancier comparing system that extends this idea beyond just "types" because the problems are similar. Comparing often needs to include items such as:
- Type - Number, integer, date, etc.
- Trimming - Ignore leading, trailing, embedded white-space, etc.
- Capitals - Ignore or recognize capitalization.
- Collation - ASCII, EBCDIC, etc.
- Validation - Is validation of types done, such as detecting February 30th, and how is it handled if detected?
- Rounding - Sometimes we only want to compare to a given number of decimal places, especially if forced to use floating point when a "decimal" or "money" type would be more appropriate. (It's good practice to round after each computation if using floating point, but directly comparing floating-point values tends to be risky either way.)
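To make the rounding item concrete, a small Python illustration; rounded_equal is a hypothetical helper, while round() and decimal.Decimal are standard. Note that round() works on the binary value, so halfway cases can surprise: round(2.675, 2) gives 2.67, not 2.68.

```python
from decimal import Decimal


def rounded_equal(a, b, places=2):
    """Treat two floats as equal if they agree to `places` decimal places."""
    return round(a, places) == round(b, places)


# Direct float comparison is risky: 0.1 + 0.2 is stored as 0.30000000000000004.
print(0.1 + 0.2 == 0.3)               # False
print(rounded_equal(0.1 + 0.2, 0.3))  # True
# A decimal type avoids the problem for values such as money.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```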
Thus, comparing is not simple. However, few want to use a bloated API or repetitious functions (mirrored on each side) to do comparing. If the language supports keyword parameters, then perhaps we can have something like:
compare(a, "<", b, type="string", trimright=true, casesense=true)
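Python supports keyword parameters, so the call above can be tried almost verbatim. This is a minimal sketch under stated assumptions: the defaults (trimright=False, casesense=True), the operator table, and the use of casefold() for case-insensitive matching are choices made here, not part of the proposal.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "<>": operator.ne}


def compare(a, op, b, type="string", trimright=False, casesense=True):
    """Keyword-option comparison; parameter names follow the call above.

    `type` shadows the built-in of the same name; it is kept only to match
    the proposed call.
    """
    def prepare(value):
        if type == "number":
            return float(value)
        if type != "string":
            raise ValueError(f"unsupported type: {type}")
        value = str(value)
        if trimright:
            value = value.rstrip()
        if not casesense:
            value = value.casefold()
        return value

    return OPS[op](prepare(a), prepare(b))
```

For example, compare("abc  ", "=", "abc", trimright=True) is true, and compare("007", "=", "7", type="number") is true. Every new option, though, means another parameter and another branch inside compare, which is the point debated further down.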
What I find ugly is syntax such as:
a.stringLessThan(b).trimRight.caseSensitive
That's about as intuitive as poo, but for some reason it seems popular.
--top
Popular? I've never seen syntax quite like that. It doesn't even make sense. More commonly, it's something like:
a.trim().isLessThan(b.trim())
This I find elegant; it reads like an English sentence composed of words. It is vastly more readable and expressive than the eye-watering parameter-assignment conglomeration of:
compare(a, "<", b, type="string", trimright=true, casesense=true)
More importantly, my example implies composable components. Without changing existing components, assuming isLessThan() is case-sensitive, we could support case insensitivity using the following, and gain a handy general-purpose to-uppercase conversion method in the bargain:
a.trim().toUppercase().isLessThan(b.trim().toUppercase())
Your all-in-one 'compare' procedure is not composable. It presumes all appropriate options (type, trimright, casesense, etc.) have been built into 'compare', and requires that 'compare' be re-written in order to add new ones. I know you probably consider the use of .trim() twice to be repetition, but it is no more "repetition" than using the letter 't' in this sentence twenty three times is repetition. What you call repetition is, in fact, a clear expression of semantics using composable elements.
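In Python, the built-in str methods already chain this way ("apple ".strip().upper() < "Banana".strip().upper()). The small Text wrapper below is hypothetical and only mirrors the method names used above (trim, toUppercase, isLessThan) so the composable style can be run as written.

```python
class Text:
    """Hypothetical wrapper so strings answer to trim/toUppercase/isLessThan.

    Each method returns a new Text, so the calls can be chained in any order.
    """

    def __init__(self, value):
        self.value = value

    def trim(self):
        return Text(self.value.strip())

    def toUppercase(self):
        return Text(self.value.upper())

    def isLessThan(self, other):
        return self.value < other.value


a, b = Text("apple "), Text("Banana")
print(a.isLessThan(b))                                            # False
print(a.trim().toUppercase().isLessThan(b.trim().toUppercase()))  # True
```

The first comparison is false because "a" sorts after "B" in plain code-point order; adding toUppercase() on both sides changes the answer without touching isLessThan.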
{TopMind [were he the sort to study and learn from ideas rather than reinvent them from scratch] would probably favor PredicateDispatching. This would allow him to write different 'compare' methods with entirely new operators or features, and automatically select the desired variation. Of course, PredicateDispatching is not particularly composable, either... that is, predicates can overlap, and in the general case it is impossible to identify when one predicate is more specialized than another. But heuristic PredicateDispatching may be GoodEnough for most of the code TopMind writes, which seems to be shallow glue-code between other systems.}
As far as the "dot-path" approach being "more readable", that's subjective. In my approach, the general is on the left and details on the right. In yours, the details and generalness are all mixed together, creating importance-level intermixing. You mention being English-like; however, I find that English is often annoying. That's partly why COBOL is not popular. But if you like dot-paths, then so be it. You know what your eyes and brains like more than I do.
Use of DotDispatch was intended to be illustrative rather than definitive. The important factor is that your 'compare' procedure is a monolithic brick of code, indivisible, un-extendable, and overloaded with functionality to address a single category of limited purposes. I wouldn't want to see, let alone maintain, what goes on inside it. The use of primitive composable operators permits a degree of extensibility, modifiability, re-use, and elegance that is not possible with monolithic procedures.
- That comes across to me as mere "brochure talk", similar to the stuff that triggered my original "OOP rants" years ago. If you want to convince me, then please show semi-realistic examples of your technique being good and my technique being bad in a domain that's sufficiently similar to mine. The shapes, animal, and device-driver examples in OOP books are not representative of the real world, or at least my domain. Don't sell me a scooter when I need a snowmobile. -t
- *Boggle* To me, the distinction is so obvious and fundamental as to be almost axiomatic. You do understand that you will need to alter your 'compare' procedure every time you need to handle a different type, trim, character ordering, etc., yes? And you do understand that with a set of composable primitives you can add capability without having to rewrite existing primitives, yes?
- If one force-fits most of their models into "types", I can see why such would be a large concern. I don't do such, and therefore such "problems" are rare. And the few times I do add new app-specific "types", it's usually very minor work. And I *don't* have to re-write what's already there. Of all the problems I encounter, that ranks right up there with hangnails to a Middle East soldier. If for some unfathomable reason I couldn't touch the original, then I'd simply make a new newTypeCompare(...) function.
- I don't force-fit my models into "types". The real world is full of types; I model them in my systems. It's nice to work in a domain that maps neatly to character strings, integers, dates, times, and fixed & floating point numeric values. Some domains are like that. Beware, however, of assuming all domains are like that, or even that there are many domains like that. Finance, bookkeeping, product inventory, human resources, payroll, and other core business activities are domains that do map neatly to the canonical types. If you work in any of these areas or a related one, I can see how you'd deprecate types, because the types you need have become so ubiquitous that you're not recognising their typeful nature. Other domains share these canonical types because they're so prevalent in the real world, but many domains need additional types that are no less fundamental than the canonical list. In other domains, complex numbers, polynomials, geographical locations, temperatures, physical dimensions, and so on, may be as fundamental and necessary as, say, the dates and integers in your domain. It is unreasonable, therefore, to define (database/programming) products that provide canonical types without making it at least possible for users to create new types of arbitrary internal complexity and use them with equal facility to the canonical set.
- I don't dispute that such techniques may be more useful in other domains. But frankly, why should one care when building or selecting comparison operations for his or her own domain? That's the real issue here. We're not paid to do the jobs of other domains. Also, I'm a bit skeptical and suspect overuse of "types" in other domains, but will table that for now. -t
- I dispute that your approach is reasonable in any domain. I believe it to be universally ill-conceived.
- I know you are highly bothered by it. That doesn't need repeating. The problem is that you haven't demonstrated specifically why it is bad in my domain.
- Even if you only need the canonical types, it is useful to derive your own types from them in order to ensure, for example, that you can't inadvertently JOIN the EMP table on employee names to the DEPT table on department name. Defining employee name as belonging to the EmpName type, and the department name as belonging to the DeptName type, ensures this kind of safety with no loss of facility and, arguably, an increase in expressiveness.
- This sounds like the standard "heavier typing catches more errors" argument. This topic is about "dynamic" languages/tools for the most part, and they tend to rely on other techniques for QA and/or value nimbleness and brevity over accuracy or compile-time checking. I don't think this is the appropriate topic to rekindle that debate. -t
- I would argue that type checking for data management and type checking for application programming are distinct. The latter may more readily support dynamic typing than the former.
- By the way, my original point had almost nothing to do with types. It stands even if the only type we're concerned with is the string. Your 'compare' procedure is a monolithic brick of code, indivisible, un-extendable, and overloaded with functionality to address a single category of limited purposes, regardless of type. The use of primitive composable operators permits a degree of extensibility, modifiability, re-use, and elegance that is not possible with monolithic procedures, regardless of type. Your approach forces me to deconstruct and alter 'compare', or replace it with a new 'compare', in order to add functionality. Imagine, if nothing else, the risk of breaking existing code by doing this. My approach requires, at best, a novel combination of existing operators. At worst, it requires some new operators and a novel combination of operators. There are no risks to existing code.
- Again, I'd like to see code demonstrations, for the above is not specific enough for me to verify. See TopsToolComparisonTechnique. And I've shown techniques that reduce or eliminate risks to existing code. However, that's not always the primary goal. It must be weighed against all the other trade-offs.
- I've no need for you "to verify" anything I've written. It is sufficient for me to note your wrongness and leave it at that. Indeed, I'm so convinced of your wrongness that I don't feel the need to convince other readers, either. I'm sure they're already convinced; your wrongness is obvious. There is absolutely nothing, and I mean nothing, of benefit in your approach, as illustrated by your ghastly 'compare' function, whose only merit seems to be that it is as un-composable, monolithic, inelegant, and awkward as possible. It runs counter to every bit of good sense developed in the last fifty years of software development. It is a spit in the face of reason, and a vile shit on the doorstep of sense. If I encountered it in the wild, I'd submit it to http://thedailywtf.com/
- I know you are highly bothered by it. That doesn't need repeating. The problem is that you haven't demonstrated specifically why it is bad in my domain. You are not communicating the badness in a way that is apparent. If it's truly as horrible as you say, then demonstrating its horror should be easy. "If you don't follow practice X, puppies will die" does not tell anybody WHY they die, nor help verify that they do die. ArgumentByTheMasses is not good enough; otherwise Windows is the "best" OS. Big long dot-paths are ugly in my opinion: they are long and hard to read. I'd welcome a different approach to my comparison suggestions, but please don't make it long dot-paths. Are those the only two options (that work in typical languages)?
- Specifically, aside from the non-composability, one of its biggest flaws is that you are forced to modify 'compare' and potentially risk breaking live code in order to implement even trivial new functionality. "Dot-paths", in "typical languages", are not the only option. E.g.:
- isLessThan(trim(a), trim(b))
- You still haven't given a real example of useful "composability" that it allegedly lacks. If you are not going to flesh out your criticisms, then please don't keep repeating them. And you seem overly paranoid about changing code. If it's that important for some strange reason, then use the "myTypeCompare(...)" style shown elsewhere. That way no existing compare function need be changed when new types are added. [I had to reformat your example; it was confusing the wiki's indentation.]
- Composability is a characteristic of a system. See http://en.wikipedia.org/wiki/Composability Your strategy may not need to change an existing 'typeACompare' procedure to add a new typeB, but then it requires that the developer essentially duplicate the majority of functionality from 'typeACompare' in 'typeBCompare'. Also, not all changes will involve new types, so you'll now require changes to typeACompare, typeBCompare ... typeNCompare, etc., for those that do not involve new types. Yes, I am paranoid about changing code! If you had thousands of deployed installations dependent on the correct functioning of a complicated, multi-function procedure like your 'compare', you would be too.
- Again, the devil's in the details. And it would not necessarily need to duplicate existing functionality. For example, another "type" may be able to use the existing "string" or "number" comparators. And there are ways to further divide if we know we need sub-services. Real-world differences tend to have unpredictable granularity and are non-hierarchical in what they share and don't share. OOP gets messy and ugly when differences are at the sub-method granularity and are not hierarchical. Without seeing realistic examples of what you are "protecting" or extending, I find it hard to trust your word. I'm from the mental "Show Me" state of Missouri.
- If your code is structured to avoid duplication, re-use existing "string" and "number" comparators, etc., exploit sub-services in multiple contexts, and so forth, then you are in all likelihood awkwardly (almost) doing object oriented programming! Remember that programs are not models of the real world, they are machines for manipulating data about the real world. The messy hierarchies of the real world are only a problem if you are creating domain simulations. Creating business applications is not about creating domain simulations. It is about creating computational machinery, in the computational domain, for manipulating business data.
- Without a clear definition of OOP, it's hard to say if such is reinventing OOP. There are many techniques that existed before OOP that OOP may share or borrow from. Some say that objects are really closures and thus OOP is really FP, for example. That's why I'd like to see code so that we don't have to use general, fuzzy, or unsettled classifications to try to communicate. -t
- Your option above is rather Lisp-like, and still has the "symmetry duplication" of "trim" I talked about. Trimming is a frequent need of mine because users often put extra spaces when typing into places that I cannot control the input validation of. Another thing is that you seem to be trying to optimize maintenance while I'm optimizing readability. (It's not that my approach is hard to maintain, it's just a secondary focus for me.) If it was something that needed changing often, then I might re-consider. I'm optimizing for what I feel is where the most time is spent, and reading the app code is more the bottleneck than changing compare function(s) and related side-effects. (Unit tests can help also.) I do think long and hard about these things, my choices are rarely arbitrary. I weigh my time and problem frequency. If I encounter a frequent problem area, then I investigate ways to do it different. It's all logical and rational, not willy-nilly. I am a ponderer by nature and an option-weigher by nature. I suspect you just got some common OOP slogans stuck in your head and keep banging on the same few piano keys over and over out of mantra and habit. I say f8ck to mantra and ponder wider options. For the most part, the common OOP slogans are bullshit in my opinion. At the least, they over and under-emphasize the wrong things. They don't sing to the real problems I encounter, but rather imaginary, contrived, or rare ones. -t
- LISP-like??? Huh? If you have an issue with "symmetry duplication" then create isLessThanTrimmed(a, b), or whatever. I don't know what you're on about re "OOP slogans" and the like. I've been writing procedural programs since the late 70s and object oriented programs since the late 80s, and I prefer object oriented programming because it works better. It's not perfect by any remote stretch of the imagination, but OO makes it easier to write and maintain programs, especially large ones, than pure procedural code. That's from extensive experience with both; it has nothing to do with hype, brochure-talk, etc.
- But maybe you were a poor procedural programmer or using a lame, inflexible procedural language. I've met some of those. I cannot tell without seeing your actual code and actual problem areas and what is compensating for what (WaterbedTheory). Further, I try to break "large" applications into multiple smaller ones. Large application size is a smell. Perhaps some domains require that, and perhaps OO is better when it reaches a certain size if you don't have a choice. However, you seem to be painting with a wide brush. What about medium applications? -t
- The size of application is irrelevant. Breaking large applications into multiple smaller ones is the very essence of object oriented programming! An object is a small application, with input, output, and state. If you are breaking large applications into small applications and connecting them, you are simply doing object oriented programming without the benefit of the syntactic sugar that object oriented languages provide. By the way, if I were a poor procedural programmer, it's highly likely I would find object oriented programming even more difficult. Object orientation is not a magic panacea; in the hands of a bad programmer it makes things worse. In the hands of a good programmer, it makes procedural programming easier.
- See above about "reinventing OOP". I find that complicated OOP tends to reinvent database, the hard ugly way.
- And "isLessThanTrimmed()" is an AttributesInNameSmell. -t
- No, it isn't. 'isLessThan' is a subset of the comparison action, 'Trimmed' refers to the action of trimming.
- Most nouns can be turned into verbs, adjectives, etc., and vice versa. The point is to not create a combinatorial naming mess of the kind shown in that topic. If we consider all the usual operators (less than, greater than, equal, etc.) and trim-ness and capital-ness and middle-space-ness, we WILL get a combinatorial mess (plus x-ness's we haven't anticipated yet). The fact that in comparing we usually want to do the same thing to both sides (both operands) seems to be stumping you. And you seem to want to make the usage less natural in order to (allegedly) improve implementation issues. This is backward in my book: spend more effort on getting a friendly interface and THEN worry about implementation. It's true, they both drive each other, but you are under-emphasizing interface in my opinion to serve some nebulous and unproven "composability" mantra. Just admit you are currently stymied by symmetry rather than digging your heels into your original position.
- {The question - the one you've been dodging repeatedly - is how do you extend API such that you can compare with, say, all "middle-space" reduced to single spaces. Assume your compare already supports strings, case insensitive compares, and trimming, and you'll also want those features with your "middle-space" feature. What does your API look like after adding the "middle-space" feature, and how do you get there? The other author is asserting that, to add support for middle-space-redux such that you can 'compare(A,"<",B,trim=X,casesense=Y,middle-space-redux=Z)', you'll need to modify the "compare" code - which both risks breaking components that use the existing "compare" function and violates any sort of modularity (you need the implementation of 'compare'). The other option is to use an inconsistent approach, such as 'compare(reduceMiddleSpaces(A),"<",reduceMiddleSpaces(B),casesense=X,trim=Y)'. Either of these approaches suggests your design is NOT extensible except in the most trivial of senses (that any program is extensible if you're willing and able to rewrite an arbitrary body of its code). Your opinion that this seemingly non-extensible comparison interface is "friendly", or that it would effectively deal with the "x-ness's we haven't anticipated yet", so far lacks a cogent argument. Why is your interface "friendly"? How does it handle an "x-ness you haven't anticipated yet"?}
- And, I agree there are places where OOP can improve things a bit. But this ain't one of them.
- {The other author has already shown how one can easily extend an OO API to support 'trim' (and, equivalently, 'middle-spaces') without modifying the 'lessThan' code. You have NOT done anything equivalent. If you feel your approach has the same power as OOP in this case, I'd like to see you extend 'compare' with a feature it was missing initially, without modifying compare, and without becoming inconsistent.}
- I can also "extend without changing" if I duplicate the operation for both sides on each usage. But THAT is what I want to avoid. You are accepting bloat to avoid changing existing modules. I believe I'm picking the lesser of two evils. If the domain code is easier to work with, then I have more time for testing, etc., and fewer domain bugs because it's easier to read and inspect. That is a good thing. Dealing with domain logic is the hard part. That's the bigger time sponge. If I accept a little risk on the infrastructure side of things to reduce problems in the domain logic, then I am net ahead. I weigh shit like that when I design software. I don't like problems any more than you do. -t
- {If all you're looking for is to do the same thing for A and B prior to comparing them without repeating the process in the interface, that's easy to abstract with any sort of first-class blocks, functions, or even objects. (I.e. OO can do the same thing as the functional example, below, via use of FunctorObject.) However, your 'compare' interface will certainly require a bloated implementation because all those features will be pushed to a central module, which must know about every possible 'x-ness'.}
- Centralized and "bloated" are not necessarily the same thing. Further, it may be better to bloat the implementation than to bloat the interface. A similar issue came up at JavaIoClassesAreImpossibleToUnderstand. Again, I generally put somewhat more importance on simplifying the domain "language" or APIs than on the "under the hood" infrastructure. The domain-heavy parts are where most of the effort and re-work time is spent, in my experience. Related: WorkBackwardFromPseudoCode
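The FunctorObject route mentioned above (do the same thing to A and B without repeating it at the call site) can be sketched in Python. The names Normalizer, compare_with, and collapse_spaces are hypothetical; the sketch shows a middle-space step being added to an existing normalizer without editing compare_with or the earlier steps.

```python
import operator


class Normalizer:
    """Function object that applies the same steps to a value, in order."""

    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, value):
        for step in self.steps:
            value = step(value)
        return value

    def then(self, step):
        # Builds a new Normalizer; the existing one is left untouched.
        return Normalizer(*self.steps, step)


def compare_with(op, normalize, a, b):
    """Normalize both operands the same way, then compare them."""
    return op(normalize(a), normalize(b))


def collapse_spaces(s):
    # str.split() with no argument splits on runs of white space and drops
    # leading/trailing white space, so this also trims.
    return " ".join(s.split())


basic = Normalizer(str.strip, str.upper)
print(compare_with(operator.eq, basic, " abc", "ABC "))   # True
print(compare_with(operator.eq, basic, "a  b", "A B"))    # False
spaced = basic.then(collapse_spaces)                      # new option added
print(compare_with(operator.eq, spaced, "a  b", "A B"))   # True
```

Whether this reads better than a single compare call with flags is exactly the readability trade-off argued over above; the sketch only shows that the symmetry need not be written twice.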
{As far as doing the same thing to both sides goes, that's someplace where support for first-class functions would often shine... and laziness could even keep performance at its peak. That is, one could use:
binOpAfter(lessthan, trim o ucase o midspaceredux, A, B) => ;; evaluates to
lessthan( (trim o ucase o midspaceredux)(A), (trim o ucase o midspaceredux)(B)) => ;; evaluates to
lessThan(trim(ucase(midspaceredux(A))),trim(ucase(midspaceredux(B))))
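A Python rendering of binOpAfter, as a sketch. The names compose, bin_op_after, trim, ucase, and midspaceredux are stand-ins for the pseudocode above, and operator.lt plays the role of lessthan. Python evaluates eagerly, so the laziness mentioned above is not shown here.

```python
import operator
import re
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))), i.e. the 'f o g o h' notation."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


def bin_op_after(op, prepare, a, b):
    """Apply `prepare` to both operands, then the binary operator."""
    return op(prepare(a), prepare(b))


def trim(s):
    return s.strip()


def ucase(s):
    return s.upper()


def midspaceredux(s):
    # Reduce each run of white space to a single space.
    return re.sub(r"\s+", " ", s)


prepare = compose(trim, ucase, midspaceredux)
print(prepare("  pecan   pie "))                                    # PECAN PIE
print(bin_op_after(operator.lt, prepare, " apple  pie", "BANANA"))  # True
```

Reordering is just a different argument order to compose; for these three steps, compose(ucase, trim, midspaceredux) gives the same result.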
Agreed, but it may also be verbose.
{How so? I'm not seeing a significant verbosity difference, TopMind.}
compare(A, "<", B, trim=true, casesense=false, midspaceredux=true)
binOpAfter(lessthan, trim o ucase o midspaceredux, A, B)
And how do we manage multiple operations, their order, etc.? It's not worth it to build a convoluted contraption for such. New compare options may be once or twice a year by my experience.
{The above was multiple operations. If I wanted to change their order, I could: (trim o ucase o midspaceredux) vs. (ucase o trim o midspaceredux) might make a performance difference but happen to be commutative. How do you manage the order of operations in your approach? If you need to extend the 'compare' module even once or twice a year with new options, each time growing it into a larger, combinatorial mess, exactly how is your approach "worth it" compared to the relatively simple functional composition?}
It depends on the language being used. For the sake of argument, I'll agree with you for now. I find it conceptually more palatable than the OO "dot-path" approach, but again, the frequency of and risk from changing the function(s) to add new compare options is insignificant. Plus, the follow-on programmer may well not know functional techniques. -t
Re "the follow-on programmer likely may not know functional techniques". I remember when precisely the same argument was regularly applied to object oriented techniques (replace "functional" with "OO"), and before that, to structured programming (replace "functional" with "structured programming"). Many current programming students are exposed to functional programming; it won't be long (for an unspecified value of 'long') until it is considered ubiquitous knowledge.
{I consider modifying code deployed for other users even once a year to be very significant.}
- Fair enough. For your environment it seems that implementation issues override readability issues in many cases. It depends on the environment and/or domain. -t
Note that sometimes we want the default to be some action, and only specify when we don't want it. For example, most of the time I would rather have it ignore case by default, not the other way around. Perhaps the same goes for trimming and removing duplicate spaces. -t
{This is a fair point. There is no obvious tweak for the functional approach to have actions enabled by default and disabled by flag. Perhaps something like:
binOpAfter(lessThan, strcops([notrim,keep_middle_spaces]), A, B) => ;; evaluates to
binOpAfter(lessThan, ucase, A, B) => ;; evaluates to
lessThan( ucase(A), ucase(B) )
{- where 'strcops' stands for 'string comparison options'. This approach, at least, still provides extensibility to everything but 'strcops' (which may need some sort of central map to extend with new features, unless the language provides some extra features). }
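One way the defaults-on idea could look in Python, assuming the opt-out names from the pseudocode (`notrim`, `keep_middle_spaces`) plus a hypothetical `casesense` flag. `strcops` here is an illustrative helper, not a real API; the `_DEFAULT_STEPS` list plays the role of the "central map" the comment anticipates.

```python
import operator
import re

# Default normalizations, applied in this order (midspaceredux, then ucase,
# then trim). Each one is on unless the caller names its opt-out flag.
_DEFAULT_STEPS = [
    ("keep_middle_spaces", lambda s: re.sub(r"\s+", " ", s)),
    ("casesense", str.upper),
    ("notrim", str.strip),
]


def strcops(disabled=()):
    """Build a normalizer with every default step on, minus the named opt-outs."""
    unknown = set(disabled) - {name for name, _ in _DEFAULT_STEPS}
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    steps = [fn for name, fn in _DEFAULT_STEPS if name not in disabled]

    def normalize(s):
        for fn in steps:
            s = fn(s)
        return s

    return normalize


def bin_op_after(op, prep, a, b):
    """Apply the same preparation to both operands, then the binary operator."""
    return op(prep(a), prep(b))


print(bin_op_after(operator.eq, strcops(), " Main  St ", "main st"))   # True
print(bin_op_after(operator.eq, strcops(["casesense"]), "abc", "ABC"))  # False
```

With both `notrim` and `keep_middle_spaces` given, only the upper-casing step remains, which matches the `ucase` reduction in the pseudocode.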
Note also that my approach does not preclude something similar. An optional named parameter could be a list of function names. The function could then loop over that list and "Eval" each name.
{It is unclear, but are you trying to say something similar to the functional design I indicated earlier? Where would this list of function names fit in?}
Something like this:
if (strCompare(x, ">", y, "trim, caps, foo, zerp"))...
...
func strCompare(a, op, b, optionList) {
  var work_a = a; // internal altered copies of the operands
  var work_b = b;
  ...
  if (! isBlank(optionList)) {
    while (i = listForEach(optionList, ",")) { // iterate list items
      i = trim(i);
      if (inList(i, recognizedList, ",")) {
        // process built-in options, such as "caps"
      } else {
        // call a function named by the option, with a "cmp_" prefix
        work_a = eval("cmp_" + i + "(" + escQuote(work_a) + ")");
        work_b = eval("cmp_" + i + "(" + escQuote(work_b) + ")");
      }
    } // end-while
  }
  ...
}
The unrecognized "foo" option would call "cmp_foo(...)" for each operand. (Perhaps it should also pass the operation, and maybe even the other parameters to be thorough.)
A downside compared to the UniversalStatement-based keyword approach is that additional parameters cannot be specified. The sub-functions (for lack of a better name) cannot easily have parameters of their own. I cannot think of any useful sub-parameters right now, but couldn't rule it out as a possibility down the road. Maybe rounding level (decimal resolution) for numeric compares? -t
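The name-based lookup can be done without `eval` by keeping a registry (a dictionary) of option handlers; this is a swapped-in technique, not what the pseudocode does. The sketch below also shows one hypothetical way around the sub-parameter limitation: an option written as `round:2` passes `2` on to its handler. All names (`cmp_option`, `str_compare`, the `round` option) are illustrative.

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

# Registry of option handlers, standing in for the "cmp_" + name functions
# that the pseudocode locates with eval(). A handler receives the operand plus
# any sub-parameters written after colons, e.g. "round:2".
_OPTIONS = {}


def cmp_option(name):
    """Decorator that registers a handler under an option name."""
    def register(fn):
        _OPTIONS[name] = fn
        return fn
    return register


@cmp_option("trim")
def _trim(value):
    return str(value).strip()


@cmp_option("caps")
def _caps(value):
    return str(value).upper()


@cmp_option("round")
def _round(value, places="0"):
    # Hypothetical sub-parameter: decimal resolution for numeric compares.
    return round(float(value), int(places))


def str_compare(a, op, b, option_list=""):
    work_a, work_b = a, b  # internal altered copies
    for item in option_list.split(","):
        item = item.strip()
        if not item:
            continue
        name, *args = item.split(":")
        handler = _OPTIONS.get(name)
        if handler is None:
            raise ValueError(f"unrecognized compare option: {name!r}")
        work_a = handler(work_a, *args)
        work_b = handler(work_b, *args)
    return _OPS[op](work_a, work_b)


print(str_compare(" x ", "=", "X", "trim, caps"))         # True
print(str_compare("3.14159", "=", "3.1416", "round:2"))  # True
```

A new option can be registered from another module with `@cmp_option("name")` without editing `str_compare`; an unregistered name raises an error instead of being evaluated as code.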
(The context of the material below seems to have been lost. It may need to be re-threaded.)
There is nothing, of course, that precludes wrapping an expression composed of primitives in something you find more manageable. For example, trim().toUppercase() could be wrapped in trimToUppercase().
As far as PredicateDispatching, I tend to focus on the "interface" first and then worry about how it's done second. Thus, I won't classify the "how" just yet because that's under the hood. The compare approach I favor is mostly optimized for how I like to work with compare expressions as a compare-library user. If that disfavors the library builder, it may be worth the trade-off. That also applies to the duplication I want to get rid of.
As far as "composable", I'd need to look at realistic need scenarios for it to comment. It is expandable, but there are different ways and different trade-offs for each approach such that I'd have to see the details to suggest the best way to extend. "I want to add feature X. What are the choices and what are the effort levels and change impact of them?"
"Shallow glue-code"? Sounds like flame-bait. I won't bite this time.
--top
{RE: "I'd have to see the details to suggest the best way to extend" -- that seems very much like saying, "there is no standard, clean way to extend" without being particularly obvious about it. }
{As far as your "realistic needs" scenarios, I never understand why you can't just use the obvious ones. If you are reasonable, you must assume you failed to support all the types and comparisons users care about (which might include addresses, coordinates, colors and hues, etc.). But you do know which features you care about, so the obvious choice is to use those: assume the author of the language/DBMS/etc. forgot to support dates, or forgot to support 'trimming', or forgot to support 'case insensitivity'. You would then show how programmers go about adding that feature so that 'compare' works with the new feature. If you cannot do so without violating code-ownership boundaries, or if you must re-implement the features and types that already exist, then to say "it is expandable" is wrong.}
- So you are talking about "compound types". Addresses, coordinates, colors, hues, etc. tend to be represented in different ways in different tools, APIs, and languages. I tend to use tables for those, not "types" (which shouldn't surprise you). That way I don't have to do OO's one-node-at-a-time sequential navigation of them. And if OOP is the best way to represent those because the libraries are in OOP and have pre-defined comparers, then I will happily use those that came with the API, such as lighter versus darker colors. Otherwise, I usually store colors in HTML format: hex-RRGGBB. When comparing addresses, I usually treat them as data tables because fiddling often has to be done with street names, address numbers, zip-codes, etc. on a bulk basis. For example, somebody was making a map using GIS software and wanted me to eliminate duplicates that used different naming conventions. I created various heuristics that looked at each part (column) in different ways. I created one or more additional candidate tables for further comparing, such as changing all "Park of X" to "X Park". Sometimes it's easier to work with such things as single strings, sometimes as tables, and sometimes as OOP types/classes. There's no one right way and I don't seek a GodLanguage. -t
- Of the three, the OOP style is the one I find useful least often. It takes too much set-up code for single-spot usage. I'm not paid to write a generic color kit, but merely need to do one specific thing in one specific spot and will likely never do it again. That's my domain. And if I did have to do a lot of fiddling with colors, I'd more than likely store them in table form (columns: red, green, blue) and do most of the processing via the RDBMS (SQL), and thus there'd be less need to bring them into direct app code. I try to do most of the heavy lifting in SQL. The app code just adds final touches and pretties it up for display. Even if there were an API for comparing colors or any compound type, it may not be efficient to marshal each row into the app language to do something with them all. There might be a way to handle 90% of the target rows with SQL, with only 10% needing to be brought into the app language for additional fine-fiddling, for instance [1]. "Comparing" with compound "types" tends to be domain-specific and spot-specific anyhow. Write a function FunkyCompareOfColors(a,b) if it's needed repeatedly. The operations of such are not very standard, as the address case scenario illustrates. Simple. I guess in some sense the "glue code" label is true: I generally shuffle things back and forth between RDBMS and output APIs or tools, with various filters, re-mappings, and conditionals in between. If everything was born and died in the same language, then a different, more type-centric approach may make sense. -t
- {The above two paragraphs make it clear that you've missed my point, which was simply that you already have ready-made "realistic needs scenarios". To reiterate: rather than demanding a new domain-value type like 'color', you should consider something you need quite often to be a 'realistic needs scenario'. For example, assume the tool had no support for dates and times. From the above, you're saying you'd simply represent dates and times in a dozen inconsistent ways based on the immediate need (e.g. flipping between (column: year, month, day) and (column: seconds since 1970) arbitrarily based on what you needed to do). But I wasn't even interested in that answer. My question - if I expressed one at all - was "Why don't you use something you already know you need, like dates, to show how you'd 'extend' the 'compare' operation if it were not already supported?". It takes a lot of arrogance to assume that the few types you use often are the few types everyone else needs, and many people won't need 'date' any more often than you need 'color'.}
- "Adding" dates can actually be AutoMagic: just assume that internally dates are represented in the string format "YYYY-MM-DD" (10 characters), where YYYY is the year, MM the month, and DD the day number. Then we can compare as string and don't have to change anything. A fancier version would verify the format via a regular expression or the like if the type is indicated as "date". An even fancier version would convert (parse) MM/DD/YYYY into YYYY-MM-DD when it finds a slash or matches the slash reg-ex. Adding a time portion is merely an extension of the same thing. Adding many new linear types would generally follow the same pattern: convert to an internal format (or just verify it if it needs no change) that sorts properly as a string. Even numbers can be handled this way if they are padded with zeros to the width of the longest value, plus some IF statements to handle the sign. Color is not a linear type, so it's a poor example. And keep in mind that the PageAnchor "dedicated_option" is an option if you really don't want to touch existing functions due to some purity urge or some real requirement (cough). -t
- {And regarding another of your comments: "I tend to use tables for [compound types], not 'types'". -- I do not agree with your assessment of your own behavior. You say that addresses, coordinates, colors, hues, etc. are 'compound types' (which they aren't, though they are often represented as compound types because languages can't feasibly support every possible DomainValue with a dedicated primitive). This is no less true for the types you listed: number, integer, date, string. In many languages, for example, a string is a compound type: a list (a record of (first,rest) or nil) of integers. You have yet to express your 'tendency' to represent strings, dates, numbers, and other 'compound' types as tables.}
- Multiple times I have split out year and month portions of dates into views and table copies for various reasons. For example, when dealing with yearly reports, it can shorten a lot of code to say if(year==x) instead of something like if(fooDate >= stringToDate("01/01/" & year) && fooDate <= stringToDate("12/31/" & year)). And if I have a good reason to break up a string into table elements, I would do that too. I've used tables for word/code-parsing before. If the language doesn't readily support tables, then maps are often the next best thing. -t
- {It seems to me you're comparing against some straw-man code - I'd compare if(year==x) to if(yearOf(fooDate)==x). Anyhow, rare circumstances fail to make 'tendencies'. And transformations don't bother me at all... as an aficionado of FunctionalProgramming, I'll quite happily convert elements to/from trees, lists, graphs, matrices, relations, strings, dates, numbers, or whatever is convenient for the immediate expression. And I'd certainly favor a DBMS that can support whatever I find convenient for the operations I'm performing. It is my impression that you would prefer to make my life difficult, forcing me to use obtuse and non-composable table layouts (RelationalTreesAndGraphsDiscussion) and equally obtuse and non-composable functions (such as that 'treeEq' operator you presented in RelationalTreeEqualityExample) except that you're totally inconsistent about it because you don't bother making my life difficult for working with strings, dates, or TopMind's other favorite types.}
- Where does "yearOf" come from? Not all RDBMS support it. Besides, if functions are allowed to add new functionality then functions are allowed to add new functionality. In other words, yet another tool in our arsenal. As far as "obtuse", people who live in glass houses shouldn't throw rocks. And if I need to use treeEqual 50 different times in 50 apps, then I'll reconsider my position.
- {How would you go about "split out year and month portions of dates" without some equivalent to a yearOf function? If you can perform that sort of break-down, I must have a yearOf. Anyhow, last I checked, glass houses are rather transparent... not 'obtuse' at all.}
- I forgot to also mention that expressions, including function results, are not indexable or only partially-indexable in most RDBMS. But I'm not sure where this is going anyhow.
- (PageAnchor: dedicated_option)
- Also note that having separate compare functions for each "type" is also an option. Thus, we could have strCompare(), dateCompare(), and numCompare(). It doesn't change much from an app-developer's perspective, but if it somehow narrows the scope of type-related changes, if that's what you are after, then it's certainly an option to consider.
- {That's reasonable. Of course, the reason I mentioned PredicateDispatching a long while back is that with it you could heuristically support 'closest-match' extensions to compare for new 'type=' flags - and other features (like 'trim' and 'casesense') - without modifying the existing definitions of 'compare'.}
- If you imply anything I do on this wiki is "reasonable", you will be ostracized here. Sure you don't want to re-word that? Note that it was mentioned above (or at least implied) as an option long ago. -t
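The "sortable string" idea for dates and numbers, combined with the separate per-type functions from PageAnchor dedicated_option, might look like this in Python. It is a sketch under narrow assumptions: only the two date layouts named above (YYYY-MM-DD and MM/DD/YYYY) are recognized, impossible dates such as 02/31 are not rejected, and numbers are limited to integers. `date_key`, `int_key`, `date_compare`, and `num_compare` are hypothetical names.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")           # YYYY-MM-DD
_US_DATE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")  # MM/DD/YYYY


def date_key(text):
    """Return the date as a YYYY-MM-DD string, which sorts correctly as text.
    Only the layout is checked; impossible dates like 02/31 are not rejected."""
    text = text.strip()
    if _ISO_DATE.fullmatch(text):
        return text
    m = _US_DATE.fullmatch(text)
    if m:
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    raise ValueError(f"unrecognized date format: {text!r}")


def int_key(value, width=12):
    """Return an integer as a fixed-width string that sorts in numeric order.
    Non-negatives get prefix '1' plus zero padding; negatives get prefix '0'
    plus 10**width + n, so more-negative values give smaller strings."""
    n = int(value)
    if abs(n) >= 10 ** width:
        raise ValueError("value too wide for the padded format")
    if n >= 0:
        return "1" + str(n).zfill(width)
    return "0" + str(10 ** width + n).zfill(width)


def date_compare(a, op, b):
    return _OPS[op](date_key(a), date_key(b))


def num_compare(a, op, b):
    return _OPS[op](int_key(a), int_key(b))


print(num_compare("007", "=", "7"))                   # True
print("007" == "7")                                   # False
print(date_compare("12/31/2009", "<", "2010-01-01"))  # True
```

Each type gets its own entry point, so adding, say, a `time_compare` later would not touch the existing functions.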
{Regarding your other comments: "Shallow glue-code between other systems" is a fairly wide domain, the sort of thing at which scripting languages were initially aimed; it excludes SystemsSoftware, but it is hardly flame-bait (unless you're paranoid ;). And your comment on PredicateDispatching is a bit irritating: Dispatching is "interface", not "implementation", especially in context - that being a reply to a comment on composition. I understand what you mean is that you're focusing on the 'compare' interface (i.e. how programmers express a comparison between dates) as opposed to the development interface (i.e. how programmers extend 'compare' to work with dates or add a 'trim' feature), but your utter failure to acknowledge the context is still irritating and strikes me as disrespectful (whether it be through intention or negligence).}
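A toy illustration of the PredicateDispatching idea in Python: rules are registered with a predicate over the comparison options, and `compare` picks the most specific applicable rule, so a new `type=` flag can be added without editing existing definitions. This is a simplification; real predicate-dispatch systems order rules by logical implication between predicates, whereas here an explicit priority number (an assumption of this sketch) decides. `when` and the rule functions are hypothetical.

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

# Each rule is (priority, predicate over the options, preparation function).
_RULES = []


def when(predicate, priority=0):
    """Register a preparation function that applies when predicate(opts) holds."""
    def register(prep):
        _RULES.append((priority, predicate, prep))
        return prep
    return register


def compare(a, op, b, **opts):
    applicable = [rule for rule in _RULES if rule[1](opts)]
    if not applicable:
        raise TypeError(f"no compare rule matches options {opts}")
    # "Closest match": the applicable rule with the highest priority wins.
    _, _, prep = max(applicable, key=lambda rule: rule[0])
    return _OPS[op](prep(a, opts), prep(b, opts))


# Base rule: string comparison with optional trim/case handling; always applies.
@when(lambda o: True, priority=0)
def _as_string(x, opts):
    s = str(x)
    if opts.get("trim"):
        s = s.strip()
    if opts.get("casesense") is False:
        s = s.upper()
    return s


# Added later, possibly in another module, without editing compare() or _as_string.
@when(lambda o: o.get("type") == "number", priority=10)
def _as_number(x, opts):
    return float(x)


print(compare("007", "=", "7"))                 # False (compared as strings)
print(compare("007", "=", "7", type="number"))  # True (number rule wins)
```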
I assure you that I intended no malice. I suspect part of the problem is that we view "types" differently. You and I just think differently. -t
{For purpose of this discussion, I have aimed to be consistent in using 'types' the way you used it above when you said: "Type - Number, integer, date, etc.". With 'date' (a temporal coordinate) as an example, there is no reason 'etc.' should not include spatial coordinates. Since date is on an arbitrary scale, it also serves as precedent for colors and hues and so on. If you're thinking something different now than you said some months ago, I suspect that's inconsistency on your part rather than my own.}
If I control an application's library, I can make the compare function(s) take any "type" I want. I customize libraries for specific domains & apps all the time (see HelpersInsteadOfWrappers). However, non-linear "types" perhaps do not belong in the same comparison function, since many if not most idioms are not sharable across them. If the differences are too great, then don't force-fit sharing. (I suppose one could make a general-purpose multi-dimensional "distance" thingamabob such that 1D position analysis is one of many combinations it accepts. A GodComparitor would make for interesting MentalMasturbation.) -t
{GodComparator? As in "Zeus > Hera"?}
No-Change Optimization Via Functions
If the goal is to avoid changing existing compare functions at all costs but still be allowed to create new "type" comparison functions, then one could start by making many small functions intended for re-use. For example, the string comparer may farm off capitalization normalization, space normalization/removal, and the actual comparing to smaller functions. If new types come along, they would be able to use any of the existing small functions, hopefully without change. For the apps I work on, such is usually overkill, but it's a design option if you really need it. And you don't have to sacrifice syntax to get it.
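A small Python sketch of that layout, with hypothetical helper names: the string comparer is built from small normalizers, and a later "type" (street names, echoing the address example earlier on the page) reuses them without `str_compare` being edited. The street rule, which treats a trailing "ST" or "ST." as "STREET", is purely illustrative.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


# Small, independently reusable pieces.
def normalize_case(s):
    return s.upper()


def normalize_spaces(s):
    # Trim the ends and collapse internal runs of whitespace to one space.
    return re.sub(r"\s+", " ", s).strip()


def ordered(op, x, y):
    return _OPS[op](x, y)


def str_compare(a, op, b):
    def prep(s):
        return normalize_spaces(normalize_case(s))
    return ordered(op, prep(a), prep(b))


# A new "type" added later; it reuses the helpers and leaves str_compare alone.
def street_compare(a, op, b):
    def prep(s):
        s = normalize_spaces(normalize_case(s))
        # Treat a trailing "ST" or "ST." as "STREET".
        return re.sub(r"\bST\.?$", "STREET", s)
    return ordered(op, prep(a), prep(b))


print(street_compare("Main St.", "=", "main  street"))  # True
print(str_compare("Main St.", "=", "main street"))      # False
```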
In general, expressiveness, adaptability, and readability of my utility functions override concerns over "opening existing code" for maintenance, but there are ways to reduce opening without having to entirely switch to ugly interfaces.
--top
This is one of the more interesting debates I've been involved with on this wiki. While a consensus probably has not been reached, at least good cases were made by the parties involved and the examples used were fairly representative of real-world issues without being too "nichy". There is enough info to allow one to compare and ponder. It may make a good student project for code, change, and structural analysis. --top
Foot Notes
[1] Percents meant only for illustrative purposes and are not intended to be official statistics or a peer-reviewed study.
See Also: FeatureBuffetModel
CategoryLanguageTyping, CategoryConditionalsAndDispatching, CategoryFunctionalProgramming
DecemberZeroNine