Database Domains For Numbers

Numbers may be used to represent data in very different contexts. If those contexts are not clarified, their use may become confusing (the same way that None, Null and Zero can be confused). Examples:

Numbers for OrdinalScaleData?: In this scale type, the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd, etc.) of the entities assessed. An example of ordinal measurement is the result of a horse race, which says only which horse arrived first, second, third, etc. If we use a number field in a database to represent this, what number do we use for a horse that gets injured in the middle of the race and never reaches the end? Note that using Zero may lead to inconsistencies: does Zero mean that the horse arrived before the first one? (If we sort by that column, that might well be the answer.) Or does it mean that it did not arrive? Also note that SqlNull (Unknown) would not be accurate: we do know that the horse did not finish the race, so we are not dealing with an unknown here; what we want to represent is the known fact that the horse did not finish. So maybe "None" is a better match? Or perhaps a special "OrdinalZero?" or "OrdinalNone?"?

Numbers for IntervalScaleData?: An interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or 'intervals') is the same but the zero point is arbitrary. Scores on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. The formal mathematical term is an AffineSpace? (in this case an AffineLine?). Variables measured at the interval level are called "interval variables" or sometimes "scaled variables", as they have units of measurement. A highly familiar example of interval scale measurement is temperature on the Celsius scale. In this particular scale, the unit of measurement is 1/100 of the difference between the melting temperature and the boiling temperature of water at atmospheric pressure. It makes no sense to multiply 2°C * 8°C. Another example is Gregorian calendar years, and it makes no sense to multiply them either (year 1999 * year 2000 = what would the answer mean?). Yet we generally see this implemented as a plain "Integer" in typical PseudoRelationalDatabase?s.

Numbers for RatioScaleData?: Most measurement in the physical sciences and engineering is done on ratio scales. Mass, length, time, plane angle, energy and electric charge are examples of physical measures that are ratio scales. Informally, the distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value; the Kelvin temperature scale is an example.
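
As a rough illustration of the interval-scale restriction above, here is a minimal sketch (in Python; the Celsius class is purely hypothetical, not something offered by a typical database) of a value type that permits differences and offsets but rejects multiplication:

 # Hypothetical sketch: an interval-scale Celsius value.
 # Differences and offsets make sense; multiplying two temperatures does not.
 class Celsius:
     def __init__(self, degrees):
         self.degrees = float(degrees)
     def __sub__(self, other):
         if isinstance(other, Celsius):
             return self.degrees - other.degrees     # temperature - temperature = plain interval
         return Celsius(self.degrees - other)        # temperature - offset = shifted temperature
     def __add__(self, other):
         if isinstance(other, Celsius):
             raise TypeError("adding two Celsius points is meaningless")
         return Celsius(self.degrees + other)        # temperature + offset = shifted temperature
     def __mul__(self, other):
         raise TypeError("multiplying Celsius values is meaningless")
     def __repr__(self):
         return f"{self.degrees} degrees C"

 print(Celsius(30) - Celsius(22))    # 8.0 -- an interval, not a temperature
 print(Celsius(22) + 8)              # 30.0 degrees C
 # Celsius(2) * Celsius(8)           # would raise TypeError, as the text suggests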


No need here for new inventions

As for the horse race example, the most practical approach would be to use null for non-finishers, but have a separate "status" column that indicates the status of a given horse for a given race, such as: "finished", "injured not-finished", "injured finished", "disqualified", "jockey didn't show", "other, see booth", etc. There's no need to invent custom nulls or custom types.

Depending on display room, it may be a dedicated column or a footnote letter on the result card or reports. But notice that it's the status of the horse's run in the race, not just of the rank. Thus, associating it with just the rank value is perhaps a poor approach.

Further, with a "status" column, it's easier to make/reuse a CRUD editor so that administrators etc. can add or change status information (perhaps in a ConstantTable if we don't need other attributes). Custom complex types would generally require a programmer visit to make changes to the status info if it's part of a "type". It may be good job security for you, but customers don't like that. Your "type" is morphing into a mini database, but without out-of-box database abilities and tools (GreencoddsTenthRuleOfProgramming).

As for the year multiplication example, I have multiplied date elements in the past to create a hash for security or file re-distribution. If we create a custom type that forbids certain operations, then such uses may be made more difficult. I'd instead suggest a "warning system" rather than hard rejection of suspect uses if you want to go the "protection" route. We've learned from elements such as Java's "final" indicator that original authors are often not very good at anticipating future uses.

--top

"Further, with a 'status' column, it's easier to make/reuse a CRUD editor so that administrators etc. can add or change status information ..."

Interesting you mention that, because something I've been experimenting with in the RelProject is the ability for clients to automatically download type definitions from the database and automatically generate client-side user interfaces (UIs) to edit type values. This is possible because every user-defined type definition -- no matter how complex -- describes a directed graph of types where nodes of outdegree = 0 are primitive, built-in types. Thus, it appears to be possible to automatically create a client-side UI editor to manipulate values of any type. To test the basic idea, some time ago I successfully created a NakedObjects-like environment for Java, which used Java reflection to allow a user to instantiate arbitrary Java class instances and manipulate the instances with appropriate textboxes, etc., automatically displayed for every primitive property. I employed it in an experimental general-purpose GraphicalProgrammingLanguage called Tomato. See http://tomatoide.sourceforge.net/

Thus, support for complex types permits a considerably higher degree of automated UI generation than is possible in an environment having only discrete primitive types, because a type inherently defines a grouping of primitive attributes. Having only discrete primitive types means the developer is forced to describe such groups in every UI "form" or dialogue box where they are used. For example, you would need to explicitly code the fact that your 'Status' attribute is related to your 'HorseRaceResult' attribute every time the two are used in a UI form. With a single type definition that includes both attributes, their relationship is implicit in the type definition and can automatically apply to every UI element that references it.
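
Roughly in the spirit of the reflection-based experiment described above, here is a minimal Python sketch (hypothetical; this is not how Rel or Tomato actually work) that walks a type definition as a directed graph, bottoming out at primitive leaves, and derives generic editor fields from it instead of hand-coding each form:

 from dataclasses import dataclass, fields, is_dataclass

 # A complex type built from other types; the leaves (str, int, ...) are the primitives.
 @dataclass
 class Status:
     code: str
     description: str

 @dataclass
 class HorseRaceResult:
     horse: str
     rank: int
     status: Status

 def editor_fields(typ, prefix=""):
     """Flatten a type graph into (label, primitive type) pairs for a generic editor."""
     if not is_dataclass(typ):
         return [(prefix.rstrip("."), typ)]      # outdegree 0: a primitive leaf
     result = []
     for f in fields(typ):
         result.extend(editor_fields(f.type, prefix + f.name + "."))
     return result

 # A UI generator would turn each pair into a textbox, spinner, date picker, etc.
 for label, primitive in editor_fields(HorseRaceResult):
     print(f"{label}: edit as {primitive.__name__}")
 # horse: edit as str
 # rank: edit as int
 # status.code: edit as str
 # status.description: edit as str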

As you probably expect, I'm going to call GreencoddsTenthRuleOfProgramming on that one. One can use a database to group and manage base/primitive types also. It may be more flexible because it's less subject to encapsulation rules typical of types. For example, the original design above tightly couples the placement (race rank) with the "status". If later we want to decouple it and make it associated with the general horse-per-race row rather than just placement, it's merely a matter of foreign keys (and possibly constraints) if done the DB-centric way, and possibly no schema change at all. As I've said before, RDBs generally make better "relativity engines" than types and objects because of the "soft" nesting, or rather, reference-based associations instead of encapsulation. If this trades protection for flexibility, so be it. In domains where I hang out, flexibility is more important. (I'm not sure it does trade, but am preparing for that claim.)

The domains where you hang out are ERP related, are they not? Working on ERP systems can lead you to believe the entire IT world needs only INTEGER and CHAR, and to a lesser degree DATE, TIME, REAL, MONEY and BOOLEAN types. Outside of ERP -- such as numerical computing, geography, games, simulations, engineering, financial management, and medical informatics, to name just a few -- those canonical types are still popular, but the need for rich user-defined type support becomes significant. Yes, you can define COMPLEX, POLYNOMIAL, TEMPERATURE, BINARY_TREE, GEOGRAPHICAL_COORDINATE, INFINITE_PRECISION_REAL, DNA_STRAND, MEDICATION_FREQUENCY, and innumerable other user-defined types via database schemas and user-defined procedures tied to specific schemas, but the work this requires is laborious, tedious and error-prone compared to using user-defined types. The reason for this is simple: a complex type defined via a database schema requires either that (a) type operations be repeated in every query, or (b) they be defined in procedure definitions tied specifically to a given schema or schema structure. With proper support for complex types, type definitions do not depend on any database schema, operators are specified OnceAndOnlyOnce in the type definition, and values of complex types can be manipulated as easily as primitive canonical types like INTEGER and CHAR. Support for providing multiple views of a given value is explicitly available in certain type systems. For example, DateAndDarwensTypeSystem provides POSSREPs; i.e., multiple possible representations for a given type.
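
As a rough sketch of the OnceAndOnlyOnce point above (Python-flavoured, with made-up names; real type support in Rel or in SQL's structured types would look different), the idea is that the operator lives with the type definition rather than being repeated in every query or tied to a particular schema:

 from dataclasses import dataclass
 from math import radians, sin, cos, asin, sqrt

 @dataclass(frozen=True)
 class GeographicalCoordinate:
     lat: float   # degrees
     lon: float   # degrees

     def distance_km(self, other):
         """Great-circle distance (haversine), defined once, here, with the type."""
         lat1, lon1, lat2, lon2 = map(radians, (self.lat, self.lon, other.lat, other.lon))
         h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
         return 2 * 6371.0 * asin(sqrt(h))

 # Any "query" can now use the operator without repeating the formula or caring
 # whether the value is stored as two columns, an encoded string, or a blob.
 london = GeographicalCoordinate(51.5074, -0.1278)
 paris = GeographicalCoordinate(48.8566, 2.3522)
 print(round(london.distance_km(paris)))   # roughly 344 (km)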

Further, while automatic CRUD form generation is a nice idea, in practice one tends to need different groupings and representations for different situations. The EightyTwentyRule reigns supreme in CRUD design. Thus, "soft" groupings are the better route in my experience. Type- and OO-centric groupings tend to be inappropriately "hard" (overly-coupled). --top

That may be true for schemas; it is not true for types. An INTEGER will always be edited as an INTEGER. If it should not be edited as an INTEGER, it is a different type (or at least an INTEGER with multiple POSSREPs; see above). This applies analogously to complex types.

I’m bothered by the idea that the needs of one specific user out of many may dictate a total change in design. But, the devil’s in the details, and I will only comment on a case-by-case basis.

Eh? Did you mean to put your comment here? It doesn't appear related to what's gone before.

Let me see if I can flesh this out a bit. Suppose a system uses an integer ID (key). The system is a bit old and unfortunately uses domain info embedded in the ID, such as the first 3 digits representing the originating office location. If there is a mess-up in the location, then one may have to edit the key in a string-like way. Thus, "An INTEGER will always be edited as an INTEGER" may not always hold.
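
A small sketch of that scenario (hypothetical ID layout; Python): the value is stored as an integer, but correcting the embedded office code is most naturally done with string surgery rather than integer arithmetic.

 # Hypothetical legacy key: first 3 digits = originating office, rest = sequence number.
 def fix_office(record_id: int, new_office: int) -> int:
     digits = str(record_id)
     return int(f"{new_office:03d}" + digits[3:])   # string-style edit of an "integer"

 print(fix_office(412000789, 507))   # 507000789 -- same sequence, corrected office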

In Rel, I simply create a new ID_INTEGER type, inherited from the built-in INTEGER, to represent your specialised INTEGER.

Would it break any existing code that assumed it was Integer?

No, because by virtue of inheritance, it is an INTEGER.

At the app side or database side?

Both.


From a Statistics perspective, we can see that not all statistical operations can be applied to all of these scale types. So, this classification may also be relevant when deciding which statistical operators can (or cannot) be applied to a particular "numerical" database field. See LevelOfMeasurement
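
For example, a rough mapping (a sketch only, not exhaustive; see LevelOfMeasurement for the full treatment) of which descriptive statistics are meaningful at each scale type might look like this in Python:

 # Rough sketch: which descriptive statistics are meaningful per scale type.
 # Each level also permits everything allowed at the levels before it.
 PERMITTED_STATS = {
     "nominal":  ["mode", "frequency counts"],
     "ordinal":  ["median", "percentiles"],
     "interval": ["mean", "standard deviation"],
     "ratio":    ["geometric mean", "coefficient of variation"],
 }

 def meaningful_stats(scale):
     """Return every statistic applicable to a field measured at the given scale."""
     order = ["nominal", "ordinal", "interval", "ratio"]
     allowed = []
     for level in order[: order.index(scale) + 1]:
         allowed += PERMITTED_STATS[level]
     return allowed

 print(meaningful_stats("ordinal"))   # mode, frequency counts, median, percentiles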


RE: Your "type" is morphing into a mini database, but without out-of-box database abilities and tools (GreencoddsTenthRuleOfProgramming)

I can appreciate that concern, TopMind. I'd really like to see out-of-the-box support for "types" as mini-databases. For example: relations as FirstClass column values, useful for describing complex graph values. ExtendedSetTheory seems to aim in this direction, and has repeatedly been a topic of interest to the author of RelProject.

[Indeed. The author of the RelProject is exploring ExtendedSetTheory as part of his PhD work.]

When you talk about "mini-databases", then it begins to sound like hard nesting (database inside a database). Encapsulation of "types" tends to go against the database view: that borders, interfaces, and restrictions are data or meta-data themselves, not hard-wired into the design of parts. I agree that hard boundaries may improve compile-time checking of models and code, but perhaps at the expense of flexibility. It's an age-old trade-off debate. (RelationalBreaksEncapsulation) -t

[Your claim that "encapsulation of 'types' tends to go against the database view ..." appears to be uniquely your own opinion. I've found no equivalent in the literature. Indeed, DateAndDarwensTypeSystem from TheThirdManifesto is in opposition to this, and Codd's writings made it clear that attribute values could be of any type. Use of types in databases is partly about compile-time checking, but mainly about supporting OnceAndOnlyOnce when defining and using typeful behaviour. Appropriate type support makes it possible to manipulate complex types like TEMPERATURE, HORSE_RACE_RESULT, and GEOGRAPHICAL_COORDINATE as easily as INTEGER and CHAR.]

Using "thin" entities as custom types can lead to almost the same thing; it’s just a matter of perspective and syntactic conveniences. However, either you have encapsulation or don’t have encapsulation. I don’t see any middle ground. If I can query the sub-elements with the same language/system that I query traditional entities, then it’s not a true ADT. This conflict has never been resolved. (Didn’t we have this debate regarding x-ray-able "stacks"?) -t

[The internal representation of a type can be anything. There may well be particular type values that benefit from being represented in relations, sets, arrays, and so forth. However, that is not relevant. The important thing is a type's operations. A type is defined in terms of its operations, not its internal representation. The internal representation is for the convenience of the implementor and to achieve certain performance minima vis a vis the operations; it is of no consequence to the type user.]

But you are risking paralysis-via-label here. We may initially call it a "type" and define it by behavior, but later decide we need to do database operations on the "data" aspects of types and start treating some of them like an entity. Usage and needs change over time, at least in my domain. To keep info flexible, it should be easy to make it accessible to the regular query system if and when such a need comes along. Even if we don't end up with a domain need, query-ability makes a handy debugging tool. A behavior-centric view of info inherently conflicts with a value-centric view. Your type approach is creating a behavior-value-impedance-mismatch, such as conversion DiscontinuitySpikes when we decide we want to x-ray or change the guts of "types". And I haven't seen that the value-centric view inherently creates OnceAndOnlyOnce violations, as you appeared to claim. Just store the "operation" info in one spot and reference it instead of copying it.

[As I asked before, your domain is essentially ERP, isn't it? The specific set of canonical database types we're used to owes much to the traditional prevalence of database systems, and the data type values they needed to capture, in the ERP domain. Thus, the push for invariant, complex, user-defined types is probably weaker there than in any other domain. The canonical set appears to be sufficient for the vast majority of ERP purposes because, in fact, it was intended to be sufficient for all ERP purposes.]

[As for OnceAndOnlyOnce violations, we've shown elsewhere that even for a simple "complex" type like a GEOGRAPHICAL_COORDINATE, without appropriate user-defined type support, you're forced either to create procedures that make assumptions about schema structure, or you must repeat GEOGRAPHICAL_COORDINATE operations in every query that manipulates GEOGRAPHICAL_COORDINATEs.]

Making assumptions about the schema is an acceptable trade-off in my opinion to obtain typish-to-DBish swappable viewpoints. As usual, I'll place flexibility over "protection", at least in my domain. I know this rubs against your compiler-centric approach to everything. -t

[If you feel a need for "typeish-to-DBish swappable viewpoints" -- which I take to mean support for multiple user-view presentations of a value's internal representation, something that is already supported in some type systems -- there is nothing that precludes providing POSSREPs, on a type-by-type basis, to support them. However, I'd like to see a case that justifies representing, say, a GEOGRAPHICAL_COORDINATE as a table. I suppose a character string type could represent a string as an ordered table of characters, but outside of being a technical curiosity and (perhaps) an interesting academic exercise, I don't see any practical benefit to providing such a view, nor can I see how using relational operators on such a representation would be superior to the canonical string manipulation operators.]

Geographical info can get quite complex. For example, the "state plane" system is generally considered more accurate than lat/long, partly because it's semi-independent from tectonic drift. However, it requires more info to encode, such as the "plane ID". But this is kind of getting away from the point. Functions or operators can "wrap" the guts of a data implementation regardless of whether it's multiple columns, encoded strings, or sets of tables. The trick is integrating this with the database engine, query language, and app languages. Custom types are generally more difficult to share across such boundaries than base types. The Rel "solution" appears to be a GodLanguage to reduce the need to share, at the expense of tool mix-and-match.

Indeed, representing any given complex value from a description of possible values is precisely what type systems do well, and they do it (in part) by doing exactly what you wrote: Using functions or operators to 'wrap' the guts of implementation.

The intent of Rel is not to reduce the need to share. The intent is to make sharing easier. Rather than making sharing easier by restricting what types may be shared, I seek to make sharing easier by making it easy to share complex types.

Only if the recipient can digest it. If you stick with the "base" types, then it's easier to transfer.

[At the expense of increased complexity everywhere else... Remember that so-called "base" types are only easier to transfer because they're so ubiquitous. They're ubiquitous because of their popularity in the ERP domain, which is still the primary and (almost exclusive, relatively-speaking) domain where relational DBMSes are used. Spatial types are now becoming increasingly common due to their use in other domains. It's only a matter of time before more types, from other domains, become equally common along with mechanisms to transparently transfer complex user-defined types. The need to use and transfer complex types in a variety of non-ERP domains -- without any undue rigamarole to represent them in databases and applications -- will inevitably drive this.]

[I don't know what you mean by "rubs against [my] compiler-centric approach to everything". Was that intended to be an insult? It otherwise conveys no information.]

You prefer that a compiler or compiler-like device checks everything "up-front", such as missing references/links, incompatible formats/types, etc. The more dynamic something is, the more difficult this is beyond a "warning machine".

[I prefer to avoid errors wherever possible and as early as possible, without losing capability. In order to achieve it, I will accept the occasional need for an extra keyword or two.]

When it's a 30% or more "formality tax", it can hinder productivity when working with and reading code.

[Do you have a reference to support the 30% figure or a definition of 'formality tax'? I find no mention of it in the literature.]

That's just a hypothetical example. I don't know the actual figure, for it probably depends on design style. It is quite possible to make "bureaucratic" code. There may be a place for bureaucratic code, but it's not everywhere.


RE: "A behavior-centric view of info inherently conflicts with a value-centric view." -t

That is an artifact or flaw common to mainstream programming systems, not a general truth. Behavior-centric and value-centric views come together very elegantly in TermRewriting systems. But I'll grant there often seems to be a "pick two" issue among behavior-view, value-view, and information hiding in the distributed scenario. I've been pointed to work by Joseph Goguen that indicates otherwise, but have yet to take the opportunity to read and grok it. If we can get all three properties without significant compromise, I would love to see such features brought into mainstream programming.

What about a workable way that doesn't require a complete Armageddon-style language and tool overhaul? Any new database is going to have to work well with existing apps, databases, and transfer conventions to be more than a narrow niche.

There are many strategies for successful long-term adoption that do not require a "complete Armageddon and tool overhaul". A new database or language platform only needs to work competitively well for new apps or services - those that are developed to leverage it. Whether one must work 'well' with "existing apps, databases, and transfer conventions" should not be taken as a given. Sabotaging the quality of certain aspects of adaptor elements is a not-uncommon business strategy to encourage VendorLockIn, or to tweak benchmarks relative to competing transfer conventions... which is to say, working well with "existing apps, databases, and transfer conventions" may actually turn out to be counter-productive to one's long-term evolutionary success. Where necessary, an adaptor element (intermediate service, protocol, or plugin) of generally questionable quality will be GoodEnough.

If you are in a domain where there's a lot of sharing, then writing translators/adapters can become the bottleneck.

I'm not seeing how that's a problem in practice. It is the standardization that is hard. If you are creating a new entry in a domain with a lot of sharing, then one should assume a few standards for sharing already exist. By nature, these standards will necessarily support a 'lowest common denominator' of the previous technologies. It is not difficult to adapt to a 'lowest common denominator' for a few common standards, and it doesn't pay to spend much more effort than that. Perhaps you should think about the perspective of the platform developer: adaptation-layer standards are often your enemy. They, by nature, turn your platform into a 'commodity' (something easily replaced). You must support standards in order to get people to start using your competing product. But there is such a thing as supporting them too well, such that nobody bothers to leverage the features and qualities by which you intended to market your platform. If you wish to compete for 'best implementation of a standard' (e.g. fastest correct HtmlDomJsCss), then go ahead and give it a best shot. But if your goal is to advance a new standard, to get people to try something new, then high-quality adaptor layers may prove counter-productive to your goals. In that case, it is easier - and wiser - to support a subset of the competing standard. If you want people to integrate your system's features and tie themselves to your platform, then perhaps extend the standard a bit.

If you play too nicely with your competitors, they'll bury you.


[I guess they should never have invented C because COBOL was just fine. Etc.]

They both suck in different ways.

Please don't quibble just for the sake of quibbling. Either address the point, or don't bother posting.

I'm just addressing a dumb analogy with a dumb reply. GIGO. I'd rather use COBOL to sort and merge a deck of IBM cards than C, by the way. As a DomainSpecificLanguage, it did its job.

You're still quibbling, and you know it. At least you admit it's a "dumb reply", though you apparently didn't recognise the essence of the analogy and focused purely on its literal substance. If you wish to debate the relative merits of C vs COBOL, please take it to some other page.

I assume the (poor) analogy has something to do with evolutionary change versus revolutionary change, but beyond that, I don't trust my guessing skills.

You've almost got it! I award you a 'B', but you should have more confidence in yourself and try harder.

Further, it doesn't hurt to pursue both and see what the market picks. AMD's approach to 64 bits was more evolutionary than Intel's, and it appears that's what buyers preferred. There is some potential overlap between types and entities.

Pursuing different approaches is fine. I shall pursue one, you may pursue another. There is, however, no overlap between types and entities. That is the FirstGreatBlunder.

One can force an overlap, for good or bad.

Generally bad. See FirstGreatBlunder.

It's short on specific scenarios to explore its alleged evilness and alternatives.

The FirstGreatBlunder, or rather the correct vs incorrect equation it identifies, was derived logically rather than empirically.

But it's based on the assumption of "hard" classifications, such as "x IS-A type". It's possible to have "type-like" things without having to buy into the whole behavioral-only-interface thing, for example.

Sorry, I don't see the relevance. Indeed, I don't see the relevance of this threadlet. It seems to be quibbling for the sake of quibbling.

You are quibbling about my quibbling over quibbling? Anyhow, I was hoping somebody would provide an example/scenario demonstrating the "generally bad" comment above that we could bounce around instead of talk about generalities; but it looks like it ain't gonna happen.

*sigh* A common illustration is simply the result of any query employing JOIN and/or projection. If relations are equivalent to classes, then classes can be formed via JOIN or subdivided via projection. They can't, so a relation/class equivalence is clearly incorrect.

I see no technical limitation to joining collections of object instances to get new objects (or views of objects). However, this may end up LaynesLawing on the barbed wire fence of OO's def.

Joining collections of object instances to get new collections of object instances (which is what I presume you meant) is fine. The notion of joining two classes to create... something, but we know not what... is what the FirstGreatBlunder deprecates.

Classes are not required for OOP. Clone-based OO languages follow a biology-like approach and clone to "inherit". Another approach is to "point" to the parent object to "inherit". Classes as we know them are mostly an artifact of compiler-centric or static languages. -t

What does that have to do with the FirstGreatBlunder?

{Sure, we can reify object classes and table schema and such. (Ruby, Smalltalk, Newspeak, SQL's DML or DataDictionary, etc. all give credence to that.) How is that relevant?}


See NullVersusNone



