Any sufficiently complicated SQL database query contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a Domain None on top of SqlNull. -- GreenLuxSpesTop? Tenth Rule Of Database Queries
None means Nothing, Nothing is a concept that describes the absence of anything at all. Nothing is sometimes confused with Null, but they are very different concepts because Nothing means absence of anything, while Null means unknown (you do not know if there is a thing or not).
For Nothing, the normal TwoValuedLogic applies (Nothing=Nothing : true, Nothing = Something : false), for Null, ThreeValuedLogic is necessary. Unfortunately, these 2 concepts have been used interchangeably without much thought, to point where the most common use for Null in relational databases is to mean Nothing (even when Null was designed to represent unknown by Codd). This confusion is aggravated by the fact that many mainstream application languages (Java, CSharp, Cee, etc) use the null keyword to mean uninitialized variable which easily maps to the interpretation that null means the variable is pointing to "nothing" (no object).
But for databases, Null was not invented to represent nothingness, was invented to represent that the value of something was not known (maybe be something, maybe nothing, we just do not know). See SqlNull for more on the origin of Null for databases.
Now that ChrisDate wants Null to be removed from RelationalDatabases, so that the incongruence and confusion brought in by ThreeValuedLogic is eliminated, the developers, accustomed to use Null to represent Nothing, resist to the idea asking: How am I going to represent the fact that a Person is not married? I use to do that by marking the MarriageDate? as a nullable Date. Now what? I need to split the table into 2 tables just to represent the fact that the MarriageDate? is not mandatory? That of course seems like the obvious, elegant (if extremely cumbersome answer). But the practical developer refuses to get into that trouble, it is just too much effort, it is simple easier to continue using Null. But... what about Nothing? why not just simply add "Nothing" as a possible value for to the Date domain? That way it is possible to say that the Person has no MarriageDate?, and still stay inside the realm of TwoValuedLogic.
Other Domains:
- The Number Domains may or may not need it (Zero seems enough...)
- Numbers are an interesting case, because their meaning may change if they are used to represent NominalData? or OrdinalData?, and in those cases numbers have different meanings... should we have such distinctions as domains in relational databases?. See DatabaseDomainsForNumbers
- The String Domain does not need it, because it already has support for it (the Empty string).
How about others?
Is this solution, in any way in conflict with TheThirdManifesto?
Real-world reports have Nulls, or at least "empty cells" and that's the way the customer wants them and will fire you if you don't deliver or spend too much time working around that gap.
Mmmm, and will they fire you when the queries using ThreeValuedLogic end up giving your customer the wrong information? (Also, most users I have met do not like empty cells in reports, because it is hard to tell if the value in there is unknown to the system, known but empty, or just a report error, they prefer some standard way of saying "I don't know what goes in here" or "there is no value for this" like the String "N/A"). -- LuxSpes
- "N/A" is often a kind of Null in itself. It's hard to mark/track them without something similar to a Null. For example, consider a cross-tab query/report in which there are not instances for every cell. Regardless of whether you put "N/A" or blanks in those cells, some kind of value or marker has to indicate them.
- ["N/A" and other values within a domain (type) are fine. They require no more than boolean logic. NULLs require ThreeValuedLogic.]
- We'd probably have to look at an example, but if you have to make a decision based on 3 different string constants, for example, then you are essentially reinventing ThreeValuedLogic under a different guise. I've used type-lite languages that lacked a true Null, and working with strings as replacements for Null pretty much leads to most of the same kind of issues. (Although, I do wish SQL had better Null-handling conventions and operations. For example, there should be no such thing as a null string. Numbers and dates, okay, but not strings.)
- ["3 different string constants" in a domain is not "essentially reinventing ThreeValuedLogic". Note that null = null evaluates to null, whereas 'N/A' = 'N/A' evaluates to 'true'.]
- That's a specific kind of ThreeValuedLogic.
- [How so? Can you show truth tables for it?]
- They are usually situation-specific. We seem to be miscommunicating here and it may require more scenarios/examples to straiten it out. Think back to the cross-tab table scenario above in which not all cells have corresponding values. What would REL return and how does it go about it?
- [Can you show any truth tables for any conceivable situation-specific arrangement of your "three-valued logic" arising from 3 different string constants? In TutorialDee (note that Rel {not "REL", by the way, REL is a company that manufactures speakers} is an implementation of TutorialDee, like Oracle Database is an implementation of SQL), "missing" values are specified explicitly. For example, there is a proposed left join operator whose syntax requires explicit specification of a "missing" value for each attribute that could be missing.]
- That's merely an indirect Null in which one can control the value or replacement for Null. And I'm not sure it's better than Nulls. Again, you are not really escaping Nulls, but dancing around them via something potentially uglier than Nulls themselves. I agree that specifying a value default when none is available is indeed a nice feature. But whether it can replace Null is yet to be seen. If the result is a numerical type, what does the app software receive as a substitution, for example? A string? That may not work well in the app, especially a non-scriptish language. At least Null is something recognized by many app languages; a convention that is shared and understood. - t
- In fact, it is not a convention that is shared and understood, specially not for most app languages, for example, in most app languages null = null is true, but in SQL null = null is unknown! In most app languages, null is used to indicate that a variable has not been initialized, in SQL it's supposed to mean Unknown value, most app language have not standard way to represent the idea of an "unknown"... or maybe you can provide me with the name of a mainstream language that deals with null in a way consistent with the way SQL does? -- LuxSpes
- That's mostly a syntactical issue, since one uses "is null" in most RDBMS as the equivalent. And even if I agreed with you, an imperfect translation is better than none, or a Rube Goldberg-like work-around. You seem to be arguing that "since X is imperfect, X is useless". First let's see the alternative.
- No what I argue is that is: X (SQL) is imperfect, and Y is imperfect (TutorialDee) but since Y solves everything that X solves without introducing ambiguities, then to prefer X is not wise. In this case, with TutorialDee domains, the "Date" type could include a value to represent "No Date", which would keep us from falling in the trap of ThreeValuedLogic. This "No Date" would map to the null of app languages in a more accurate way: No Date = No Date (as null = null in Java, CSharp, C++, etc, etc). See? There are just not valid arguments for continuing to use nulls.
- I see where this is leading to: complex custom types (DomainValues??) that are somehow shared between tool/languages. Sure, if you want to invent a sharable inter-tool type system, then such problems "magically go away". I don't think it's worth the complexity. It's an anal, time-consuming, and verbose way to do programming. We are right back to our usual GodLanguage debate. I should have known. - t
- No, you are wrong, no sharable inter-tool type system, java.util.Date in Java would be used in the Java side (no need for a new type there), and Date type for TutorialDee would be used. It just so happens that in the TutorilDee? side we need to remember that it is important to represent the fact that there might be "no date" or "date none" for events without a date (not events with unknown date) so I could implement TutorialDee in C, C#, Lisp, or whatever and plug to it with plain JDBC without modification or extension of any kind.
- It's still not clear to me how it's mapped on the application programming language side.
- Date (none) in Rel maps to null in Java, without the need of any modification on JDBC. There is no simpler way to put it. What else do you need me to write? -- LuxSpes
Prod-ID Product Cancellation Date
----------------------------------------
1234 Fooz Ball (not cancld. yet)
2345 Teeth Cleaner 12/01/2008
- In this example report, even if you use a "skinny table" to store the cancellation date, the result set sent to the display application still needs to indicate that the cancellation date for product 1234 does not exist.
- Great example of how NULLs confuse people, what do you want to mean by using null here? That there is no cancellation date for Fooz Ball? or that you do not know the cancellation date for Fooz Ball? It cannot mean both (because if it does, then you would have to explain me how to distinguish them... Can you? ;-)
- I'll agree that is a limitation of Nulls, but it's not a significant one in my experience. (I have some suggested fixes, but that's another day.)
- It is very significant, you fell in a trap you setup yourself, showing you really did not understand (until right now) what null really means... and your answer (or lack of it) reflects that you have no way around this. It is also significant because this confusion does not happen if you use values within a domain and avoid nulls -- LuxSpes
- I didn't fall in any trap; I just didn't finishing defining what they meant. If you want something concrete, then lack of a cancel date means that the product has not been cancelled yet. I'll rework the example to make that clear.
- All right, so null means no date... now tell me... if null is no date... then what are you going to use for the case when the date is unknown? Also, you have chosen (twice) the wrong path. Null was designed to represent unknown, not "no value"... how do I know that? because if null meant "no value" then in SQL null == null would return true, because the value for the cancellation is known: "none". But no, SQL added null to deal with the case when information was missing that is why null = null is unknown, not true, not false: There might be a cancellation date, but since it is unknown, I can not compare it (I am thinking 2 dates in my head, tell me if one is greater than the other... you can't!)
- I rarely see that as a problem in practice. If the domain or task grows that complex, then usually there are other column(s) introduced, such as a "product status" column to clarify the situation. We don't need to sub-divide null and invent new types. And the distinction between "no value" and "value unknown" is mostly a pointless issue.
- Basically what TutorialDee is offering you here is a chance to have a real "Date(none)" value in your database and you are answering "no, I prefer to artificially pervert SQL null to act like Date(none). Why? no reason, I just like to keep complicating things for myself".
- See below about LaynesLaw on "Null" and orthogonality of language.
- You could just make the application do the join between product info and the cancellation date table, but then you are defeating part of the reason for using DBMSs.
- There is no need to do that, Tutorial Dee can deal with this
- You are making the app do the work, the join so that your precious pure Rel system doesn't have to touch dirty street nulls, instead making the "little people" do it. Rel wants to wear white gloves and be the Zsa Zsa Gabor of RDBMS. - t
- Actually, no. You just don't know this can be done in Rel, and without nulls
- Well, what does it return to the report-drawing application as a result set? Use Java as reference language if you want something more specific. - t
- I am no expert in TutorialDee, but I saw an example of this in on the book Introduction to Database Systems, Eighth Edition... I just do not have a copy on hand right now (found the name of the chapter in Google, I believe it is in 19.5 Outer Join (A Digression)), but let me see if I can remember the syntax (or find it Googling...) -- LuxSpes
- I don't think TutorialDee syntax is the main focus here. For now, I only want to see what Java receives, not how D got it. - t
- I thought TutorialDee syntax and capabilities were the main focus... from the Java side, well, I guess the "Date(none)" introduced for the Date domain would match Java null much better than SQL null: in Java null = null is true (no ThreeValueLogic? in Java), in SQL null = null is unknown (ThreeValueLogic?), in TutorialDee Date(none) = Date(none) is true (no ThreeValueLogic?), you tell me which one translates better to Java null? -- LuxSpes
- What "date domain"? A custom type? See above. And how a RDBMS interacts with other and existing tools is important to any choice I will ever make. Idealistic tools that cannot handle real-world messiness will be ignored. - t
- A custom type? well, maybe a custom type, maybe the default type, if it comes prebuilt with the database is such a big difference? I do not think it will be that hard to convince DaveVoorhis to add support for Date(none) for the built it type in Rel (it will not add ThreeValuedLogic, so there is no reason for not adding it). So that argument is no relevant. Now, on the how a RDBMS interacts with other and existing tools, I have already told you, with plain JDBC without any modification, so I see no objection here, Rel(Date(none)) = Java(null), which by the way, will be a nice thing to have, since the behavior of SQL(null) has never matched Java(null). Again, you seem to be irrationally attached to null even after I have shown you that Date(none) is a much better match for the way you use null. Can you provide me with any rational arguments against Date(none)? Or should I consider that you are just in denial?
- Your "none" is essentially a Null with another name and slightly different comparing rules. Congratulations, you essentially reinvented Null. - t
- Sorry, but you are wrong, Null has ThreeValuedLogic, None does not. My condolences, Null will be killed by TRDBMs and you never even met him (you created a false mental image on what it meant, and enforced that twisted idea with the help of isnull and coalesce, but never, ever, really used null, you always worked with None).
- It may be an "improved" Null, but essentially the same thing. - t
- You only say that, because you don't really know SqlNull (you never did), and therefore, you are unaware that support for ThreeValuedLogic is the essence of null (maybe you should read more about the origin of SqlNull before continuing with this discussion?). You (like many others) perverted it in to an ad-hoc, informally-specified, implementation of None, to the point you lost its essence. And now that a new generation of truly relational databases is trying to remove Null from the landscape you are saying "You can not remove null", "null is essential for databases" when in fact, you never really used null for what it was meant to be used.
- It sounds like this is turning into a definition battle. LaynesLaw? Perhaps we actually are agreeing that Null, or things like Null, are implemented poorly. I'm all for overhauling Null, regardless of what the fix is called. But do we really need to toss out SQL entirely to fix nulls (and perhaps the duplication issue)? Why not fix SQL? It would have a better chance of taking off. If you want to sell Rel, then focus on convenience of syntax and expression because the problems you mention about Nulls and duplications are generally orthogonal to SQL in the larger sense. A kind of SQL++ could fix some of the shortcomings of the current batch. - t
- [''I don't want to sell Rel. It's free and OpenSource. I want to give it away! Note that TutorialDee is, in effect, your "SQL++"...
Discussed Continued at
SqlFlawsDiscussion.
Discussion from A better Null or replacement to Null (NullVersusNone) at HowOtherQueryLanguagesAddressSqlFlaws
.
- You keep driving around and through Nullville and do Nullville-like things and attend Nullville parties and wear Nullville-like clothes and sing Nullville songs, yet keep denouncing Nullville. Rather then reinvent a stand-in for missing values over and over again at the level of each "type", doesn't it seem more logical and better concept, implementation, and support factoring to just create a universal concept? - t
- [SqlNull just doesn't deal with the complexities or represent the RealWorld facts in the database in a way that is explicit enough, and Dates are not the only domain where this happens (as it was demonstrated in the discussion at NullVersusNone). Even if you work only with Numbers SqlNull fails to explicitly represent facts (see DatabaseDomainsForNumbers for examples) it is not that it cannot be used to represent those facts, it is that it makes things unnecessarily harder, and the meaning gets hidden, one needs to "guess" what the null means for each circumstance. Maybe what is needed is page with examples of how SqlNull overcomplicates and obscures stuff for Strings, Dates, Numbers, etc.]
- It is not a universal concept. That's the problem. NULL introduces ThreeValuedLogic (with inconsistencies, in SQL!) and attempts to impose a universal concept on something that is not universal. NULL is semantically void; by definition it carries no meaning. Missing values are not universal. Their representation may be universal, but their semantics are not. Why a value is missing may vary dramatically, but NULL tells us nothing about why the value is missing when databases are all about capturing meaning. Therefore, NULL fails to capture meaning. As such, the above Date type is merely an attempt to adapt to a SQL NULL in the most meaningful way possible. In pure TutorialDee, depending on the requirements, I might create a Date type like this:
TYPE Date UNION;
TYPE ValidDate IS {Date POSSREP {Year INTEGER, Month INTEGER, Day INTEGER}};
TYPE NotExpiredYetDate IS {Date POSSREP {}};
TYPE MissingDate IS {Date POSSREP {Reason CHARACTER}};
.
- Usually a "status" column or something similar is used when such info is needed. It usually works well and doesn't need the invention of new types. (It can be linked to a ConstantTable to ensure validity.) - t
- Your "status" column forces the developer to re-implement the combined "status" & Date semantics every time a Date is used in a table (or elsewhere too, presumably), instead of describing it once in a type definition. Your approach is a violation of OnceAndOnlyOnce.
- The status values are rarely the same from app to app. "Jockey didn't show up" is fine for horse-racing, but not dress catalog sales. Status can get rather complex in some apps. Stuffing all that into a date column or finish time column the like is trying to pass a camel through the eye of a needle. It's force-fitting two concepts into one.
- Note that there's nothing in the type definition which pre-defines the contents of 'Reason' in the MissingDate subtype, so your argument is moot.
- If it's open-ended text, then it's generally equivalent to an open-ended Status column. I don't see how your fact changes the argument much. And, most companies don't like those because managers like status reports on the different statuses over time. If they are free-hand, then consistent measuring is difficult. A free-hand option may be available, but usually discouraged. For example, one of the status codes may be "Other - see note", in which case the "comments" section is expected to describe the situation, and may be required if "Other" is chosen.
- The type definition can be trivially configured to constrain the Reason values to those checked by a function or obtained from a RelVar. The value of building this into a type definition is that it need only be specified once, rather than everywhere your status_column+date_column "type" is used.
- [Again, SqlNull was invented to provide ThreeValuedLogic, and again you (like many others) are perverting it in to a ad-hoc, informally-specified, implementation of None. In this case, you want to use a "status" column as a crutch to make Null limp as if it were None. And it is not a bad idea for a language like SQL that needs such a crutch, but: what is the point of keeping the need for such a crutch when designing a new language?]
- I don't like SQL's version of Null either, but it's for a different reason than needing more "kinds" of nulls. Perhaps it's time to explore some UseCases.
- Here are some Number UseCases that show the value for Domains: DatabaseDomainsForNumbers
See also AddingNoneForTypesInRel, SqlNull, ExBaseRant, GreenspunsTenthRuleOfProgramming
MarchTen