This is to explore the implications of making a "bagless" SQL, one in which each processed "row" has a well-defined primary key-set.
Possible changes to SQL:
- FROM won't accept bag tables (real or virtual). It would throw an error if it cannot determine a primary or candidate key for the table.
- A SELECT statement must emit a result set that has a known candidate key. An error is thrown if the key is omitted from the result.
- No such thing as UNION ALL, and UNION statements will make sure candidate keys are compatible between tables.
- For troubleshooting purposes, a way to determine the candidate key of a given table should be provided. Perhaps a keyName() function.
I would suggest substituting "primary key" with "candidate key".
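A minimal sketch of where bags come from in today's SQL, and hence what the rules above would have to reject. It uses Python's sqlite3 module only as a convenient way to run the queries. The table emp and the helper is_bag() are made up for illustration. Note that is_bag() inspects the data at run time, whereas a real bagless SQL would presumably infer the candidate key from the query and schema alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept TEXT NOT NULL);
    INSERT INTO emp VALUES (1, 'sales'), (2, 'sales'), (3, 'ops');
""")

def is_bag(sql):
    """Hypothetical run-time check: True if the query returns a duplicate row."""
    rows = conn.execute(sql).fetchall()
    return len(rows) != len(set(rows))

# Key column retained: every row is unique.
keyed = is_bag("SELECT emp_id, dept FROM emp")

# Key projected away: 'sales' appears twice, so the result is a bag.
projected = is_bag("SELECT dept FROM emp")

# DISTINCT makes the remaining column its own candidate key.
distinct = is_bag("SELECT DISTINCT dept FROM emp")

# UNION ALL keeps duplicates; plain UNION removes them.
union_all = is_bag(
    "SELECT dept FROM emp WHERE emp_id = 1 "
    "UNION ALL SELECT dept FROM emp WHERE emp_id = 2"
)
union = is_bag(
    "SELECT dept FROM emp WHERE emp_id = 1 "
    "UNION SELECT dept FROM emp WHERE emp_id = 2"
)

print(keyed, projected, distinct, union_all, union)
```

Under the proposal, the second and fourth queries are the ones that would raise an error instead of silently returning duplicates.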
What would be the motivation to produce BaglessSql without fixing the other SqlFlaws?
This just explores one issue. If you want to explore other ones, be my guest.
EssentialComplexity often arises from constraints or FeatureInteraction. For example, to travel is to get from point A to point B and live through the ordeal. If you "just explore one issue" - getting from point A to point B, without the survival constraint - many of your efforts will amount to naught more than MentalMasturbation (we could fire you from a cannon, for example). If MentalMasturbation is your goal, I won't stop you, but I also will not see this page as a reasonable reference for a point in an argument.
True, FeatureInteraction is an important issue. However, one must walk before they run. Often it's useful to first explore issues in isolation and then move on to interactions.
We are already walking.
Can you name another issue that intersects with the bag issue?
Sure. SQL's order-by semantics and NULL handling are both issues that intersect with bagginess.
Wait a second, you are not suggesting we get rid of Order By, are you?
I suggest moving Order By into a separate statement, perhaps part of export. Relations, after all, are not ordered.
Query languages are as much for humans as computers. Anyhow, this will probably turn into the same kind of utility-versus-purity debate as found in the SELECT-without-key debate in HowOtherQueryLanguagesAddressSqlFlaws and BagAtational. Thus, I won't bother to repeat those arguments here other than to say that the removal of ordered output (or an extra step to get it) will probably be noticed more by users than Selects requiring candidate keys, and will be a potential sore spot with them. -t
Moving the responsibility to a different keyword is not "removal". Nor is it an "extra step", since the total number of steps is, at worst, the same as in SQL (assuming you don't bother improving SQL's abstraction features while you're tweaking everything else). I'm certain your arguments here would be just as specious as they were in BagAtational.
- I meant coding steps, or perhaps "layers", not under-the-hood steps.
- So did I.
- Why not view ORDER BY as an "export" clause rather than as "part" of relational operations?
- The question is whether you syntactically can express JOIN, UNION, etc. of SELECT expressions that include the ORDER BY clauses - and what it means if you do so. Moving ORDER BY to an explicit EXPORT keyword, or otherwise restricting to non-relation types, would avoid that issue.
- The SQL dialects I'm familiar with only let you put ORDER BY on the final (outer) statement. And if a dialect did allow it elsewhere, I suppose it could merely ignore any inner ORDER BY. (In SmeQl, one can use the ordering operation to generate an explicit sequence number column, which can be quite useful for further computations.)
- That is reasonable, then, including on SmeQl's part. A bit confusing, though, to have a bunch of special rules for 'outer-most' SELECT vs. inner SELECT. Either way, programmers must think about two steps: generating a result set, then formatting it, so (other than legacy) it would seem reasonable to simplify documentation and avoid confusion by using a separate EXPORT statement for the formatting elements.
- Because EXPORT will be, say, 90% just like a regular SELECT (or its equivalent). That's not enough to justify a different name in my opinion. It's almost like having a PRINT-COLOR command separate from a PRINT-BLACKANDWHITE command. Variations in what's allowed in different circumstances are just a reality of life. Related: AttributesInNameSmell. And it may make people think of saving to files instead of merely viewing results. I'd rather live with the "special rules". FORMAT may be a better name, but still has similar problems.
- I'm not seeing why EXPORT would be 90% just like SELECT. You seem to be assuming that one must re-implement the entire SELECT sub-language inside EXPORT. Realistically, however, the responsibilities would be separated: one might use SELECT to get the results, then EXPORT (or FORMAT) to arrange them into XML/CSV/etc. In that case, EXPORT doesn't need to support any of the JOIN/UNION/etc. stuff, and can work directly on tables or named relvars or the results of SELECT. If you want some syntactic sugar that combines EXPORT with SELECT, that could be achieved easily enough, but that does not mean EXPORT is 90% like SELECT.
- The SELECT clause at least selects which columns to include/exclude and gives them a display ordering (left to right). I don't see any reason to repeat that info in an outer layer most of the time. If the output syntax-layer/function "inherits" the final SELECT clause's column info so that we don't have to repeat it, then it's pretty close to SQL anyhow.
- [This debate is reflective of just how dire SQL is, with its ill-conceived conflation of presentation, evaluation and manipulation. By trying to do all three at once, it does none of them well. Not only that, but it's hard to teach, hard to learn, unpredictable, inelegant, verbose where it needn't be (e.g., 'SELECT * FROM x' where 'x' would do) and too terse where it could use more verbosity (column-order dependence, for example). De-bagging SQL would fix one tiny aspect of it, but pointlessly, like painting the trim in a condemned building. Better we should focus on SQL's replacements than waste further mental effort on it.]
- But before we replace it, we should explore fixes first. It's too common and successful a standard (usage-wise) to not explore such an option. Ideally I'll take SmeQl, but I know people will want to try to fix SQL first.
- [What do you expect to achieve by such an exploration? What makes you think "people will want to try to fix SQL first"? Do you actually think members of the ANSI SQL Standards Committee are going to read this page and make changes based on it?]
- If competitors to SQL start to steal its thunder, then "fixing" SQL does indeed become a goal. Otherwise, they may be without a job.
- [You didn't answer any of my three questions.]
- Let's try again.
- Re: "What do you expect to achieve by such an exploration?" - To see if SQL is sufficiently "fixable", or at least can take on the "relational purity" properties that some find so important.
- Re: "What makes you think "people will want to try to fix SQL first"?" - As already stated, because it's an established standard and because the SQL committee wants to keep its job.
- Re: "Do you actually think members of the ANSI SQL Standards Committee are going to read this page and make changes based on it?" - I don't know, and nothing I've said depends on it happening. I don't see it as relevant.
- If growing competition lights a fire under their arses, they may start asking many of the same questions that we are asking here, regardless of whether they read this page or not. I don't expect them to stand still. After all, COBOL added OOP.
- [Be that as it may, I've no intention of wasting a moment on some "fixed" SQL. Note that C/C++/C#/Java have now largely replaced COBOL.]
- Java perhaps, although Oracle may be changing that with lawsuits. Not so much the others because they are hardware centric or considered too proprietary.
- [Huh? Oracle owns Java.]
- Oracle is making Java look less and less open through its lawsuits and past behavior. COBOL is still in heavy use in part because it's harder for any one company to dick around with it.
- [FacePalm]
- Whatever turns you on. Note that I am not arguing that COBOL is great, only that it did incorporate the in-style idioms of the day and is still alive and kicking, and it's thus reasonable to expect SQL to do the same (if anti-baggism became the rage). Like COBOL, SQL is an entrenched mission-critical back-room corporate language. A competitor will not kill SQL JUST because it's bag-free. It will have to have other features that others find compelling.
- The FROM keyword is useful when there's a long SELECT list. It makes it easier to find the end of the column spec list in a crowd. Brackets are an alternative, but one could argue that's just changing one kind of verbosity to another. I'd chalk that up to subjective preference.
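On the ORDER BY thread above, a rough sketch of the two-step view: the query yields an unordered relation, and ordering happens either in a separate export step or by materializing an explicit sequence-number column. The names export() and with_sequence() are hypothetical, as is the city table; this is not SmeQl's actual syntax, only an illustration of the idea in Python over sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT PRIMARY KEY, pop INTEGER NOT NULL);
    INSERT INTO city VALUES ('Oslo', 700), ('Bergen', 290), ('Tromso', 80);
""")

# Step 1: the relational part. The result is held as a set of tuples,
# so it carries no ordering at all.
relation = set(conn.execute("SELECT name, pop FROM city"))

def export(rel, order_by):
    """Hypothetical export step: turn an unordered relation into an ordered list."""
    return sorted(rel, key=order_by)

def with_sequence(rel, order_by):
    """Hypothetical ordering operation: return a new relation (still a set)
    with an explicit sequence number prepended to each row."""
    ordered = sorted(rel, key=order_by)
    return {(seq,) + row for seq, row in enumerate(ordered, start=1)}

def by_pop_desc(row):
    return -row[1]

ordered = export(relation, by_pop_desc)
numbered = with_sequence(relation, by_pop_desc)

print(ordered)
print(sorted(numbered))
```

The result of with_sequence() is still a relation, with the sequence number as an additional candidate key, so it can feed further joins and unions; the result of export() is a list and cannot.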
So let's move on to nulls. How will nulls affect de-bagging SQL? -t
The question is of "null handling", in particular whether NULL == NULL is true, false, or something else, especially across joins. It is only a problem if NULL is allowed to show up inside a candidate key.
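To make the NULL point concrete, here is a small sketch of how one existing engine (SQLite, via Python's sqlite3) behaves when NULL is allowed into a column declared UNIQUE. The table t is made up for illustration, and other engines may differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (k INTEGER UNIQUE, v TEXT);
    -- SQLite accepts both NULL keys despite the UNIQUE constraint.
    INSERT INTO t VALUES (NULL, 'a'), (NULL, 'b'), (1, 'c');
""")

# NULL = NULL is neither true nor false; it evaluates to NULL (None in Python).
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Two rows share the "same" NULL key, so k does not identify a row.
null_key_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE k IS NULL"
).fetchone()[0]

# Across a join, NULL keys never match, not even a row with itself.
self_join = conn.execute(
    "SELECT COUNT(*) FROM t AS a JOIN t AS b ON a.k = b.k"
).fetchone()[0]

# Yet DISTINCT treats the two NULLs as duplicates of each other.
distinct_keys = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT k FROM t)"
).fetchone()[0]

print(null_eq, null_key_rows, self_join, distinct_keys)
```

So the same pair of NULLs counts as "different" for the uniqueness constraint and the join, but as "the same" for DISTINCT. A bagless SQL would have to pick one answer, or simply forbid NULL in candidate keys.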
See DatabaseDomainsForNumbers, BagNeedScenarios