From a discussion on BagAtational...
Let's define a complexity metric that will be used to judge a new feature. Our metric assigns a count of 1 to each new issue a user must learn in order to use the feature. We'll total up the counts to determine the feature's overall complexity.
Call the metric C, such that C(0) means using the feature raises no issues and therefore requires no new learning. C(1) means 1 issue, C(3) means 3 issues, and so on. C(n) is lower complexity than C(n+1).
The new feature, in this case, will be providing a means to emit duplicate rows/tuples in TutorialDee and Rel (see RelProject).
In TutorialDee, the projection or "give me just these columns in the output" operator is represented by {a, b, ... z} where a, b through z are column aka attribute names. See QueryLanguageComparison for some examples. Projection eliminates duplicates.
To make projection emit duplicates, we'll add a hypothetical optional keyword called "DUPLICATES". A projection of {a, b} will remove duplicates but {DUPLICATES a, b} will emit them.
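To make the two behaviours concrete, here is a minimal Python sketch (not Rel syntax; the function name, flag, and dict-per-row representation are invented for illustration) of projection with and without duplicate elimination:

```python
def project(rows, attrs, duplicates=False):
    """Project each row (a dict) onto the named attributes.

    By default the result is duplicate-free, matching relational
    projection; with duplicates=True every projected tuple is kept,
    mimicking the hypothetical {DUPLICATES a, b} form.
    """
    projected = [tuple(row[a] for a in attrs) for row in rows]
    if duplicates:
        return projected
    # Deduplicate while preserving first-seen order.
    seen, result = set(), []
    for t in projected:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

rows = [{"a": "x", "b": 1}, {"a": "x", "b": 1}, {"a": "y", "b": 2}]
print(project(rows, ["a", "b"]))                   # [('x', 1), ('y', 2)]
print(project(rows, ["a", "b"], duplicates=True))  # [('x', 1), ('x', 1), ('y', 2)]
```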
The keyword itself is given a count of 1.
Using a DUPLICATES projection within a subquery is invalid, because relations are, by definition, free of duplicates. So, {DUPLICATES a, b} in a subquery should throw an error.
That's another count of 1.
Using {DUPLICATES a, b} where a and/or b is a key is meaningless, and should at least throw a warning.
Another count of 1, that.
So, modifying projection to support duplicates has a total complexity count of 3. It is complexity C(3).
Now, let's assume we wish to emit duplicates to a file to meet a user's requirements. Rel can use FOR and WRITELN to emit tuples to some external file like this:
FOR myRelvar {a, b} ORDER(ASC a);
WRITELN a || " " || b;
END FOR;
FOR iterates over the tuples in the relation returned by the expression 'myRelvar {a, b}', sorted in ascending order on attribute 'a'. FOR will not iterate over duplicate tuples because the projection {a, b} returns no duplicate tuples.
WRITELN emits the values of a and b for each tuple. I've omitted the file handling bits for the sake of simplicity. It would be the same regardless how we emit duplicates, anyway.
Now let's compare how we output duplicates.
Assume we're using the following RelVar:
VAR myRelvar REAL RELATION {k CHAR, a CHAR, b CHAR} KEY {k};
Assume we wish to emit all values of a and b including duplicates, but not the key k.
One way is this:
FOR myRelvar {DUPLICATES a, b} ORDER(ASC a);
WRITELN a || " " || b;
END FOR;
We're using our modified projection operator, which has a complexity of C(3).
Another way is this:
FOR myRelvar {k, a, b} ORDER(ASC a);
WRITELN a || " " || b;
END FOR;
No special mechanisms are required. We merely do not include k in the WRITELN statement, which is precisely the same WRITELN statement we'd use in the first example!
Total complexity of the second method is, therefore, C(0).
C(0) < C(3). The second approach is less complex than the first.
Therefore, modifying projection to emit duplicates is more complex than not modifying projection to emit duplicates.
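The comparison above can be sketched in Python (an illustration only; the dict-per-row representation is an assumption, not how Rel stores tuples):

```python
# Rows of myRelvar; k is the key, so every stored row is distinct.
my_relvar = [
    {"k": "1", "a": "x", "b": "p"},
    {"k": "2", "a": "x", "b": "p"},   # same (a, b) pair as row 1
    {"k": "3", "a": "y", "b": "q"},
]

# First approach: a projection onto {a, b} yields a set, so the
# duplicate (x, p) tuple is gone before any iteration sees it.
deduped = sorted({(r["a"], r["b"]) for r in my_relvar})
# -> [('x', 'p'), ('y', 'q')]

# Second approach: keep the key in the projection so all tuples
# stay distinct, then simply omit k in the output statement.
lines = [f'{r["a"]} {r["b"]}'
         for r in sorted(my_relvar, key=lambda r: r["a"])]
# -> ['x p', 'x p', 'y q'] -- duplicates emitted, no new operator
```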
There are several flaws in your counting, but I'll focus on the biggest first. Your FOR-ENDFOR should be dinged for many points because it's not declarative; some shops may disable imperative statements because of security risks, optimization risks, and so forth. It's extra complexity. Other query languages can do it within the *query* language (declarative) and don't require imperative statements. Your argument could be reduced to, "Since this ugly thing already exists in the language, we might as well use it to do more ugly things."
And WRITELN doesn't output to standard tables, but to a string. If you output to standard tables, such as ODBC results, then usually there's a way to export to CSV, XML, spreadsheets, etc. Custom strings don't do that unless you manually add delimiters and the like (including dealing with delimiter and quote escaping). It's outside of both "table land" and declarative land, and is therefore butt-ugly.
--top
I think you're not seeing the forest for the trees. Use of FOR is not the issue here. The FOR construct was merely selected as an example, one that implies arbitrary manipulation of "duplicate" tuples, including emitting duplicates. As I've written elsewhere, TutorialDee, and hence Rel, is intended to be a general-purpose imperative programming language with built-in database capabilities. It is more than just a query language. Imperative capabilities are not deprecated, but promoted along with functional aspects like explicit RelationalAlgebra operators. In that respect, it's like ExBase. I could equally have chosen to illustrate this with Rel's EXPORT statement, but as it's still a work in progress (whereas FOR is complete) I chose to demonstrate something that interested readers can execute for themselves.
Shops that disable imperative statements? I presume that means no PL/SQL, no Transact-SQL, and so on? I've not seen it, but I can't say I'm surprised. Such shops will not be happy with TutorialDee or Rel. Fair enough.
Outputting to standard tables, by the way, is done by linking to the external DBMS tables or JDBC datasources as RelVars and simply using them in the same manner as native RelVars. INSERT works on them.
BTW, I believe my counting is accurate. The total of three ones is... Three.
Not needing a stupid loop is worth 20 points. You are pushing the issue onto the app language side, not getting rid of it. That's NOT where I'd prefer it be. It's like saying, "My leaky septic tank killing the grass is not a plumbing issue, it's a gardening issue: get a grass species that is shit-tolerant."
- [{DUPLICATES a,b} within a query was blocked. That's C(1). What TopMind promotes, however, is keeping duplicates within the queries. That means learning how duplicates are handled for every operator. What do duplicates mean on insert? on delete? on union? on intersect? on joins? How do you tell a duplicate entry error from a valid 'new' insert? How many 'points' are these issues worth? In any case, the silly crank seems to be substituting rants for reason.]
- As far as the INSERT situation, if you have primary keys defined, then the DBMS will not allow such. Again, bags are an option, not a universal decree. As far as how it's dealt with, you can study existing RDBMS as working specimens. -t
- [Sure. But that does not eliminate any complexity. Extra choice means extra complexity, unless the choice is meaningless. Of course, adding uniqueness constraints to a data model costs a similar amount whether that data model is implemented by bags, trees, object graphs, relations, or whatever. (Uniqueness constraints can spend a little complexity in the data model to reduce it in the applications (updates, queries, etc.). If the data model is much smaller than the application - or is used by many applications - the complexity savings may be multiplied by orders of magnitude.) WaterbedTheory only applies in a closed system.]
- I believe you are exaggerating the alleged net savings. You've failed to produce a practical demonstration/explanation involving common and realistic scenarios and data. You've made similar exaggerated claims for TypeSafety and heavily-normalized tables.
- [Your 'belief' is acknowledged. I shall not attempt to correct it, as you are never willing to acknowledge or comprehend anything that conflicts with your beliefs. Instead, I shall ask you to evaluate whether your beliefs are consistent. Standardization, for example, is reduction of choice - to better support extensible systems. If your belief about exaggeration is consistent, you must believe complexity in an open system cannot be significantly diminished by standardization. This means, for example, you do not believe that organizing data into a standard format, as per TableOrientedProgramming, will reduce complexity. Similarly, it means that you do not believe that a GuiMachineLanguage will do anyone any good with regards to saving work or complexity. Now, clearly there is a contradiction here relative to your arguments on TOP and GML, so ReductioAdAbsurdum?: one of my premises must be wrong. I suspect the wrong premise is that you are a person who has consistent beliefs. (But I may be wrong. Perhaps you do believe that standards like TOP and GML offer you little more than a hand-waving and unquantifiable advantage in 'psychology'.) Perhaps your ego is too fragile for you to admit to being 'wrong' in any field where you think yourself competent (this is hardly unusual). Or perhaps you like being the underdog more than you favor pursuit of truth. You'd probably be that guy promoting types if you had 'discovered' them on your own, but instead you see types as encroaching on your comfort area and so you resist. Rationality is always the first casualty in any discussion you join, TopMind.]
- It's simply that you have not shown that the downsides of some bag-friendly features outweigh the upsides. Your argument resembles a religious belief, not weighing practical costs and benefits. This applies to your no-null arguments also. They fit the same pattern.
- [You have not demonstrated any 'upsides' of the bag-friendly "features". A few of your arguments suggest that you're simply in the habit of using SQL tools a specific way, regardless of whether you might be better accommodated by integration of other tools and languages. Most of your arguments - e.g. that bags 'better fit the messy world' - seem to be fundamentally unjustifiable... i.e. you express a belief that cannot be argued from any set of facts. I do not see any upsides intrinsic to the bag-friendly "features" to weigh against. However, I do see plenty of downsides intrinsic to the bag-friendly features, and I see plenty intrinsic to introducing unnecessary 'choice'. If you want me to start weighing the alleged upsides of bag-like features, first you must get me to acknowledge the upsides are a consequence of the bag-like features.]
- The benefits are pretty clear to me. I'm not sure how to better communicate them beyond what I've already said. As far as the output side, it requires more tools, more typing, more syntax/commands to learn, and a more complex query language (such as needing imperative loops and a WRITE statement). As far as the read-only event log, you haven't given enough details about how it works in your fav query language, especially if the table is too large to completely copy to the client or local DBMS. After you've supplied those details, then we can dissect the pros/cons.
- *Sigh* It's been pointed out again and again and again that there was nothing "needing imperative loops and a WRITE statement". It was intended only as an illustration.
- Well, I need something concrete to compare (unless you want me to make up some "ideal" Rel of my own). If you are dropping that example, then please present a replacement.
- I did! EXPORT. It's been mentioned multiple times!
- (Comments on EXPORT are below)
- Don't forget to add C(1) for each pseudo-relational operator that accepts either a bag or a relation, of course. :-)
- [Nah. That wouldn't be fair. They're all bags.]
Nice rant. Read my response above, again. Pay particular attention to the part where I "could equally have chosen to illustrate this with Rel's EXPORT statement". No loops. The loop was deliberately chosen to illustrate arbitrary processing of duplicates. In other words, you can do anything you like with duplicates in a pure relational environment.
For one, it creates a violation of OnceAndOnlyOnce because one has to duplicate the SELECT list in the formatting side (minus the keys). That's more typing, more code, and more places to make mistakes. With "traditional" query languages, one does not have to write loops and formatting statements to remove the key(s). If that doesn't personally bother you, fine. I just don't want it as part of my tools. I believe most real-world developers will agree with me. The 10% of the population that is anal or purity-obsessed can use yours.
EXPORT, for example, doesn't "duplicate the SELECT list". WRITELN is an arbitrary statement. It could be any code at all including something to export data. That was the point. It was meant to be illustrative rather than definitive. But, as usual, you're merely being pointlessly argumentative.
By the way, I've already got real-world developers using Rel. How many real-world developers are using SmeQL? Why don't you implement it and then we'll see which is more popular. I'd welcome the competition and the opportunity to kick your sorry butt, market share-wise of course.
Let's compare the total code change and effort required to remove the key and create a CSV file using a "bag" query language:
Before:
SELECT a, b, k FROM myTable
Select "run" icon in Toad, right click on result window, and choose Save As, select CSV
After:
SELECT a, b FROM myTable
Select "run" icon in Toad, right click on result window, and choose Save As, select CSV (same) [1].
--top
You're kidding, right? Ok, what language statement implements the export to CSV? Are you expecting me to incorporate Toad into my automated toolchain and run it on my unattended headless UNIX box?
Oh, and here's Rel's work-in-progress EXPORT statement:
No duplicates:
EXPORT myRelvar {k, a, b} ORDER(ASC a) FORMAT CSV TO "myfile.csv"
Duplicates:
EXPORT myRelvar {k, a, b} ORDER(ASC a) EXCLUDE k FORMAT CSV TO "myfile.csv"
No Toad required.
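A sketch (in Python, not Rel) of what that EXPORT ... EXCLUDE statement does: every tuple is emitted, since the key k keeps them distinct internally, but k itself is omitted from the output. The function name and arguments are invented for illustration, and it writes to a string rather than a file for brevity:

```python
import csv
import io

def export_csv(rows, attrs, exclude=(), sort_attr=None):
    """Emit all rows as CSV, omitting the excluded attributes."""
    out_attrs = [a for a in attrs if a not in exclude]
    if sort_attr is not None:
        rows = sorted(rows, key=lambda r: r[sort_attr])
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(out_attrs)                     # header row
    for r in rows:
        writer.writerow([r[a] for a in out_attrs])
    return buf.getvalue()

my_relvar = [
    {"k": "1", "a": "x", "b": "p"},
    {"k": "2", "a": "x", "b": "p"},   # same (a, b) as row 1
    {"k": "3", "a": "y", "b": "q"},
]
print(export_csv(my_relvar, ["k", "a", "b"], exclude=("k",), sort_attr="a"))
```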
So you add a new command and an EXCLUDE keyword to "solve" it. That can't look good on the complexity scale. (It looks more and more like ExBase's ugly sister, too.)
- [You think a toad is prettier?]
- Us trolls love toads: dark, damp, and squatty. You might as well put a GUI into Rel, everything else is there. Why have a KingLanguage? when you can have a GodLanguage? Note that while an ExBase replacement might be a good idea, we still need a "base" query language that improves upon SQL. When that's established, then we can work on app-language-like extensions. You gotta walk before you run. -t
- The notion of a division between a query language on its own, and an application language with an interface to that query language, is increasingly deprecated. As for a GUI, future versions of Rel will introduce a closely-coupled GUI (running over Rel-JDBC or some equivalent), that has been started as a student project. This kind of coupling has been shown historically to be highly successful in products like ExBase, BorlandParadox?, MicrosoftAccess, and so on. However, Rel is also usable as a conventional stand-alone DBMS with imperative language capabilities in the same vein as PL/SQL, Transact-SQL and the like.
- It's also one of the reasons for the downfall of ExBase and Paradox: they didn't integrate so well with SQL-based RDBMS because of their monolithic integration. Even MS-Access has lackluster integration with non-MS RDBMS. If you break things into components, then the components can mature and change semi-independently, allowing flexible evolution. God-tools are great when released, but unexpected changes in technology and practices tend to byte them in the arse, and their weakest point drags them down. They are shooting stars. Anyhow, for now I'm competing with SQL, not MS-Access.
- Eh? How, exactly, are you "competing with" anything?
And what command, with keywords or not, do you use to create a CSV file? Without Toad.
This is ridiculous, of course. You can imagine anything you like and paste it into one of your SmeQL pages. I present a working language that people are downloading and using. What have you created except hot air? You are a festering blight on the IT landscape, and a vast sinkhole into which productivity pours. The sad thing is that it happens voluntarily. You have a unique talent for making it appear, briefly, like your ignorance might be addressable with education. I don't blame you, of course, because you're deluded enough to think you're actually doing something useful. For me, however, I could have written some code -- perhaps fulfilling one of my appreciative users' requests -- in the time it took to write this. I should know better. You, sir, are a troll, but it is I who suck for allowing myself to once again be trolled.
Red Herring. A poorly-designed language is still poorly designed even if it runs. I suggest you get the detectable kinks out BEFORE coding. You didn't do your homework. Sure, I could add linguistic bells and whistles to SmeQl for the hell of it, but if I can keep most of it simple, I will. Otherwise, you are just reinventing the mistakes of SQL, but with a purist twist. -t
I presume SmeQL will export to CSV files using Toad too, yes? Troll.
I would note that an EXPORT operation can be added without adding new syntax, just a library entry. This is because it doesn't rely on hard-wired keywords. I think it's a good thing.
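As a sketch of that design (entirely hypothetical; this is not an actual SmeQl implementation), operations can live in an ordinary library table, so adding an export means adding a dictionary entry rather than a keyword to the grammar:

```python
# Hypothetical design sketch: operations live in a library table
# rather than in the grammar, so adding "export" is a registry
# entry, not a new hard-wired keyword in the parser.
OPERATIONS = {}

def register(name, fn):
    OPERATIONS[name] = fn

def run(name, *args):
    return OPERATIONS[name](*args)

register("project",
         lambda rows, attrs: [{a: r[a] for a in attrs} for r in rows])
register("export",
         lambda rows: "\n".join(",".join(str(v) for v in r.values())
                                for r in rows))

table = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
print(run("export", run("project", table, ["a"])))  # prints "1" then "3"
```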
But that's all in theory, because you haven't implemented anything. Isn't it you who always advocates "race the damn car" or some such? When's the SmeQL car going to be more than a teenage sketch on the back of your schoolbooks?
That's legitimate criticism, but it doesn't change any flaws in T.D./Rel. We BOTH have areas to work on, okay?
What you consider flaws might actually be features.
But, admittedly, vice versa (in all possible permutations)...
(On a side note, in this day and age, hard-wired keyword-heavy languages are kind of a smell. Why not use some kind of UniversalStatement (including XML or EssExpressions) so that the syntax tree itself doesn't start to look like SQL's and COBOL's? I've considered a UniversalStatement for SmeQl, but so far it doesn't need enough optional keywords to justify moving away from positional parameters. But since positional is a subset of a UniversalStatement, it can be added later without breaking existing code.)
Because LISP, the UniversalLanguage, has been done.
I personally find the UniversalStatement more friendly than nested lists. And it's closer to Algol-style, meaning more familiar to users.
You've used Algol, have you? Troll.
That's what they generally call languages that look like C, Pascal, Ada, etc. They can trace their style back to Algol more or less.
No, they generally call them "imperative" or "C-style", etc., as appropriate. The term you're looking for is "Algol-derived".
Ada is not C-style, and Assembler is also "imperative". "Algol-influenced" may be the best match so far.
Note that I wrote "as appropriate", you argumentative git. "Algol-derived" is a recognised term in the literature. It means what you're trying to say.
I don't understand why you asked, "You've used Algol[?]". You could have just said, "Perhaps you meant 'Algol-derived', which is usually how it's worded in the literature."
Ah, you are weak in the sarcastic spaces.
So that's what it was.
I would have thought the "Troll" at the end would have conveyed what pure sarcasm -- which is, of course, transparent and sometimes hard to see -- did not.
Well, I mistook the "troll" statements as blatant insults. Silly me.
Oh. I thought they were fine examples of accuracy in product labelling.
Is that sarcasm, or an insult?
It's a statement of truth.
When the "projection" filter is applied.
You've been called a troll how many times? Me, never.
That's because I critique existing stuff people care dearly about. Instead, you make obtuse God-tool crap nobody gives a fudge about. "Troll" has no objective meaning anyhow. Emotional anger and vagueness seem related.
I think you're called a troll because you're a troll. You strike up arguments for no other reason than you like to strike up arguments. There is no intellectual, noble, altruistic, or even justifiable reason for your so-called "critique". You incite anger because it amuses you, and that makes you a troll.
Your assessment of my neural activity patterns is wrong. You are not a mind reader and don't have a window seat to their firing patterns, so stop pretending you do. And there *is* intellectual justification: to make a better query language by dissecting the options. You are likely just sour-graping me for pointing out the silliness and ugliness of TD/Rel caused by obsessive-compulsive tendencies. That's my assessment of your neuron patterns.
Your points might have validity if anyone other than you had raised the same criticisms about TD/Rel. That would indicate there are general problems that are generally recognised. But no, the criticisms you've raised are unique to you. Still, if you were a recognised expert in the field, noted for your valuable insights into database language design, your unique views might have validity. But no, you're recognised only as an argumentative, uneducated troll.
If it walks like troll and quacks like a troll...
[You give TopMind too much credit. Trolls are often quite clever... and, importantly, they generally know they are trolling. TopMind isn't a troll. He's a crank... and not one especially competent in offering a facsimile of reasoning. He resorts to sophistry, fallacy, and hand-waving because he honestly believes them to be proper counters to your arguments. He doesn't understand actual reasoning (he lacks the competency to recognize it or follow it, and he no doubt projects his own incompetence with reasoning on the value of the whole discipline). He rarely evaluates or introspects his own beliefs, nor does he seek to resolve apparent inconsistencies between them (well, apparent to us, anyway). He isn't stupid, but it takes far more intelligence than TopMind possesses to overcome the sheer level of arrogance it takes to disregard self-education and the skill of correct reasoning.]
Massive projection, number 2. YOU are the one (or two) who cannot produce clear-cut open evidence. For example, you insist Nulls are objectively and scientifically evil (rather than say a personal preference dislike). YET, you cannot produce science-quality evidence that they are objectively bad, just professor-on-crack-sounding mumbling. YOU are the problem. At least I don't PRETEND my personal preferences are universal mathematical truths. -t
[You have been provided mathematical proof that NULLs lead to various anomalies.]
- Where? Please link.
- [I don't memorize topic names on your behalf. Also, if you are honestly interested in the subject, it would be wiser for you to find external resources that explain it rather than opening up a page full of you faking comprehension. Instead, do a little research on your own initiative and review the higher normal forms (4th, 5th, and 6th normal forms) and the specific anomalies avoided by each higher form. It is all available online. Or are you going to whine about 'BookStop' because you'd rather defend your incompetence?]
- It's not "anomalies", it's weighing trade-offs and knowing the domain. Stop the Pseudorigor.
- [You call it a 'trade-off'. What are you buying, besides complexity? Performance would be an incorrect answer, since a database with indexing strategies designed around 6NF could achieve equal performance to anything we have today. Simple interface would be an incorrect answer, since a language developed to query and manipulate 6NF tables can be quite compact (Prolog and Datalog being examples). So what, precisely, are you gaining?]
- I'll accept the "complexity" of Nulls and bag output because it better fits the messy real world and simplifies human effort needed to create output. As a user, a shopper of tools, I personally choose that trade-off. Oracle and IBM appear to agree. If you want to sell cars that can't make a left turn because accidents are statistically higher for left turns, that's fine. I don't have to buy it. Enjoy making 3 rights instead. -t
- [What does "it better fits the messy real world" mean, such that you can reasonably justify this point? Or are you just performing argument by assertion? And "simplifies human effort needed" is relative between toolsets and between tasks. What makes you assert this as a general argument? And how is your car argument usefully analogous?]
- Bags are inherent in the real world, the domain. You can simulate them with sets, but it requires extra translation effort.
- Yes, bags are inherent in the "real" world. The cans of cat food on the store shelf are all identical, and modelling them precisely in a store simulator using a bag might be appropriate, especially if there is some need (what, I have difficulty imagining, but...) to regard each can independently but not uniquely. However, business information systems are not simulations, they are information processors. They manage data. We do not simulate a store, we record facts aka beliefs aka data about a store. Rather than record each individual can, we record one fact -- that we have 'n' cans of SKU 234092048. For that, sets are ideal, no extra translation effort is required. I know of no information systems requirement that demands true bags for accurate representation. There are occasional cases where information presentation, i.e., reports or output like anonymised address lists or mailing labels, demands bag-like mechanisms. However, presentation is very distinct from representation. These are always reporting requirements, and are always appropriately handled in report generators and the like. Information storage is always based on unique identification because information retrieval requires it, and explicit uniqueness is demanded for (at least) auditing purposes. The notion of recording two or more identical but distinct anythings is the antithesis of sound data management. Bags may be inherent in the real world, but we're not representing the real world. We're representing data about the real world, for which sets are ideal.
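A tiny Python illustration of the point (the SKU values are made up): rather than a bag with one element per physical can, the information system records one unique fact per SKU, with the count as just another attribute of that fact:

```python
from collections import Counter

# One bag element per physical can -- identical, indistinguishable:
bag_of_cans = ["SKU-234092048"] * 3 + ["SKU-111111111"] * 2

# The set-based representation: each fact is unique, and the
# quantity is data rather than repetition.
inventory = dict(Counter(bag_of_cans))
print(inventory)  # {'SKU-234092048': 3, 'SKU-111111111': 2}
```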
- You are presenting a classification fallacy. Tool feature groupings should be determined by usage patterns and practical concerns more than arbitrary classification systems.
- What?!?!! Your response has nothing whatsoever to do with what I wrote.
- I guess I misunderstood what you were getting at in your hard partition of representation and presentation. I suggest you review the way it's worded. And I don't think there is, or should necessarily be, a hard partition between them. There is overlap, perhaps a lot. Different systems often need the same info "stored" a different way. Rows may become columns, columns rows, some added, some dropped, etc., because the concerns of different systems are different. This requires "reformatting" similar to what report preparation requires. Using very different tools for each would be poor tool factoring. -t
- That's an interesting and enlightening response, as is your "phone_page" point below. It goes some way, to me at least, to explaining why this and many debates like these occur with you. When you write "using ... very different tools for each would be poor tool factoring", to me it's like claiming that a hammer and a toaster being very different tools would be poor tool factoring. Presentation and projection are completely different, not the least because presentation can be done in innumerable ways, but projection is always the same operator. To me, it's like claiming that applying '+' to integers is the same as the font and colour used to paint the sum onto a road sign. I guess you see projection as being "pick the columns I want", whereas I see projection as "the operator in the RelationalAlgebra that returns a relation with specified attributes."
- I don't understand your point with the toaster and hammer example. If you rarely use them together, then don't couple them. If you use them together often, then it may make sense to couple them. For example, suppose your toaster tended to freeze up often, locking your toast in, and you've found that a hammer hit on the left side is the only known way to unfreeze it. So, you keep a hammer next to it. However, family members keep walking off with the hammer, making you late for work or hungry. Thus, you get some metal wire and nails and connect the two together via a tether. You've coupled them because they are often used together in your house. Maybe there is a "tool theory" where the optimum partitioning and coupling of features can be found by the usage relationship (weight) between features. -t
- As far as relational algebra versus "formatting", you seem to be wrapped up in "perfect classification". Feature sets will tend to bleed into each other's territories at the edges. Cup holders seem strange in cars because cars are not kitchens. However, if you live in big California cities you will soon find that cars end up being de-facto kitchens because people spend so much time stuck in traffic. It's a marriage of practicality, not of ideal classifications. I believe you have an overly-idealistic personality such that grey areas irritate you. Relational algebra transforms tables, or at least "grid-like things". Managing the inclusion and order of rows and columns is also a form of "grid transformation". Thus, they are related, you just have to widen your classification groups more: things that manipulate grids. Relational is a sub-set of grid manipulation (or at least could be viewed that way). In this view they are not too different after all. -t
- (PageAnchor phone_page) For example, our telephone system in the office can also be used as a paging system because they already have speaker phones. (Similarly, query languages already have a column selector mechanism.) Someone could argue that "telephones are not paging systems" and therefore we should buy a $500 ceiling speaker system instead for paging. It would keep functionality "separate", at least by the concept that paging and phoning are "supposed" to be different things. Human classification systems are a UsefulLie, but still a lie. Economics overrides classification.
- As far as your cat food can example, I may agree. But, I already gave 3 scenarios: 1) read-only access to an event log table without an auto-key, 2) ad-hoc querying chopping out compound keys to make it fit on screen or paste better, and a 3) customer who doesn't want keys in his/her data per contract. (See BagNeedScenarios for more details.) Basically in all three you are arguing, "Buck up and live with the extra step of removing or adding keys because it makes the purity gods happy, which will SOMEHOW pay off down the road in undefined vague undocumented ways". That's what I'm hearing from you.
- [As I understand it, the domain of the RelationalModel is data, which is modeled in terms of relationships. Data has a few relevant properties: hearing the same datum twice never introduces any new information (thus, duplicates are never relevant), and the truth of a datum is independent of the order in which it is presented to me (thus, order is irrelevant). These two properties directly match those of the mathematical 'set'. Now, it seems you believe the 'domain' of a table is 'the real world'. What do you mean by this? (How does one put 'the real world' into a table?) Why are bags 'inherent' to 'the real world'?]
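Those two properties of data are exactly the properties of a mathematical set, which a short Python check makes visible (the facts shown are illustrative only):

```python
# A datum heard twice adds no information, and the order in which
# data arrive does not change what is known -- exactly the
# properties of a set.
facts = [("alice", "manager"), ("alice", "manager"), ("bob", "clerk")]
known = set(facts)
assert known == set(reversed(facts))                       # order irrelevant
assert known == {("alice", "manager"), ("bob", "clerk")}   # duplicate adds nothing
```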
[You have been provided evidence of these anomalies happening 'in the wild', as one might say. That doesn't make them evil, but it does make them harmful. The problem is on your end, TopMind, because whenever you are provided these proofs, you successfully demonstrate to us your obscene levels of incompetence, and your unwillingness to actually attack the argument - instead you resort to fallacy and sophistry and hand-waving attempts to disregard the proofs. The problem is on your end, TopMind, because you honestly believe that you are the judge for whether you've won or lost an argument.]
"Disregard proofs", pfffft. You really think your personal judgment is iron-clad math-proof kind of stuff. Self-delusion. Next you'll "prove" that purple is the prettiest color. Dolt. Pseudorigor.
[Where, precisely, have I taken personal judgement and called it a proof?]
You are full of it. Usually one finds there's trade-offs involved, not a free lunch, and you handwave away the downsides without any rigor. Imperative loops and WRITE statements? Puke. How 1960's. Oracle and IBM were smart to ignore reality-detached fools like you. Are they also "trolls"?
Dude. You have so gotta work on the comprehension skills. Note, above, where I wrote "the FOR construct was merely selected as an example" and "it was meant to be illustrative rather than definitive" and "TutorialDee, and hence Rel, is intended to be a general-purpose imperative programming language with built-in database capabilities." As such, continuing to harp on about "loops and WRITE statements" merely demonstrates your inability to read, recall, absorb, and understand. It makes you look like an idiot.
[The fact TopMind's irrelevant utterances of blind incomprehension are additionally inconsistent with his arguments in other topics - i.e. that ProceduralProgramming goes well with TableOrientedProgramming, and that ExBase is all that - makes him look even more like an idiot.]
- The point about the FOR loop is that one shouldn't need it for excluding chosen columns and "widening" (null-filling) output tables. It's fine to be able to use both imperative and declarative, but don't force imperative via purity zealotry.
- [One could just as easily utilize a declarative input/output language (indeed, one was demonstrated for you: EXPORT), so your point was never valid. Can you justify your assumption that data processing languages should be responsible for ordering and presentation format? Or do you never question your assumptions?]
- How tool functionality is partitioned generally depends on which features are most likely to be used together. Hammers have nail removers built in because the need for nail removal is statistically high when hammers (the blunt end) are used. I do know that when I am doing internal ad-hoc queries, I need to look at the output either as a preview, or because I'm studying something, such as looking for particular records or data patterns. Color and cell borders are not really important for that. If those are needed, then usually they are needed together with many OTHER formatting features for presentation purposes, such as fonts, page numbers, titles, paper margin control, etc. At that point I get a report writer or export to a spreadsheet to pretty it up. But for my use, internal use, those don't matter much. However, the existence and ordering of rows and columns do matter. I cannot see the "grid" I want to see without them. They are features that are commonly needed in association with the rest of the query language; at least for my usage patterns. As long as a given shop, machine, or tool has SQL, I have at least the ability to get the grid I want. Those machines don't also have to have formatters installed, along with the baggage formatters bring with them.
- [Sure, certain tools are used together so you want them in the same toolbox. Import and export of data, as part of a database management service, is one example. But it sounds to me as though you're compromising the query language to gain a slight advantage towards your goals, and that you are so habituated on that particular "solution" that you don't even bother reconsidering it upon introducing other tools to your toolbox. For example, I suspect some debugging facilities and a TableBrowser (which can order-by columns, and hide them, among other typical features) could achieve the same goals you are aiming for, do a higher quality job, and do so without requiring any compromises of language that might hinder optimizations or semantics. Ask yourself: have I fallen into the common mental trap of specifying my solution rather than specifying my requirements? It is an easy mistake to ask for a shovel when what you really want is a hole.]
- A table browser is nice, but not always available to do the column chopping. For example, what if I want to script it to export to another tool using CSV? (A CUI command-line query language should support CSV, in my opinion.) And it's just plain silly to not allow one to skip keys in telling it which columns to show or output. How is that different from sticking a second tool on the end of the stack? Something is going to have to do the chopping anyhow. Why have middle-men? Why is it okay for a middle-man tool to commit "bag sins" but not the query language column output selector? It's okay for the butcher to steal money but not the baker?
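The scripted-CSV case above can be sketched with Python's standard library. This is a minimal illustration, not anyone's actual tooling: the `eventlog` table, its columns, and the data are all hypothetical, standing in for a feed where the recipient does not want the internal key column.

```python
import csv
import io
import sqlite3

# Hypothetical event-log table; 'id' is an internal key the
# recipient of the extract does not want to see.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eventlog (id INTEGER PRIMARY KEY, msg TEXT, level TEXT)")
con.executemany("INSERT INTO eventlog (msg, level) VALUES (?, ?)",
                [("disk full", "ERROR"), ("disk full", "ERROR"), ("started", "INFO")])

# Project away the key column; the result set may then contain
# duplicate rows, which is acceptable for a CSV hand-off even
# though it is no longer a relation.
rows = con.execute("SELECT msg, level FROM eventlog").fetchall()

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["msg", "level"])
writer.writerows(rows)
print(buf.getvalue())
```

The point either side could take from this: the "chopping" happens in one scripted step, and whether that step belongs inside the query language or in a thin layer around it is exactly what is being argued here.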
- [Why do you call a table browser a "middle-man tool"? For which query does it sit in the middle? Indeed, where am I promoting that any tool sitting 'in the middle' of the query process be capable of introducing duplicates or order and then passing these issues on to the remainder of the query?]
- I meant between the query language and me, not in-between queries. An "extra tool" may be another way to say it.
- [It is okay for tools outside the queries to do formatting and such because doing so does not damage the semantics or properties of the query language, does not hinder analysis or optimization. Your argument amounts to a slippery slope. Why is it okay for code outside the query language to have SideEffects, but not okay for the query language itself? Why is it okay for code outside the query language to format a table into XML, but not okay within queries? Why is it okay for code outside the query language to format order or introduce duplicates, but not okay for the query language? The answer to each of these questions is exactly the same: because maintaining the properties of the query language is critically necessary to keep the advantages from having a 'query language' at all! Compromise costs a non-trivial amount. You, while ignorant of these costs, declare them a worthy expenditure for some alleged marginal benefits in two specific circumstances: presenting duplicates, manipulating order of presentation (i.e. of columns and within columns). We have no reason to believe so, especially since 'tools outside the queries' can do these jobs admirably well - and without compromise. HandWaving appeals to 'messiness of the real world' do not help your case. A job of computer languages is to manage and partition that messiness, not promulgate it. You should go program in raw AssemblyLanguage if you want to keep the messiness of the world.]
- Your point in the side-effects and XML examples is not clear: Many query languages *do* have side-effects, such as SQL's UPDATE statement.
- [SQL is not a single language. SQL is made up of several smaller languages, covering various aspects from transactions and access control to data definition and data manipulation. Only the stuff you are allowed to place inside the 'SELECT' statement is truly part of SQL's 'query' language. While I do not practice SQL very often, I do not recall the ability to embed UPDATEs into SELECT statements, even though vice versa is okay. If my recollection is correct, then SQL is properly a 'layered language' with respect to SideEffects. We are suggesting that RelationalModel query languages should, similarly, be layered with respect to presentation details (such as ordering, or XML format)... i.e. such that views and implementors and such cannot complicate the 'query language' proper.]
- For example, SQL could add syntax like: SELECT * TO FILE 'data.txt' FORMAT XML FROM foo. ExBase was sort of like this.
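The proposed `SELECT ... TO FILE ... FORMAT XML` could also be sketched as a layered operation, which is the counter-position below: the query stays side-effect free and the XML export is a separate step. This is a toy illustration; the `foo` table and its columns are hypothetical.

```python
import sqlite3
from xml.sax.saxutils import escape

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (a TEXT, b INTEGER)")
con.executemany("INSERT INTO foo VALUES (?, ?)", [("x", 1), ("y", 2)])

# The query proper stays free of side effects...
rows = con.execute("SELECT a, b FROM foo").fetchall()

# ...and the XML formatting/export is a separate, layered step
# outside the query language.
xml_rows = "".join(
    "<row><a>%s</a><b>%d</b></row>" % (escape(a), b) for a, b in rows
)
document = "<foo>%s</foo>" % xml_rows
print(document)
```

Whether that separation is a virtue (layering) or a nuisance (an extra step) is precisely the disagreement in this thread.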
- [SELECT statements can be embedded within one another. So one question is: what, exactly, would that mean (in terms of semantics, performance, concurrency issues) in the event it was embedded within another SELECT statement? Similarly, SELECT statements may be used to define UPDATEs and VIEWs and such. What would 'SELECT TO FILE' mean in those cases? Additionally, SELECT statements are also performed over the wire, between unlinked FileSystems. What would SELECT TO FILE mean for purposes of security and accessibility?]
- Most non-proprietary info processing tools can import and export standard formats. That's how they talk to the outside world.
- [Sure. So can RelProject. It simply isn't part of the 'query' language. Export doesn't need to happen in the middle of a RelationalModel query. Sometimes import is needed (i.e. lazy import, or to handle updates) but creating 'new' links shouldn't be part of the query itself, and so linking of external data resources is also expressed outside the query language.]
- Yes, other tools can do the necessary formatting, but it usually means more set-up steps. If you find yourself having to install, use, and run two tools together, that suggests they should become one (or at least that the related features should be extracted and refactored).
- [No, that means the tools should be part of the same toolset. Just because you need to use ProceduralProgramming and SQL queries together doesn't mean SQL queries should be capable of procedural behaviors. This has already been explained to you, multiple times, and you've seen plenty of examples... i.e. TutorialDee/RelProject can perform EXPORT or run WRITELN in a FOR loop; you do not need to install or run a separate OS process to use these export and imperative tools, and yet the RelationalAlgebra of TutorialDee cannot call EXPORT or WRITELN, and thus remains independent of (and uncorrupted by) those external tools. Maybe it would help if you start thinking of RelProject and SQL DBMSs as 'toolboxes' rather than as individual 'tools'.]
- You have not demonstrated clearly what the "compromise costs" are. You say I'm ignorant of such costs. Yes, you're right, I am. You are not explicit enough. Your justification is HandWaving ("bad things happen, just trust me."). Reduce your explanations to a finer granularity and maybe you'll hit it.
- [These were explained in BagAtational, and have been explained elsewhere, and are well known in the literature. Your ignorance is not due to stupidity on your part, nor due to lack of detail available to you. Your ignorance exists because you are an ignoramus: you actually like to be ignorant. You are nobody's student. You don't even self-educate.]
- And there's also the issue of being able to hook up to existing DB engines. If you follow their conventions, then hook-up is easier.
- [Sure, it's easier. RelProject will need to include some extra facilities for hooking up to SQL DBMSs, such as auto-numbering tables that potentially possess duplicates. On the other hand, why have a new DB engine if you're simply going to follow the conventions of existing ones? If 'easy implementation' is a priority, you should probably stick with SqLite or MySql or Oracle... because the 'easiest' implementation solution is to use an existing implementation.]
- Continued at DuplicateTupleOutputDiscussion. This is getting TooBigToEdit.
It appears that for Top, maintaining an argument ad infinitum
is infinitely more important than maintaining a coherent and consistent argument. I actually thought the FOR loop would appeal to him, given his past arguments. Silly me.
Psychological projection again. You are the one (two?) who started repeatedly calling me a "troll". If you wanted to stick with technical discussion, well, most normal people wouldn't do such a thing. In fact, I'd call that "trolling".
[Very few of your comments in this entire page have any relevant technical content whatsoever. You have been extremely disrespectful. I do not believe you are a troll, but you're certainly an arrogant asshole.]
Very few of your comments in this entire page have any relevant technical content whatsoever. Your complexity "count" conveniently disregards factors you don't want to hear about. You have been extremely disrespectful. I do not believe you are a troll, but you're certainly an arrogant asshole. -t
[Oh? Which factors did I disregard that I don't want to hear about? Be precise.]
Summary Comments
I believe this debate comes down to whether mixed bags+sets or just sets are the best structural fit for business applications. I lean toward mixed because I believe the real world is "too messy" to fit closely to a set-only tool-set, as described above. One is modeling business and political rules created by irrational or poorly-informed managers and users, not PhD's, and bags can better fit this environment. Sets-only are just too idealistic. -t
As has been explained repeatedly, in a true relational system (typically a DBMS) it is trivial to (a) eliminate duplicates on import/link and (b) trivial to produce duplicates on output for those exceedingly rare cases where users require duplicates.
In other words, bags are only (slightly) relevant at the boundaries of the relational system. Within the relational system, bags are not needed at all. Supporting them only adds complexity and potentially diminishes optimisation. Therefore, there is no reason to support bags outside of import/link (i.e., input) and output, as described above.
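The two boundary operations claimed here can be sketched concretely. This is a minimal illustration under assumed names (the `items` table, `dup_count` column, and the sample feed are all hypothetical): (a) duplicates in incoming bag data are folded into a count on import, and (b) a bag can be regenerated on output for a consumer that insists on one.

```python
import sqlite3
from collections import Counter

# Incoming "bag" data, e.g. from a legacy feed, may contain duplicates.
incoming = [("widget", 5), ("widget", 5), ("gadget", 3)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, qty INTEGER, dup_count INTEGER, "
            "PRIMARY KEY (name, qty))")

# (a) Eliminate duplicates on import by folding them into a count.
for (name, qty), n in Counter(incoming).items():
    con.execute("INSERT INTO items VALUES (?, ?, ?)", (name, qty, n))

# Inside the system, every table is a relation: no duplicate rows.
relation = con.execute(
    "SELECT name, qty, dup_count FROM items ORDER BY name").fetchall()

# (b) Reproduce the duplicates on output for consumers that want a bag.
bag_out = [(name, qty) for name, qty, n in relation for _ in range(n)]
print(bag_out)
```

No information is lost in either direction; the count column carries exactly what the duplicates carried, which is the substance of the "bags only at the boundaries" position.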
I have, by the way, never seen a business case for maintaining duplicate rows in a table in a database. I have seen it done, obviously, but never with sound justification. In most cases, it's been done purely out of carelessness or ignorance.
The scenarios in BagNeedScenarios are not that rare in my experience, but we can LetTheReaderDecide those frequency issues rather than continue bickering over that point. As far as optimization, it would only be an issue if there was a bag in a resource-critical portion of a given query. Again, I'm for having choice. -t
Do you expect users to know what the "resource-critical portion of a given query" happens to be?
As I mentioned before somewhere, a poorly-skilled query writer will often make many other mistakes, duplicate-management being just one. Plus, if the option of bags is removed for them, they may make other messes to compensate, i.e., force a bag. WorseIsBetter often just works better in the real world. Idealism often proves uneconomical. -t
Why would anyone "force a bag"? Anyway, at worst that would mean a table with an auto-generated primary key. Your comment that "idealism often proves uneconomical" is based on what?
Auto-generated keys can be unwanted overhead, and may not solve the "bag problem" anyhow, simply putting lipstick on a pig. They may remove "technical duplicates" but not necessarily "logical" or "domain" duplicates. As far as the economics of idealism, I'll refer you to WorseIsBetter.
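The "technical versus logical duplicate" distinction being claimed here can be shown in a few lines. This is a toy sketch with a hypothetical table: an auto-generated key makes the rows technically distinct, yet projecting the key away exposes rows that are identical in every domain attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "a TEXT, b TEXT)")
con.executemany("INSERT INTO t (a, b) VALUES (?, ?)",
                [("x", "y"), ("x", "y")])

# Technically there are no duplicate rows: the auto-key differs.
all_rows = con.execute("SELECT id, a, b FROM t").fetchall()

# But projecting away the artificial key exposes the 'logical' duplicate.
logical = con.execute("SELECT DISTINCT a, b FROM t").fetchall()
print(len(all_rows), len(logical))
```

Whether such rows are meaningful data or "useless noise" is the question the following exchange argues over.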
What is a "logical" or "domain" duplicate? I've never seen that term in the literature.
[The longer you discuss a topic with Top, the more terms he makes up.]
- Why can't you just politely ask for clarification instead of turning it into a personal dig? Does everything have to be a personal dig? It gets old. It really seems to me that you wear sinister-colored glasses when you read my writing and view every action of mine with the most negative possible interpretation of motives. Now, back to the topic. I generally meant unique keys that are part of the domain and not created just to have a unique key for some temporary or internal process. For example, in the "logger" scenario from BagNeedScenarios, one could add their own Auto-Number type to the extracted data set(s) just to have a reference number for internal processing. But, it's not part of the domain and is not guaranteed to reference the same "record" the next time around. It's only put there to have a temporary internal reference ID, not to make a one-to-one correspondence with anything real in the real world (or domain). If you can suggest a better name for it, I'd be happy to consider it and re-write the above. -t
- [This is a good example why I stopped asking for clarifications a long time ago. That's not a domain "duplicate", it's a domain key -- and if they're duplicates, they're by definition not keys. You do such violence to terminology by using established terms inaccurately, or by making up subtle distinctions that only you understand, that half your arguments are inscrutable, and there's not enough time in a day to waste it asking for clarification at every turn.]
- I feel the same about your writing. You ain't no Carl Sagan.
- [Well, at least if you ask me for a clarification, you'll likely learn something that you can use in conversation with others in the field.]
- [As for a logging application, one needs to find the true domain keys if they haven't been handed to you on a platter. Duplicate log messages that can't be distinguished are useless noise, and that's true of most duplicate rows in general. If all you want is a count of how many times a log message occurs, then for god's sake count them. A more likely domain key would be e.g. time of arrival at sufficient resolution to order the messages, or by source, or a combination of several such things.]
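The two suggestions above — a domain key built from arrival time plus source, and counting rather than storing duplicate messages — can be sketched together. This is an illustrative toy only; the `log` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical log table: arrival time at millisecond resolution plus
# source serves as a domain key, so no two rows need be identical.
con.execute("CREATE TABLE log (arrived TEXT, source TEXT, msg TEXT, "
            "PRIMARY KEY (arrived, source))")
con.executemany("INSERT INTO log VALUES (?, ?, ?)", [
    ("2010-04-01 12:00:00.001", "srv1", "disk full"),
    ("2010-04-01 12:00:00.002", "srv1", "disk full"),
    ("2010-04-01 12:00:00.003", "srv2", "restarted"),
])

# If all you want is how often a message occurred, count it rather
# than relying on duplicate rows:
counts = con.execute(
    "SELECT msg, COUNT(*) FROM log GROUP BY msg ORDER BY msg").fetchall()
print(counts)
```

The repeated "disk full" messages remain distinguishable by their key, and the count query recovers the frequency information that duplicates would otherwise carry.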
- It either has perfect identifiers or it's useless? That's not the world I live in. Your excess idealism is showing again. Wear a longer dress. (Those things are already addressed in the write-up.)
- [It has nothing whatsoever to do with idealism. That's another typical Top distraction/cop-out/obfuscation and a reason that discussions with you run around in circles forever filling the wiki with pointless words. "Key" means unique identifier when you're talking about data in a database. A "key" that is not a unique identifier is not a key. See what an utter waste of time it is when you don't know and use common terminology correctly?]
- Your statement, "Duplicate log messages that can't be distinguished are useless noise" is rather extreme. Suppose the log record indicated that something unusual happened, but we can't tell, due to a misconfiguration of logger identifier info, whether it happened at location A or location B. We still know that something unusual happened at the location of at least one of the loggers. Further analysis of the data patterns may improve the guess.
[This is a really bizarre discussion. In a decade of working on accounting systems, I have also never seen a legitimate use of duplicate rows within a system of queries, and none of Top's "reasons" for offering it as an "option" make the least bit of sense to me. When duplicates occur in systems of SQL queries, it's almost always a mistake, or, when deliberate, leads to later mistakes when the output of such a query is used in another context. The cases where an artificial key has been needed in an intermediate query result have been very, very few. It makes tremendous sense to be able to generate non-relational outputs with duplicate rows, and almost none to incorporate them within a relational system. --
DanMuller ]
Accounting is probably a domain where one is least likely to encounter such a need because it's unforgiving and requires careful schema design and query writing. I gave some non-accounting scenarios in BagNeedScenarios. If by chance you disagree that those are likely, just say so and we'll move on. No reason to make it a personal conspiracy to mislead. LetTheReaderDecide if a given scenario is likely in their own domain/experience. Removing the unique customer identifier for privacy reasons was something I did fairly often in one shop, by the way (BagNeedScenarios scenario 3).
Also note that a "forced" temporary artificial key can create the risk that somebody will use it as an (assumed) permanent identifier. -t
[Labeling my field of experience as unrepresentative is a completely evasive cop-out, and completely untrue. Applications where errors are tolerable in the manipulation of data that's worth storing in a database are certainly in the minority.]
- I have to disagree with that assessment. But again, in the end we'll have to LetTheReaderDecide whether the stated scenarios are likely or not and/or reflective of their own domains or experience.
[I skimmed most of the
BagNeedScenarios page, and others have already critiqued the applicability of the scenarios to your argument. They either describe scenarios at the boundaries of a relational system (and thus don't indicate anything close to a need for discarding relational algebra), or they are the result of inadequate analysis of the domain, or they accommodate mistakes made in legacy or external systems that can be relatively easily fixed or addressed at the system boundaries. No show-stoppers in there at all.]
[If removing a customer identifier results in duplicates, replace it with a count and eliminate the duplicates. Even in a report (which is outside the boundaries of the relational system), a count is more informative to a human reader than having to visually identify duplicates, and is obviously more useful to other software. I go so far as to argue against duplicates even in user interfaces and reports in almost every case where someone suggests them, because the plain fact is that something distinguishes duplicates, even for human readers - if nothing else, there's implicit information in a position in a list.]
That's already addressed in the write-up. If the contract specifies columns A, B, and C, you don't give them D out of some purity religion. I've been chewed out multiple times for giving extra info that I felt would be useful to the customers.
[Stop ignoring half of what everyone says. This statement ignores the fact that you can give someone output with duplicates at the boundaries of the system. This adds nothing but bulk.]
- Please clarify. It's an extra step to run it through yet another filter. Calling it a "boundary" doesn't change the fact it's an extra step. Boundaries are relative. I'll agree with a warning from the query language or whatever about the results not having a unique key, but it shouldn't prevent me from chopping out columns I don't want. I want to be the "god" over the tool, not the other way around.
Again, you guys seem detached from the typical workings and dealings of the business world, thinking you will be allowed to play Techy-God and/or be given infinite resources to fix all the clunky systems in a typical business network. I used to be kind of like that out of school, wanting everybody to make cleaner data with more validation and more research into hiccups and oddities from other departments' systems, etc. After a while I learned the hard way to shut up and work within what's given me.
[More typical Top crap. If circumstances require you to deal with a bad situation, you don't turn around and argue with the rest of the world that the bad situation is right. And you definitely don't propagate the problems into new systems if it's easily within your power to avoid that.]
- I never called it "right". You are putting words in my mouth and I am offended. It's a matter of working in the environment given to you without becoming an aspergerian whiner. If adding an auto-key helps things on the side I can control, I will add one. Sometimes it doesn't help, especially if refreshed data would get all new auto-keys anyhow.
[So many people with more experience than you have argued with you on this wiki over the years that nobody wants to hear your "detached from reality" spiel anymore; it's simply ludicrous and has nothing to do with anything but stroking your own ego. The fact that you have so much experience and have taken mostly wrong lessons from it does you no credit.]
- Is that supposed to mean something?
AllRoadsLeadToBeeMinus is generally true. You may make your little spot in the world A+, if you can pull it off, but rarely does one have the power to force that on everybody else. I used to be a contractor, seeing a lot of different systems in different kinds of companies, including huge ones that everybody's heard of, and was given lots of clunky data to work with as-is. If this differs from your observation, well, so be it. I'm reporting it as I observed it. -t
[And that makes not one whit of difference as to how you approach working with such data within a system that you are writing. More distractions.]
Scenarios 2, 3, and 4 do affect one's own tasks.
Re: "Toad" query tool:
"... Dark, damp, and squatty."
Stop it. You're making me aroused.
Please don't post any images of your partners.
Coward.
In this case, yes. Bok bok bok.
[1] I'd like to clear up the Toad example. Often query-related tools will take the result sets and produce CSV etc. as needed. The SQL just produces the "result set" (a table) and the tool does the rest. It's not meant to promote or assume Toad itself, but rather uses Toad as a stand-in for a typical "query tool" or TableBrowser-like tool. Ideally the tool wouldn't care if the result set was generated by a SELECT statement or the proposed EXPORT statement. -t
No one objected to TOAD per se. It's your use of a manual tool in the midst of an otherwise-automatable processing chain that seems particularly nonsensical.
I'm not sure what you mean. Toad is only an example. Note that most RDBMSs have "export"-related commands, but they run at the server and are thus usually of little help to the query user on the client side. Sometimes a shared file area is set up on the server, but this is a shop-specific decision.
See BagAtational
CategoryQueryLanguage
AprilTen and OctoberTen