Continuation of BagAtational:
A second practical benefit we lose is the ability to detect and prevent double-entry errors. Presumably, if we want duplicate rows in a table, then duplication has meaning. In a system that allows duplicate rows, that meaning is hidden. The existence of a duplicate obviously means something (otherwise we wouldn't be allowing duplicates), but we choose not to indicate it. So, if we accidentally double-enter a row, how do we distinguish that erroneous extra row from a meaningful duplicate?
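The double-entry point can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in SQL engine; the table names, columns, and the `enter_twice` helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: one table with a declared key, one without.
conn.execute("CREATE TABLE payments_keyed (invoice_no INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE payments_bag (invoice_no INTEGER, amount REAL)")


def enter_twice(table):
    """Insert the same row twice; return True if the second insert was rejected."""
    row = (1001, 250.0)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        # The key constraint caught the accidental double entry.
        return True
    return False


keyed_rejected = enter_twice("payments_keyed")
bag_rejected = enter_twice("payments_bag")
keyed_count = conn.execute("SELECT COUNT(*) FROM payments_keyed").fetchone()[0]
bag_count = conn.execute("SELECT COUNT(*) FROM payments_bag").fetchone()[0]
```

In the keyless table nothing marks the second row as either an error or a meaningful duplicate, which is the ambiguity the paragraph above describes.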
Allowing duplication and using it "all the time" (paraphrased) are two different things. I never suggested "using it all the time", rather allowing it as an option. The real world was not programmed by a fastidious god (or gods), so we need to be able to handle true domain bags without a lot of fussing.
I already gave an example of a customer/partner request that primary keys not be included in data sent to them. Another scenario is records from logging devices. Suppose the logging devices don't send sufficient info to guarantee a unique record, maybe because whoever set them up was lazy about making sure the IDs were set correctly, and/or they only send seconds instead of milliseconds, etc. I've worked on many projects where we had to accept customer or partner data as is. We were not permitted to lecture them on relational purity or whatnot. In another topic I describe how we found out that a state goofed and accidentally made duplicate car license plates. We can't send them to relational jail; plus, the plates were already out there in the wild, so jailing them wouldn't fix it.
As far as performance, if it's possible for the RDBMS to know that all tables involved have a unique primary key-set, then it can use uniqueness in its computational assumptions. I don't dispute that using bags in a given relational process may result in performance loss (or at least am not challenging that right now). But IF the involved tables are known to be unique, why would the mere option of allowing "bag tables" (outside of the current concern) noticeably impact performance of queries that only involve unique-slated tables? You seem to imply the existence of the option of bags alone dooms performance, which is almost as weird as quantum physics concepts. I request justification or clarification.
--top
Adhering to the RelationalModel and eliminating duplicates within the model does not prevent trivially outputting duplicates, via either an ARRAY (in TutorialDee) or explicit code, should this be what your client demands in a report or file. Can you show any circumstance outside of that where maintaining duplicates within the database makes sense, and that cannot be trivially and more appropriately addressed with surrogate keys and the like?
Sure, sticking a surrogate key (such as auto-increment integer) on our "sloppy logger" example will de-bag our table. In fact, in many if not most SQL-DBMS, an internal unique row ID is created anyhow. They just have properties that make them unsuitable for display, such as post-deletion reuse. Thus, most existing RDBMS are not actually using the dreaded bag anyhow and you may be fussing over nothing. I just don't think a query language should have to do goofy programming to simply remove primary keys from the final output. -t
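As a sketch of the "internal unique row ID" idea, SQLite (used here only because it ships with Python) gives ordinary tables an implicit `rowid` column. The `device_log` table and its sample row are hypothetical, and other engines expose, hide, or reuse such identifiers differently, so this shows one engine's behaviour rather than a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "sloppy logger" table: no declared key, second-level timestamps.
conn.execute("CREATE TABLE device_log (device TEXT, logged_at TEXT, reading INTEGER)")
same = ("sensor-7", "2000-01-01 12:00:01", 42)  # made-up sample data
conn.executemany("INSERT INTO device_log VALUES (?, ?, ?)", [same, same])

# The visible columns of the two rows are identical...
visible = conn.execute("SELECT device, logged_at, reading FROM device_log").fetchall()

# ...but SQLite's implicit rowid still tells them apart.
with_rowid = conn.execute("SELECT rowid, device FROM device_log ORDER BY rowid").fetchall()
rowids = [r[0] for r in with_rowid]
```

Note that the `rowid` has to be requested by name; it does not appear in `SELECT *`, and it is not carried through joins or unions unless the query selects it explicitly.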
You haven't answered the question: "Can you show any circumstance [...] where maintaining duplicates within the database makes sense, and that cannot be trivially and more appropriately addressed with surrogate keys and the like?" Obviously, the "sloppy logger" can simply define a surrogate auto-number key.
- In my scenario, I can only read the log table, not alter it. This is quite common in a big org. Thus I cannot "simply define a surrogate auto-number key". And that's further reason why being able to work with existing tables and RDBMS is a wise thing. Your tool risks being a WalledGarden, at least without a bunch of imperative pre- and post-fiddling.
- Let's assume, in your scenario, that we're using a true relational system like Rel. That means you're either dealing with a native RelVar (i.e., table) that someone else has created -- which means that user was required to specify a primary key when the RelVar was created -- or you're linking to an external data source and have declaratively told the system to supply a surrogate auto-number key because no other key could be automatically inferred. If you need to emit duplicates, say to meet a client requirement or because you're generating 'n' identical address labels, then you can trivially omit key attributes in the relevant WRITELN statement, report definition, etc. No "imperative pre- and post-fiddling" is needed.
- Assume that every DB we link to in the organization uses Rel? So Nissan is going to dump all their Oracle and DB2 and switch all their existing data sets, views, stored procedures, and PL/SQL scripts to Rel? Not in OUR lifetime. And Rel may get wider acceptance if it can integrate into existing bag-based relational engines. -t
- Huh? If you're not using Rel or a similar true relational system, then this whole debate -- and arguably this whole page -- is meaningless, isn't it? I've already described how Rel integrates with "bag-based relational engines". Do you read what we write, or do you just pick up a few keywords and respond, ELIZA-like, with some pseudo-random objection?
- The argument appears to be, "if I rule the world and everything that's born or dies follows my rules and only my rules, then there's no problems." That may be true, but not realistic. And you have not described it in detail. Try breaking it into steps with sample data-structure or memory dumps. You know, like "For scenario X: step 1, the query engine gets the table's schema. Step 2, the query engine gets the primary key. Step 3, if the primary key is not found in the select statement, then.....". Show the X-rays of the food being digested in the patient in 5-minute increments. It may be tedious, but often that's the only way to discover unstated assumptions.
- *boggle* I don't know why you think there's some attempt to "rule the world" etc. Anyway, maybe what I've failed to make clear is that some true relational systems can link to external DBMSes. Rel will support links to tables in external DBMSes in the next major release, in addition to continuing to provide its own native storage. Dataphor only provides links to tables in external DBMSes, or did the last time I looked. In Rel, you will link to an external database table by specifying it in a statement. That statement can be used to declare an autonumber surrogate primary key, if the external database table does not specify a primary key. Clear now?
- No. See MS-Access linking discussion below.
{Top's argument regarding the "internal unique row ID" is non-sequitur. I'm not "actually using" the internal implementation details. As a client to a pseudo-relational service, I'm "actually using" the pseudo-relational interface.}
- They are being used as a kind of free surrogate key, just not necessarily shown.
- {Those "internal" row IDs are not available for the relational operators, and are not maintained across intermediate compositions (joins, unions, differences). That is not like a surrogate key.}
- They usually don't need to be. Actually, Oracle makes them available if you really want them, but I've yet to need them.
- {Wow, it's like my words bounced right off your skull. Internal row IDs are not a substitute for a surrogate key, precisely because they aren't preserved across compositions. The fact that you haven't needed them is a total RedHerring.}
- I think you are confusing a perfect substitute versus a practical substitute. They don't need preservation to be a practical substitute.
- {Justify your claims. Demonstrate how the 'substitute' can be practical if it is incapable of all the relevant features offered by surrogate keys.}
- YOU are the one claiming it gums up Rel's implementation. I otherwise don't give a fuck.
- Tsk. Don't you feel a bit embarrassed getting angry over this? And what does this threadlet have to do with Rel's implementation of anything?
My point is that "bag" is a viewpoint in practice. If the internals really need to see unique rows for performance reasons, they can if they so choose.
{The performance issues are associated with the external semantics, and are unrelated to the internal implementation. (In particular, duplicates multiply across joins.) The performance issues for bags, though, aren't nearly so severe as they are for 'ordered' bags (aka Arrays), which are what TopMind really promotes. My own concern about bags is more semantic: the proliferation of duplicates has unclear meaning, and depends heavily upon the 'access path' to certain data. (Bags sacrifice some of the AccessPathIndependence achieved by the RelationalModel.)}
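The remark that duplicates multiply across joins can be illustrated with a small sketch, again using sqlite3 as a stand-in bag-based engine. The `orders` and `addresses` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two keyless tables that both contain duplicate rows.
conn.execute("CREATE TABLE orders (customer TEXT)")
conn.execute("CREATE TABLE addresses (customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)", [("acme",)] * 3)
conn.executemany("INSERT INTO addresses VALUES (?)", [("acme",)] * 2)

# Bag semantics: every matching pair of rows is emitted (3 x 2 rows).
bag_join = conn.execute(
    "SELECT o.customer FROM orders o JOIN addresses a ON o.customer = a.customer"
).fetchall()

# Set semantics: the same join with duplicates eliminated.
set_join = conn.execute(
    "SELECT DISTINCT o.customer FROM orders o JOIN addresses a ON o.customer = a.customer"
).fetchall()
```

The six rows in `bag_join` carry no more information than the single row in `set_join`; the count depends only on how many copies happened to sit in each input.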
- You mean "ListAtational"? And again, I am not promoting the use of bags, but merely saying the option should exist should it be needed, including interfacing with existing DB's and data sets.
- {Options are expensive, TopMind. "Merely an option" is an OxyMoron.}
- You have not shown that it must be expensive.
- {Options are always expensive. Choice is the root of all complexity. YOU have not shown the option provides benefits to outweigh this cost.}
- "Always"? That's bullshit. I sense artificial drama.
- {You have not shown that providing this option within the query language - which to be meaningful means 'within sub-queries', since anything at the edges can be handled by presentation or data formatting tools - provides benefits to outweigh its costs. It has already been pointed out what this option costs: lost opportunity for optimizations, unclear semantics, extra cases to handle in implementation, extra cases to reason about for end-users, an open vector for duplicate entry errors on insert or union, meaningless proliferation of duplicates on joins. These costs are known. And, yes, there are always, always, always costs for introducing 'choice'. Any meaningful choice introduces complexity. Complexity is a price we must sometimes pay to gain some other benefit, but neither choice nor complexity should be ends of their own - nor treated trivially as you attempt to do. If you don't see the costs, it's probably because you don't want to see them - ConfirmationBias. You are so insistent on being right that you refuse to "see" anything that might contradict your 'belief'.}
- Especially when you don't present anything specific that does so.
- C'mon, you must be trolling. You've been shown that retaining or maintaining duplicate rows causes "lost opportunity for optimizations, unclear semantics, extra cases to handle in implementation, extra cases to reason about for end-users, an open vector for duplicate entry errors on insert or union, [and] meaningless proliferation of duplicates on joins." All of these are explained somewhere on this page, even to the point of providing specific optimiser-level examples for "lost opportunity for optimizations". How much more "specific" do you need? Please, at least make some attempt to rationally counter our points instead of mainly waving your arms and hurling the occasional curse or content-free sarcastic quip.
- You have not done so with realistic example scenarios, or you just focus on worst-case situations. You seem divorced from the real world when you "explain" things. Either that, or you are a lazy documenter. I cannot tell.
- You missed a possibility: Your poor comprehension skills.
- I've seen good technical writing that was clear and easy to absorb. Yours is not it.
- Though quite informative, "Fun With Dick and Jane" is not considered to be an example of "technical writing". Note that technical writing is targeted at varying levels of reader ability. Most WardsWiki content is targeted at "Intermediate" to "Expert" level. It appears you require "Beginner".
- You'd screw even that one up too. You'd claim it violates conservation of pleasure for Richard and Jane to have fun at the same time and their pet Rex will get hit by an ice-cream truck if they try.
- You're right! Somebody's gotta display some responsibility here. Will someone please think of the children? And dogs...
{Why does Top believe the query language should be responsible for "final output"? And why would he wish to enable this for intermediate compositions? Shall I expect SmeQl to print formatted HTML? Am I to expect the ability to perform relational operations, such as joins and unions, on HTML-ized views? To what degree should formatting outputs - hiding certain columns or marking negative numbers to show in red - be the responsibility of a query language?}
I generally expect the query language's abilities to stop at the "cell level". In other words, it should be powerful enough to deliver the "cells" that I want in the order I want (as a grid/matrix). Sub-cell formatting such as borders or color are outside of the realm of the query language. In other words, "Here are the columns I want and here are the rows I want". Why would I want a crippled SELECT statement that forces me to include keys? You are asking the formatting tools to have a more powerful/practical SELECT ability than your query language. If it's useful for the formatting tool, then why is it not useful for the query language? Why the "shift"? Formatting tools are lower-class citizens that are allowed to get down and dirty with the mucky bags, but your query language is aristocracy that needs the purity of uniqueness? -t
{Nobody is forcing you to include keys, TopMind. If you project a relation to remove the keys, you'll only lose any multiplicity. If you only wish to exclude keys from a user's view, it should be as simple as hiding the key column.}
So you agree it should be easy to hide a key column from the final result? And please define "multiplicity".
{Sure, you should be able to hide a column from a report just as easily as hiding a column in a spreadsheet. And the reference to 'lose multiplicity' above refers to duplicates that will not exist in a proper relational projection that removes the key from a table. I did not believe saying you'd 'lose duplicates' is appropriate since, by assumption of 'project a relation', there are no duplicates to lose prior to projecting away part of the key.}
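The difference between hiding a key column and projecting it away can be sketched as follows. This uses sqlite3, with `SELECT DISTINCT` standing in for a relational projection; the `labels` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (label_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO labels VALUES (?, ?)",
    [(1, "Leeds"), (2, "Leeds"), (3, "York")],
)

full = conn.execute("SELECT label_id, city FROM labels ORDER BY label_id").fetchall()

# Hiding the key at presentation time: every row is still emitted.
hidden_key = [city for _label_id, city in full]

# Projecting the key away relationally: rows that now look alike collapse.
projected = [
    r[0] for r in conn.execute("SELECT DISTINCT city FROM labels ORDER BY city")
]
```

Here `hidden_key` keeps three entries while `projected` keeps two, which is the "lost multiplicity" referred to above.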
{And perhaps you should spend some time answering your own question: when IS the formatting useful within the context of the query operation? Ignore the formatting for the end-user or report for the moment, and answer: where would this feature be useful for sub-queries, that the end-user will never see?}
Your formatting question is not clear. Controlling which columns to include and in which order is a useful feature of a query tool, especially for ad-hoc queries that are not part of a formal application. I often use ad-hoc queries to inspect data such as hunting for bad data or suspicious patterns. I don't want to have to go to a second tool just to lop off columns that make it too wide for the screen or change the display order. That slows me down and costs the company real money.
{Nobody is saying the different languages need to be in different tools. SQL involves several languages in the same tool, i.e. for transactions and access control and cursor management, for data definition, and for data manipulation, and even for procedures and triggers. These languages interact in various ways, and are often somewhat layered. TutorialDee has been brought up before as using 'ARRAY' as a presentation format. Having been faced with such well-known counter-examples, repeatedly, why do you persist in maintaining the straw-man argument that a second language is going to "slow you down" or force you "to go to a second tool"? Is obstinate ignorance part of your grand strategy for convincing others?}
- What counter examples? I've seen no decent ones. You meander off-topic, out of reality, when trying to make realistic examples.
- {I named SQL and TutorialDee as examples showing that distinct or layered languages are often squeezed into one tool, and I justified this assertion by naming the different language components. If you are going to claim the examples are "not decent", burden lies on you. If you are going to claim you "didn't see" them, then I suggest you visit an eye doctor. Or maybe an elementary school literacy teacher.}
- The above paragraph is not clear to me. You have a curious writing style. All I am saying is that it is practical and helpful for a basic query tool to be able to control which columns appear and which don't. I don't think that is asking too much; it's not a gold-plated cup-holder request, but for some reason that escapes me, you do. It appears due to a silly, irrational obsession with purity. You seem to be arguing that if we allow some "formatting", then we should allow all. Inclusion and exclusion are arguably not formatting anyhow.
- TutorialDee is intended as a RelationalModel teaching tool and an illustration of DateAndDarwen's abstract D specification. In and of itself, it's only intended to be illustrative, not practical. As the RelProject (an implementation of TutorialDee) is intended to evolve into a practical database tool, it will provide means to assign unique IDs when data is imported and does provide means to remove unique IDs when data is exported.
- Via imperative loops? Fugly. What's so hard about just leaving the damned keys out of the SELECT list? Is that really rocket science, or Kryptonite to a "pure" relational engine? Utterly silly.
- Where did I write "via imperative loops"? If you leave keys out of SELECT lists in subqueries, does that cause an exception to be thrown because you'll be supplying duplicates to the remainder of a relational expression? If you leave out the keys, by default should SELECT remove duplicates or retain them? Presumably, you'll want to be able to do both. It's simpler and least likely to cause confusion if such output and presentation concerns are left to the output and presentation mechanisms, and simply maintain the rule of no duplicates within relational expressions, which means the projection operator should strictly never emit duplicates.
- I disagree that's the simpler route for reasons already stated. I could perhaps agree that intermediate sub-queries should throw an exception, but not the final output. It's natural to put a virtual place-card over the columns you don't want to see. We did it with paper for thousands of years and it didn't kill puppies. Are you going to also disallow scrolling left and right for wide output because it might hide the keys???? Your "cause confusion" problem is exaggerated. If one doesn't know the data, they will make tons of other errors anyhow.
- I'm not sure what you're on about re paper, dead puppies and disallowed scrolling, but I think you are exaggerating the "problem" you imagine by not allowing duplicate tuples/rows in relational expressions.
- I'll ignore my experience for the sake of argument and entertain the possibility that we are perhaps both over-magnifying our personal experience and philosophical biases. The truth may be somewhere in-between. That being the case, then the practical solution still tilts to allowing bags because the existing DBMS engines allow them and thus there's tons of "imperfect" data floating around. I truly doubt the benefits of forbidding bags are sufficient to justify complicating access and query integration for millions of existing datasets. Maybe if you factor the benefits over hundreds of years, and ignore the respected principle of FutureDiscounting, bag tossing would then be the most economical.
- Why provide bags when importing bags and exporting bags is trivial, and bags serve no purpose between import and export (i.e., within the query engine itself) and merely increase the probability that a user will use the duplicate "option" to cause errors?
- Trivial is leaving the keys off the SELECT clause/operation.
- That would be a false economy, because there is no single SELECT operation (other than the name of the RelVar itself, which is equivalent to 'SELECT * FROM RelVar'), but there are a collection of relational operators, all of which exhibit closure under the RelationalAlgebra, i.e., they accept arguments of at least one relation and return a relation. There is no RelationalAlgebra operator that returns a bag. There are operators outside the RelationalAlgebra that accept a relation and return a bag. ORDER, for example, returns an ARRAY of ordered tuples. Other operators and statements permit excluding key attributes and can be used to emit duplicate tuples. Having the projection operator either return a relation or a bag, depending on an optional keyword and the context (i.e., only the final operator in a relational expression) simply adds complexity and potential for end-user confusion with no particular gain in expressivity or simplicity.
- You've claimed that already, but haven't demonstrated it with clear-cut evidence. Show exactly where is this "complexity" that it adds. How are you measuring complexity? Where is it? Point to it. How are you measuring end-user confusion? How often does it happen? How can the reader verify your frequency claim? And you haven't addressed the fact that most RDBMS technically don't use bags because they use internal unique row numbers regardless of existence of primary key.
- Okay, see ComplexityOfOutputtingDuplicateTuplesInTutorialDee.
As far as sub-queries, the only examples I can think of right now are times where the key was compound and it slowed down processing and made the query verbose to include it all. What SQL engines usually do is assign an internal surrogate key, a kind of "row ID" per above. I simply took advantage of that feature to avoid carrying around the baggage of the compound key into the rest of the query. In other words, the query engine or RDBMS provides a de-facto surrogate key, and I took advantage of it without having to actually see it. (For this reason, SQL RDBMS are not really BagAtational after all, as described above.)
Another case is where the table is controlled by another party and they didn't include a proper key. My stuff can only read it. It is somewhat similar to the logging example above. I couldn't practically say, "You've committed a relational sin, and therefore I won't do business with you." If people get word that Rel DB's don't play well with the existing world, they will fail.
{RelationalDatabase will work fine with the existing world. If someone provided data with duplicates, of course you'd need to reformat it for relational: either remove the duplicates, or add a row number. Formatting data to make it consumable by your database should be nothing new to you, TopMind. Would you condemn SQL as being unsuitable for the existing world because sometimes you need to format input data? No? Well then why engage in hypocrisy by attempting to condemn true relational for the same reason?}
But you are making the task harder by being personally obsessed with purity.
{If I provide you data in XML, do you claim that dumping the data into 'tables' is "making the task harder by being personally obsessed with tables"? Or are you just being a hypocrite at the moment?}
I don't see your example as equivalent. It's apples to oranges. It's not about which base tools we use, but about the features of them. Due to real world situations AND compatibility with existing RDBMS, it appears rational to accept the option of semi-bag tables until the point where clear downsides are shown. I don't want to give up useful features because some people are baggophobics or puriphiles.
There's probably a good reason why early SQL DBMS allowed semi-bags. They had working lab examples of early relational systems and could see the impact of strict versus loose when they tested it on real-world data and queries.
Early SQL DBMSes allowed "semi-bags" (!?) because there was a (largely mistaken) belief that eliminating duplicates would represent an unacceptable performance and resource hit. That's the only reason.
- Mistaken? Have they admitted their mistake since?
- Who is "they"? Do you mean HughDarwen? He was a member of the ISO SQL Standards Committee until 2004 and has always deprecated violations of the RelationalModel. ChrisDate is considered a leading authority on all aspects of databases and is the author of the definitive database text, AnIntroductionToDatabaseSystems, and he too deprecates violations of the RelationalModel. That is clear indication of recognition of a mistake, but one that neither of them made. I doubt Oracle is going to own up to duplicates being a mistake, for obvious commercial reasons.
- Why should they, nobody's shown it's a big problem in practice. It only irritates pie-in-sky idealists.
- Actually, duplicates are sufficiently known to be a problem that SQL has a 'DISTINCT' option and UNION defaults to 'DISTINCT'! Many database educators advise students to always specify 'SELECT DISTINCT' and never use 'SELECT' on its own.
- Wow, a choice, what a concept! What about cleaning it up so that the "proper" way is the default (full key required), with an option marker to override it? Similarly, I wish a Cartesian join required an explicit keyword when desired. Encourage good practices, but don't shove it down people's throats.
- Choice belongs where choice makes sense. In relational expressions, it makes no sense. Where it does make sense, at import and export, appropriate duplicate->non-duplicate and non-duplicate->duplicate facilities are provided. As I've asked before, can you conceive of any circumstance where it makes sense to maintain duplicate rows between data import and data export?
- Please clarify "between". I thought I gave plenty of examples already, such as the query user not being the (permitted) table creator, a common occurrence in big shops.
- As explained toward the top of this page, when importing/linking an external data source in a true relational system, if no primary key can be inferred an autonumber surrogate primary key can be declared. Not having created the table (or other data source) is not an issue.
- Declared where? If I don't have write/alter access to the table, I can declare diddly squat.
- In the true relational system. You know how MicrosoftAccess lets you "link" tables? You'll be able to do that in Rel, too. You can declare assorted options in the table linkage statements, including specifying that an autonumber surrogate primary key should be used.
- Linking tables in Access has limited functionality. It doesn't magically cut down all the boundaries between different DB's. Where would the surrogate index be stored? If there's 10 million records in it and we only have read access, does it make the 10 mil index on the client? Yikes. I've seen it get pretty bogged down if you don't plan those kinds of things well with big tables.
- Indeed. I don't pretend that linkage to external DBMSes is a wholly solved problem. This is the current work-in-progress. At present, when linking to external tables without an explicitly identified primary key, each retrieved row/tuple is simply given a unique ID. The table is not indexed.
- Back to our example, where is it "given"? How would Rel deal with such? Where would it put the "given" unique row identifier? Server? Client? Baghdad? MS-Access may be using the RDBMS's internal row number for such a table. Thus, technically it wouldn't be a bag. The devil's in the details.
- The unique row identifier is assigned when each tuple is obtained from the external table, and cached as appropriate. I don't know what MS-Access is doing these days, but Access 95 was notorious for exhibiting unpredictable behaviour when working with linked tables from external DBMSes without explicitly-specified primary keys. Some, but not all, ODBC drivers allowed row IDs to be used as primary keys.
- Access may just do it wrong. Not sure you want to stand by their product as an example. And how would the client know the identity of each row before assigning a key to it? What keeps it from identifying the same row later as a different one if you say scroll up and then back down?
- Access certainly did do it wrong in Access 95. It's straightforward to generate row IDs when none are available from the DBMS. They're typically nothing more than incremented integers, starting from zero, and the current cursor position can be used to resolve the row numbers if the retrieval supports bidirectional scrolling. Sometimes caching is used to make the row numbers semi-permanent within a given execution context, but they may have a lifetime limited only to a given retrieval. Whilst this may sound like a problem, it almost never is, as the original table had no notion of row identity.
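The row-ID scheme described in the last reply (incremented integers starting from zero, assigned as rows are retrieved and dropped again on output) can be sketched in plain Python. The function names and the sample license-plate data are hypothetical, and this is not Rel's actual implementation, only an illustration of the idea.

```python
from typing import Iterable, Iterator, List, Tuple


def link_with_row_ids(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[int, str, str]]:
    """Attach an incrementing integer, starting at zero, to each retrieved row."""
    for row_id, row in enumerate(rows):
        yield (row_id, *row)


def export_without_ids(keyed_rows: Iterable[Tuple[int, str, str]]) -> List[Tuple[str, str]]:
    """Drop the generated ID again when emitting rows to the outside world."""
    return [row[1:] for row in keyed_rows]


# Hypothetical external table with a genuine duplicate and no key.
external = [("ABC123", "sedan"), ("ABC123", "sedan"), ("XYZ789", "truck")]

# Inside the system the rows form a set: the generated ID keeps them distinct.
relation = set(link_with_row_ids(external))

# On output the IDs are removed and the duplicates reappear.
exported = export_without_ids(sorted(relation))
```

As the reply notes, such IDs are only meaningful within one retrieval (or one cached execution context); rerunning the import may number the same external rows differently.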
What you propose is not nearly as trivial as leaving out keys from a SELECT statement. Does REL give out a "Purity Error" and dial the Purity Police?
Actually, what the RelProject does is almost precisely as trivial as leaving out keys from a SELECT statement. It's a WRITELN statement, by the way, and REL is a company that makes subwoofers. Rel is a name, not an acronym. (Though I do plan to buy a REL subwoofer at some point, just for the name. {And because I'm not that happy with my KEF subwoofer.})
As noted above, it's trivial to add unique IDs when importing data and trivial to remove them when outputting data. There is no reason to maintain duplicates between these two endpoints. Let me emphasise that: In a system based on the RelationalModel, there is no reason to maintain duplicates between data input and data output.
Again as noted above, one of the main reasons for avoiding duplicates is because they represent semantics but lack any description. Duplicates mean something, but what? And how do we distinguish them from erroneous double-entries?
Again again again, if it causes problems, then don't use them. I'm only asking for an option, which you say costs gazillion dollars to fulfill.
Absolutely it causes a problem. It adds complexity, both for the user and the developer. Setting aside user issues for now, to provide the capability to preserve duplicates from input to output in the RelProject would require a parallel set of data structures (bags, instead of relations), a parallel set of operators (to work on bags instead of relations), and an entire bag-oriented query optimizer. Given that there's no reason to preserve duplicates between data input and data output, I will not be implementing bags.
Existing RDBMS don't either: they have internal surrogate keys.
How (and whether) duplicates and/or row IDs are handled internally varies from DBMS to DBMS. Given that there's no reason to preserve duplicates between data input and data output, this is a non-issue in Rel. Neither duplicates nor row IDs are required (nor even desirable) anywhere between the data input and data output points.
Internal Versus External Bags
Moved from OfficialCertifiedDoubleBlindPeerReviewedPublishedStudy:
["Kind of topics???" Is there a theme to our arguments? I hadn't really noticed, other than to note your dislike of object-orientation and truly-relational databases, and an inclination to disagree with anything that isn't promoting ExBase or some close cousin to it. These days, that's pretty much everything. I've posted links to two articles that deprecate bag-oriented databases, one here and one on the BagAtational page.]
That was about machine performance, not the software creation and maintenance (query language) issues. And further, they are not really bags and/or can be converted to sets under the hood, and you conveniently deleted that issue in what appeared to be DisagreeByDeleting. A more practical performance question would be how explicit artificial primary keys (auto-keys) compete against internal primary keys and/or manufactured keys in cases where there's no natural domain key. I don't propose one use bags if there is a natural and straightforward domain key. As I said in the stuff you deleted, it's not about bags versus sets, but one flavor of sets versus another flavor of sets.
[It wasn't DisagreeByDeleting, it was refactoring and removing all the cruft that added absolutely nothing to the points made at the very top of BagAtational. Anyway, if bags aren't really bags, then this and the entirety of BagAtationalDiscussion are moot. We can, as it turns out, dispense with bags. We can import them and convert them to sets, we can export sets as bags, but in between -- within the relational system -- we don't need bags. It's what I've said all along. I'm glad you agree with me.]
I disagree with your assessment that it "added absolutely nothing to the points made at the very top...". Anyhow, it can be a "bag" to the user but be a set internally, and usually is. When you say "we", are you talking about the internals of the system, or the user? It is possible to give a set to the internals and a bag or list to the user.
As far as the title of the page, perhaps it needs a revisit, but won't change the issues being compared.
Efficiency Relativism
As far as the optimization claims, it's a trade-off. Certain types of operations are more efficient under either. For example, inserts are usually cheaper under bags because the system doesn't have to check for duplicates. (There may be special implementations where the difference is small or non-existent, but they have other performance trade-offs.) I'm pulling another EverythingIsRelative on this one. No structure has been blessed by the gods more than another. (Under MaspBrainstorming I claim that maps may be "better" than nested lists as a base structure, but it's generally a WetWare claim, for I don't claim any fundamental law of nature favors them so far.) --top
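The insert trade-off mentioned above can be sketched with two toy in-memory tables. `BagTable` and `SetTable` are hypothetical classes written for illustration; the sketch shows where the extra duplicate check sits but makes no claim about how large its cost is in a real engine.

```python
class BagTable:
    """Toy table with bag semantics: inserts never look at existing rows."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)  # no duplicate check
        return True


class SetTable:
    """Toy table with set semantics: each insert consults an index first."""

    def __init__(self):
        self.rows = []
        self.index = set()  # stands in for a unique index

    def insert(self, row):
        if row in self.index:  # the extra work: one index lookup per insert
            return False
        self.index.add(row)
        self.rows.append(row)
        return True


bag, rel = BagTable(), SetTable()
bag_results = [bag.insert(("acme", 1)), bag.insert(("acme", 1))]
set_results = [rel.insert(("acme", 1)), rel.insert(("acme", 1))]
```

The set-based table pays for a lookup and an index entry on every insert, and in exchange later operations can rely on uniqueness.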
See Also: DynamicRelational, RecordBasedDatabase, BagSetImpedanceMismatch
CategoryDatabase
AprilTen