Bag Need Scenarios Re Work

This is an attempt to produce a summarized or more compact version of BagNeedScenarios. It is not necessarily intended to replace BagNeedScenarios, but rather to provide an easier-to-digest version. If the experiment succeeds, then this topic will be renamed BagNeedScenarios and the original renamed BagNeedScenariosDiscussion.

Two versions of the summary are now given due to irreconcilable differences between the pro-bag and anti-bag camps.


Summary Version A - "Anti-Bag" Camp [under construction]

1 - Limited Access to Large Log Table which may contain duplicates due to imperfect logging equipment.

2 - Ad-Hoc Column Trimming for Fit

3 - Contracted Delivery Columns

4 - Compact Sales Summary

Please keep responses brief, no more than about 50 words. Link or PageAnchor to relevant longer discussions or descriptions.


Summary Version B - "Pro-Bag" Camp [under construction]

1 - Limited Access to Large Log Table which may contain duplicates due to imperfect logging equipment.

2 - Ad-Hoc Column Trimming for Fit

3 - Contracted Delivery Columns

4 - Compact Sales Summary

Please keep responses brief, no more than about 50 words. Link or PageAnchor to relevant longer discussions or descriptions.


Removed from above:

You've lost track of the context. You were trying to back up your claim that Bags provided an advantage somewhere. A "con" that they don't hurt sometimes hardly qualifies.

This is a property of the data set. It's why the first solution (fix the data source) is preferred. If you can't fix it, you have to deal with that issue regardless of whether or not your data processing engine uses bags or sets.

1 - Limited Access to Large Log Table which may contain duplicates due to imperfect logging equipment.

Sometimes you have to deal with data sources that aren't ideal. People do make bad decisions and sometimes you're stuck with the aftermath.

The question on BagNeedScenarios is whether the technology should support people in making these 'bad decisions'. Unless you can show that bags were 'helpful or necessary' in creating Scenario 1 in the first place, it's a fine example of circular reasoning: you're assuming your conclusion in order to prove your conclusion.

The question (as I read it) was whether or not you needed to support bags inside the data management system. The issue raised by scenario 1 is what you do when an external source of data is a bag. To me, that is a real concern that can't be handwaved away by saying "only accept sets as data sources". --AnonymousDonor

I'm not clear on the above "circular reasoning" paragraph either. The author seems to consider the scenario in which we can reboot the world and start the history of RDBMSs and related tools over. Perhaps we can evaluate each scenario under both the world-reboot situation and the as-is situation. I've generally considered the as-is situation. -t

Under TopMind's laughable excuse for logic, we would also conclude that the world needs traffic jams and smog because they are part of the situation as-is. I agree that bags are part of the 'as-is situation', but that was never under contention. The argument is that bags are necessary or useful. If TopMind wants to argue that bags are more necessary or useful than traffic jams and smog, then he can't depend upon the as-is situation; he'll need to justify bags from first principles. By pointing to bags as part of the as-is situation, one can only justify a much weaker claim: that a data management system must be able to import from bag-like data sources. And that has already been demonstrated.

Like I said above, we should evaluate BOTH the scenario where we can restart history and where we cannot restart history and only control a small corner of the world. And yes you have demonstrated that non-bag systems can deal with external bags, but sometimes with extra steps and/or overhead. In other words, there is a conversion tax. Whether this tax is "worth it" or not is the real debate, not its existence. -t


Bag Pros and Cons Summary

Con: Sets are in theory subject to better optimization

Con: Allowing bags risks inadvertent duplicates

Pro: Bags are more compatible with existing database and query tools.


Meta Discussion

Your "cons" are highly unconvincing. Each example (other than 3, which explains nothing) either demonstrates some technical misunderstanding, or addresses a marginal, rare, or even exceptional case compared to what DBMSs are mainly used for. Even if such circumstances do occur, the penalty for eliminating bags in the DBMS is negligible at best. However, providing support for bags inside the DBMS deprecates optimisation and invites a category of errors (inadvertent duplicates) that are almost invariably more serious than making some developer click an option or type a keyword in order to import or export a bag from a true relational DBMS.

You've said that all already and I disagreed then and I disagree now. The "bad things" that non-bags protect one from are also "marginal, rare, or even exceptional" in my experience. Those who do make multiple duplication errors tend to make many other kinds of conceptual errors also. Anyhow, let's focus on summaries here. Maybe we can re-work those arguments into a summary also. And, your optimization arguments are also suspect in practice.

I see. So now not only are you an expert in DBMS design and query optimisation, you know for a fact that inadvertent duplicates are marginal, rare, or even exceptional? What "experience" has led you to hold that view?

It likely wouldn't improve the optimization of any of the given scenarios (for things the scenario user has control over).


PageAnchor: economic_A

(TopMind actually believes this is a rational counter, as opposed to buzzword laden bullshit.)

I object to your rudeness. If you have a usable counter-argument, please present where appropriate.

(I object to your fallacy. Your "counter" wasn't usable in the first-place; it was first class HandWaving. If you have a usable counter-argument, please present it. There is no need to counter your bullshit.)

I honestly don't believe it's a fallacy; and even if it was, that's not an excuse to be rude. Tools and techniques that give users/decision-makers/owners more options are generally a good thing, even if you personally disagree with their final selection. I know you are upset with my content, but you are not helping with communication by expressing your feelings in such a way. I InviteModeration to help resolve this issue.

(Your implied claim - that 'baginization' is offering an objective (and therefore rational) trade-off decision on risk, time, labor, cost, etc. - is utter bullshit. And that earns rudeness towards you, you crank - you should be raked across coals, laughed at, treated as the 'butt' of a joke for presenting that sort of 'counter' without doing extensive research beforehand. If you want civilized discussion, you first need to play by the rules and present a reasonable argument. Instead, you befoul WikiWiki with page upon unending page of your irrational tripe. BagNeedScenarios was enough - you were shot down there, and it doesn't need 'rework'.)

I'm sorry, but I don't understand your complaint. It seems perfectly rational to me. I cannot see the computations in your head that make it clear where I am "going wrong" from your point of view. I cannot read your mind. You have to articulate the alleged problem via specific and clear text. I only see what looks like an immature, emotional response from you. Does somebody else want to make a try at it? Would an informal "economic simulation" be of any help?

Removing the primary index in that example gives one lots of extra immediate space with only a slight risk and no need to convert to and learn a different database. If that is not a legitimate benefit to at least consider as a company choice, then perhaps I really am as fucked in the head as you claim. It's flat-out common sense to me. It's just every-day obvious. I cannot make it anymore obvious. It's frustrating that you see it as somehow bad or wrong or sinister. Fuckitofia Arrrrgvile. The only rational explanation I have for your reaction is that you have an obsessive and idealistic personality which ranks "purity" above any other factor out of an inborn nature. We just view the world so differently and weigh factors so differently that we will never see eye-to-eye. And, I do believe a jury of randomly-selected practitioners would mostly side with me. -t

Revisit: Are you claiming that it does not provide such options to decision makers, or that providing those options doesn't matter? As far as "extensive research", you haven't done extensive research for your side either. The default is not your position such that me not providing extensive research does not make your position the truth by default. What kind of extensive research do you want anyhow? At least let's find out what we agree on in the economic argument before doing research that doesn't affect either side's viewpoint regardless. Clearly there's an economic and time and training cost in abandoning typical shop DBs such as Access and MySQL for something that may be optimized for this particular scenario. Couldn't we try to agree on a rough figure, or is that not the kind of thing that's bothering you to begin with? Communicate. -t


Title Misnomer II

None of the above -- even allowing for a certain amount of imagination and tolerance of the highly-contrived Scenario #4 -- has identified a bag need. "Need" strongly implies that bags are required, and that without bags a particular problem cannot be solved, or that it can only be solved with bags. Are there going to be any bag need scenarios, or are there going to be more pointless quibbles over bag want scenarios?

As already discussed, the topic name is a misnomer. They are not needed in an absolute sense, for work-arounds usually exist. However, we are studying the net costs/benefits, not absolute need. Sets are not absolutely required either; they are just a useful tool when applied skillfully.

Be that as it may, you haven't even identified where bags would be desirable. At best, you've described scenarios where bags might mean eliminating a keyword or two at development time, but have entirely failed to address the fact that this increases the possibility of erroneous duplicates with no way to distinguish them from legitimate duplicates.
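[Illustration of the "no way to distinguish" point: a minimal sketch using Python's standard sqlite3 module. The table and column names (reading_bag, reading_set, sensor, ts, value) are invented for this sketch and do not come from any of the scenarios. It shows only the mechanism, not how often it occurs in practice, which is the part under dispute.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keyless table: behaves like a bag, identical rows are accepted.
conn.execute("CREATE TABLE reading_bag (sensor TEXT, ts INTEGER, value REAL)")
# Keyed table: behaves like a set over (sensor, ts).
conn.execute(
    "CREATE TABLE reading_set (sensor TEXT, ts INTEGER, value REAL, "
    "PRIMARY KEY (sensor, ts))"
)

row = ("s1", 100, 20.5)  # hypothetical reading

# The bag table silently accepts an accidental repeat of the same row.
conn.execute("INSERT INTO reading_bag VALUES (?, ?, ?)", row)
conn.execute("INSERT INTO reading_bag VALUES (?, ?, ?)", row)

# The keyed table rejects the repeat at insert time.
conn.execute("INSERT INTO reading_set VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO reading_set VALUES (?, ?, ?)", row)
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

bag_rows = conn.execute("SELECT * FROM reading_bag").fetchall()
set_rows = conn.execute("SELECT * FROM reading_set").fetchall()
```

[Afterwards the two rows in reading_bag are identical, so nothing in the table says whether the second one was a mistake or a genuine second reading.]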

You seem to start with the assumption that sets are the default and only extraordinary evidence would dethrone sets. That's a false assumption. In practice, actual RDBMS vendors made bags the default. -t

Where have I started with that assumption, and what does it have to do with this page?

That's what your writing implies in indirect ways, at least by my interpretation.


Re: "(Note: If the RDBMS exposes a unique row id (e.g. Oracle's ROWIDs), then including the row id in the query is an example of this solution)."

I don't know if it's necessary for many queries. It's available for use if desired, but it would be silly to force its usage.

You aren't making any sense. Please clarify.

Let's back up a bit. What will including the ROWID solve? It can serve as a temporary surrogate primary key, but what else?

Further, the existence of ROWID makes such result sets more comparable to lists than bags.

{How so?}

Every entry still has a unique "position" just by being in the data structure. However, nothing beyond that is "enforced". You can have A,B,B,C for example. This can be technically converted into a nominal set: {1,A}{2,B}{3,B}{4,C}; however, from the conceptual side of the domain, it still has the properties of a list, not a set. -t
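[The A,B,B,C conversion can be sketched in Python as below. This is only an illustration of the mechanics; the variable names are made up, and it takes no position on whether the positions are meaningful in the domain.]

```python
from collections import Counter

items = ["A", "B", "B", "C"]  # the list from the comment above

# Pair each entry with its position: a genuine set, no two elements are equal.
nominal_set = {(pos, val) for pos, val in enumerate(items, start=1)}

# Drop the positions again and the result depends on the container chosen.
as_bag = Counter(val for _, val in nominal_set)  # keeps multiplicity
as_set = {val for _, val in nominal_set}         # collapses the two B's

# Sorting by position recovers the original list order.
recovered = [val for _, val in sorted(nominal_set)]
```

[The nominal set has four elements, the bag records B twice, and the plain set has only three members. The positions are what keep the two B's apart.]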

{Sorry, not following you. Not seeing any relevance, either.}

Sigh. I'll try to think of another way to say the same thing.


EXPORT Keyword Availability Problem

Moved to RelExportDiscussion


Why do you keep moving my "count" counter-argument away?

Read the statement following the new location of your "counter argument".

I'm sorry, but I cannot find a response to it, at least not one that's clearly related.

The immediately following statement is, "This is a property of the data set. It's why the first solution (fix the data source) is preferred. If you can't fix it, you have to deal with that issue regardless of whether or not your data processing engine uses bags or sets." Not sure why you couldn't find it.

Agreed, one has to "deal with it", but adding a count is not necessarily the right way to deal with it. It's a suggestion, not a fix. -t

The thing is, adding a count doesn't deal with the issue either. But, in this case, that's a good thing. We don't want the processing engine to decide what's the best way to clean up the data.

Then remove that bullet item or merge with above. Its issues may not be different from a generated key anyhow. Maybe it just needs a little re-wording to make it agreeable to both parties.
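[For comparison, a rough Python sketch of the two import options mentioned in this thread: a count column versus a generated key. The sample records and names are hypothetical. Either way the result is a set, and neither option decides whether a repeated record was genuine.]

```python
from collections import Counter

# Hypothetical external bag: the first record arrives twice.
incoming = [("s1", 100), ("s1", 100), ("s2", 105)]

# Option 1: one row per distinct record, plus a count column.
with_count = {(sensor, ts, n) for (sensor, ts), n in Counter(incoming).items()}

# Option 2: one row per incoming record, made unique by a generated key.
with_key = {
    (key, sensor, ts) for key, (sensor, ts) in enumerate(incoming, start=1)
}

# Both forms preserve how many records arrived.
total_from_count = sum(n for _, _, n in with_count)
```

[The count form is more compact; the keyed form keeps one row per arrival, so individual rows can later be flagged or deleted during clean-up.]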


Two Summaries

The attempt at a summary is not very successful. Neither party is happy with the results and some EditWars have ensued. Instead, I propose that two summary outlines be created on the same page; one for each "side" of the debate (pro-bag versus anti-bag). This will hopefully avoid edit-wars and let each side present their viewpoint. As a good-will gesture, I'll let the pro-bag outline be first. (Dec. 2010) -top


On "Contrived"

It has been stated, or at least implied, that the examples are "contrived" and therefore "useless". They are based on actual cases with some changes to simplify the examples. But regardless, I'm not sure that being "contrived" makes them useless anyhow. For example, when writing software it may occur to me that if the user presses the Esc key at a certain spot, the program may crash. I thus may catch that situation and handle it in code. Technically, pressing the Esc key is a "contrived" scenario since it never actually happened. But that's not a decent reason to ignore that scenario. What is important is the probability of it happening. If the user of the word "contrived" wants to argue for low probability, that's fine by me. But please be clear on what is meant by "contrived". It's a somewhat vague word. -t

Top, give it a rest. I have far, far better things to do than trot out the same arguments again and again and again. Don't you?

I am convinced, more than ever, of the need for pure relational systems which will support bags at the boundaries (i.e., link to bags but represent them as relations, import bags into relations, and perhaps export bags) but never support bags inside the system. I won't waste my time arguing; I will spend my time building them. -- DaveVoorhis

Being "convinced" is the easy part. Humans come by it at a snap. I thought the discussion was an interesting way to explore the "purism versus practicality" issue that keeps popping up around here. Your language/tools could fit in better with existing systems and tools if you weren't so insistent on resisting bags/lists. "The boundary" can be a place where IT people waste a lot of time. Further, forbidding bags in a DB engine and/or query tool could be a configuration switch. Encourage purity, but don't force-feed it. As a user, I want to tell the tool what to do, not the other way around. -t


[1] Internal row numbers should not be used with "live" data, as they may be reused upon deletes. One has to know the nature of the data source to rely on them.
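[A small demonstration of reuse, with SQLite's implicit rowid standing in for an internal row number. This is an assumption for illustration only: Oracle's ROWID and other engines differ in detail, and the table name is made up. In SQLite, a table without AUTOINCREMENT assigns one more than the largest rowid currently in use, so deleting the newest row frees its number.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared key; SQLite supplies an implicit rowid for each row.
conn.execute("CREATE TABLE log (msg TEXT)")

# Insert three rows and remember the rowid each one was given.
ids = [
    conn.execute("INSERT INTO log VALUES (?)", (m,)).lastrowid
    for m in ("a", "b", "c")
]

# Delete the newest row, then insert a different one.
conn.execute("DELETE FROM log WHERE rowid = ?", (ids[-1],))
reused = conn.execute("INSERT INTO log VALUES (?)", ("d",)).lastrowid

# The row now holding that rowid is not the one originally stored there.
msg_at_reused = conn.execute(
    "SELECT msg FROM log WHERE rowid = ?", (reused,)
).fetchone()[0]
```

[Anything that stored the old rowid as a reference to row "c" would now silently point at row "d".]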


NovemberTen


Last edited November 9, 2014