Bag Discussion

'Bag' is another word for MultiSet, i.e. a set which can contain more than one instance of a given thing.

In Smalltalk, a Bag is implemented as an unordered collection of keyed objects, likely using some kind of hash. It holds any number of objects, in arbitrary order, but optimised for looking them up according to some key value, and allows multiple objects that share a single key. The typical operations on a bag are "insert this object", "remove this object", and "give me a list of all the objects with this key".

A Java Map/HashMap is not a bag. A Map in java maps one key to one value.

RdfLanguage also uses "bag" to refer to an unordered collection.


It's actually difficult to implement a true bag. Apps usually need a way to reference dynamically-allocated things/objects, which requires a unique identifier. In practice this usually means it's really either a map or a list. Such an API will usually do one of 3 things:

If it's a write-only data structure, then perhaps it can be a true map (from API user's perspective), but such is rarely useful. The closest I've seen is logging systems that simply record events. But even these usually have some order to them, such as line number in a file, or record ID in a database. Most databases either have an internal record ID (extracted with a special function or dummy column name), and/or preserve order (list). Plus, it's usually useful to create an auto-number index key to serve as a reference number. (Although in a few cases, I simply had to use existing tables and had no DBA privileges to add an auto-number. See BagNeedScenarios.)

--top

Your argument leads to a conclusion that "true bags are not useful for most applications" as opposed to your thesis that "true bags are difficult to implement". I agree. Bags are mostly useless, in practice.

The two issues are related because one usually needs a way to identify something to "retrieve" it. Perhaps I need to reword some of it.

Efficient retrieval of a specific element is not part of the `bag` abstraction. You can always remove elements one at a time until you find the one you're looking for.

I think your concept of how "most databases" are implemented is not well tested against actual database implementations.

The one's that I'm familiar with either have an extractable internal "record ID" of some sort, and/or preserve order for non-keyed tables. They behave more like lists.

Which databases are you familiar with? To what extent do you use sharding and related features?

No, I have not tested their inherent ordering characteristics under sharding.


See also the Dictionary of Algorithms and Data Structures: http://www.nist.gov/dads/HTML/bag.html --ChristofferHammarstrom

See OrderedBag

CategoryDiscussion


EditText of this page (last edited December 5, 2011) or FindPage with title or text search