Categorization Models

Models or methodologies for categorizing things.


Trees

Benefits: Easy to understand and very visual

Drawbacks: Don't handle multiple orthogonal traits very well (see LimitsOfHierarchies)


Mutually-Exclusive Groups (a "flat tree")

Benefits: Easy to understand and implement

Drawbacks: Don't scale well


Sets

Benefits: Flexible, scale well

Drawbacks: Not very intuitive to uninitiated users, lacks easily printed "visual" representation

(Variation: fuzzy sets. The link between sets has a weighting factor.)

Example: AccessControlList, http://www.geocities.com/tablizer/sets1.htm

see SetTheory


See-also Lists

Benefits: Intuitive

Drawbacks: Require too much effort to update


Directed Acyclic Graphs

Benefits: Flexible, scales well, and easy to understand. Handles orthogonal traits easily.

Drawbacks: Possibly some of the issues cited in NavigationalDatabase

Examples?


The Cadillac[1] of classification systems in my opinion would be a combination of weighted sets and weighted see-also's. For any given item to be classified, one could create a list of weighted categories and weighted see-also's. Example:

  Item: "Perkins Lab Coat #6"

Categories ----------- Medical Supplies: 1.0 Scientific Supplies: 1.0 Clothing: 0.8 Safety Equipment: 0.4 Halloween Supplies: 0.3

See Also: -------------- JVM medical washing machine: 0.9 Perkins all-in-one surgery kit: 0.6

Weights are assumed to be zero to one. In practice, this could be implemented with an RDBMS using two many-to-many tables with weights. If an interface did not want to use the weights, then they can default to some "safe" number, such as 0.7.

Note how the "see also" list allows associations related by use rather than similarity. Washing machines are in no way similar to lab coats, but they are still related. One possible limitation is that we may want to reference the category of washing machines instead of just an individual product. One could argue that such goes in the category list instead of "see also", but the relationship may not be clear. Perhaps an optional "note" column could be added to both the categories and see-also tables to explain the relationship. In this case it may be obvious, but maybe not in others.

[1] AmericanCulturalAssumption: "Cadillac" used to be the most expensive car. Imports generally have taken over this niche, but the phrase has remained.


Related key-words: taxonomy, classification


CategoryOrganization CategoryHierarchy


EditText of this page (last edited January 2, 2008) or FindPage with title or text search