Classification Philosophy

This page explores the philosophical notion of "classification" or "categories".

Classification is the use of generalization. Generalization allows a creature to treat a specific instance of something the same way it treated similar items encountered in the past. That is, if enough traits of a target match something encountered in the past, then it can be considered the same for practical purposes. (Actually, fuzzy matching is done so that partial matches are weighed also.) While hunting for worms, a fish does not need to investigate every rock by biting it to make sure it is not a shark. That would take too much energy.

This ability to generalize traits allows "inference". A human hunter does not need to study and dissect every animal to know the characteristics of a given animal. If it fits the usual traits of a fox, then the hunter can hunt it just like he/she did with any other fox based on appearance alone. (If it turns out to be a possum instead of a fox, then the hunter may have to rework his mental nodes to be able to distinguish.)

It is hard to describe a notion with any precision or comparability without presenting some kind of model. If forced to pick a model for the human mental model of "classification", I would select a graph where each node is a thing or idea, and the edges (arrows) are weighted. We may want to make a distinction between concrete items, like a given person, and general, abstract ideas such as "family member".

We may also want to introduce the idea of "traits". For example, a fox is an animal that has a pointy nose and pointy ears. We could also associate these with our above nodes, with weightings. When a person sees an animal with a pointy nose and pointy ears, the "animal" neurons, "pointy-nose" neurons, and "pointy-ear" neurons may all fire, and the node(s) with the best fit may signal activity. Since "fox" is a good match, the fox neuron(s) may signal the loudest. Their input filters best match the multiple incoming "wires".

The rabbit node may have an incoming wire for "pointy ear", but not for "pointy nose". Thus, it will not respond as loudly, and is given a lower rank or less attention. (It may perhaps have a link to "pointy nose", but the threshold value or "multiplier" on that line may be weak. Some speculate the brain may cut off such links over time since they don't contribute much yet take up space. Lack of a wire between concepts is equivalent to a wire with a zero threshold, and a close approximation of one with a low threshold.)

This is a kind of fuzzy PredicateDispatching, where matches are based on weights instead of Boolean hits-or-misses. (I'm oversimplifying how human neurons work here for the sake of illustration.)
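
To make the weighted-matching idea a bit more concrete, here is a minimal Python sketch. The concepts, traits, and weight values are invented for illustration; it is not a claim about how real neurons work.

 # Minimal sketch of weighted ("fuzzy") trait matching.  Each concept has
 # weighted incoming links from trait nodes; the concept whose links best
 # cover the observed traits "signals the loudest".
 concept_links = {
     "fox":    {"animal": 0.9, "pointy nose": 0.8, "pointy ears": 0.7},
     "rabbit": {"animal": 0.9, "pointy nose": 0.1, "pointy ears": 0.8},
     "rock":   {"animal": 0.0, "pointy nose": 0.0, "pointy ears": 0.0},
 }
 
 def activation(concept, observed_traits):
     """Sum the weights of links whose trait was actually observed."""
     links = concept_links[concept]
     return sum(w for trait, w in links.items() if trait in observed_traits)
 
 observed = {"animal", "pointy nose", "pointy ears"}
 ranked = sorted(concept_links, key=lambda c: activation(c, observed), reverse=True)
 print(ranked)   # ['fox', 'rabbit', 'rock'] -- fox is the best (fuzzy) fit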

I am not going to make a hard distinction between concepts and traits in the model, because they are relative: a trait in one circumstance is a concept in another. Thus, here is the working model in relational form:

 Table: concept    // includes "traits"
 -----------
 conceptID
 conceptDescript
 isConcrete         // 1=concrete, 0=abstract  

 Table: conceptLinks
 -----------
 conceptRef1
 conceptRef2
 weighting          // -1.0 to 1.0

(This model excludes the concept of differing belief systems. For example, reasoning about what an enemy may and may not know about one's own tribe, such as location, involves creating multiple "viewpoint universes" which may have somewhat different graphs. It also allows negative weights, which biological models only provide indirectly. It is thus a shortcut.)
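
Here is a rough sketch of that relational form using SQLite from Python. The table and column names follow the sketch above; the sample rows and weightings are invented. The query ranks concepts by the summed weight of their links to the observed trait concepts.

 # Rough SQLite sketch of the concept/conceptLinks tables above.
 # Sample rows and weightings are invented for illustration.
 import sqlite3
 
 db = sqlite3.connect(":memory:")
 db.executescript("""
     CREATE TABLE concept (
         conceptID       INTEGER PRIMARY KEY,
         conceptDescript TEXT,
         isConcrete      INTEGER        -- 1=concrete, 0=abstract
     );
     CREATE TABLE conceptLinks (
         conceptRef1 INTEGER,           -- concept being recognized
         conceptRef2 INTEGER,           -- trait concept it links to
         weighting   REAL               -- -1.0 to 1.0
     );
 """)
 db.executemany("INSERT INTO concept VALUES (?,?,?)", [
     (1, "fox", 0), (2, "rabbit", 0),
     (3, "pointy nose", 0), (4, "pointy ears", 0),
 ])
 db.executemany("INSERT INTO conceptLinks VALUES (?,?,?)", [
     (1, 3, 0.8),   (1, 4, 0.7),        # fox <- pointy nose, pointy ears
     (2, 3, 0.125), (2, 4, 0.75),       # rabbit <- weak pointy-nose link
 ])
 
 # Rank concepts by summed link weight to the observed traits (IDs 3 and 4).
 rows = db.execute("""
     SELECT c.conceptDescript, SUM(l.weighting) AS score
     FROM concept c JOIN conceptLinks l ON l.conceptRef1 = c.conceptID
     WHERE l.conceptRef2 IN (3, 4)
     GROUP BY c.conceptID
     ORDER BY score DESC
 """).fetchall()
 print(rows)   # [('fox', 1.5), ('rabbit', 0.875)]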

This fuzzy approach (via weighted values) comes from our biological roots and could be called the "organic approach". However, it is usually difficult to use in typical computer systems, so it tends to be relegated to AI-like tools and research; for everyday systems, non-organic approaches are preferred.

The fuzzy approach is avoided mainly because it is too hard to reason about. It is hard to create theorems about and hard to debug. Thus, we tend to use simplified or "trimmed down" abstractions of this approach.

One such simplification is using Boolean associations instead of weights. If somebody orders a "fox" from a zoo animal distributor, fox-ness is treated as a Boolean trait in our formal automation systems and legal practices. If the vendor does not know for sure whether an animal is a fox or not, they had better not ship it. Leave a note for the biologist contractor to take a closer look at it when she comes in next Tuesday. The vendor also couldn't get away with producing a catalog that lists prices for something that might be a fox.

It is too hard for the members of a society to interface quickly with each other and make timely transactions if there is that much uncertainty[2], such as continuous traits. Thus, we Boolean-atize stuff to create a UsefulLie. Buyers will get confused and verification becomes difficult if we sell too many hybrids or uncertain items. Commoditization tends to lower prices for this reason: buyers can focus on a single trait, price, instead of dealing with a bunch of weighted factors.
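
As a toy illustration of this Boolean-atizing, a system might simply collapse a fuzzy confidence into the yes/no trait that catalogs and contracts expect. The cutoff value here is an arbitrary, made-up policy.

 # Toy illustration of "Boolean-atizing": collapsing a fuzzy confidence
 # into the yes/no trait that catalogs and contracts expect.
 # The 0.95 cutoff is an arbitrary, made-up policy value.
 def is_fox(confidence, cutoff=0.95):
     return confidence >= cutoff
 
 print(is_fox(0.99))   # True  -- list it in the catalog as a fox
 print(is_fox(0.80))   # False -- hold it for the biologist to inspect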

In computerized systems we may apply even heavier abstractions for the sake of both computer efficiency and programmer efficiency. Medium-scale classification systems may actually somewhat resemble our organic model above: they tend to be based on sets, which are a form of graph. The "links" usually don't have weights (although a "primary category" is not uncommon, to simplify reporting), but there are other limits, as we shall see. A production category system may resemble:

 Table: products
 -----------
 productID
 productDescript
 supplierRef
 listPrice
 etc...

 Table: categories
 -----------
 categID
 categDescript

 Table: productCategories
 -----------
 productRef         // "Ref" means foreign key
 categRef
 isPrimary          // yes/no
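
A rough sketch of how such a schema might be populated and queried, again using SQLite from Python. The sample product, categories, and values are invented.

 # Sketch of the products/categories/productCategories tables above,
 # with a query for one product's categories.  Sample data is invented.
 import sqlite3
 
 db = sqlite3.connect(":memory:")
 db.executescript("""
     CREATE TABLE products (
         productID       INTEGER PRIMARY KEY,
         productDescript TEXT,
         supplierRef     INTEGER,
         listPrice       REAL
     );
     CREATE TABLE categories (
         categID       INTEGER PRIMARY KEY,
         categDescript TEXT
     );
     CREATE TABLE productCategories (
         productRef INTEGER,      -- "Ref" means foreign key
         categRef   INTEGER,
         isPrimary  INTEGER       -- yes/no
     );
 """)
 db.execute("INSERT INTO products VALUES (1, 'garden hose', 7, 19.99)")
 db.executemany("INSERT INTO categories VALUES (?,?)",
                [(10, 'garden'), (20, 'plumbing')])
 db.executemany("INSERT INTO productCategories VALUES (?,?,?)",
                [(1, 10, 1), (1, 20, 0)])   # 'garden' is the primary category
 
 # All categories for product 1, primary category first.
 rows = db.execute("""
     SELECT c.categDescript, pc.isPrimary
     FROM productCategories pc JOIN categories c ON c.categID = pc.categRef
     WHERE pc.productRef = 1
     ORDER BY pc.isPrimary DESC
 """).fetchall()
 print(rows)   # [('garden', 1), ('plumbing', 0)]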

(How people mentally associate a price with a product when they need to memorize it is harder to model. Since I am not good at remembering specific numbers, I may not be a good introspection subject. I often have to use tricks like associating groups of numbers with birthdays or years of events, and thus am perhaps still using something similar to the opening model. Humans, or at least I, don't seem to have a well-designed, dedicated "trait" system, but rather piggyback on the association framework instead. Otherwise, we wouldn't need notepads and computers.)

An issue with our product categorizer approach is that categories are not associated with other categories. We can't categorize categories here, only products. However, this is usually good enough for business systems because categories of categories are difficult for most product clerks to maintain and comprehend, and most companies don't wish to hire dedicated "category managers" who have the time and skill to deal with the tricky issues. Thus, the product system is actually a DAG (DirectedAcyclicGraph), and not a true graph.

So, back to the topic of regular IT automation. On a smaller scale, such as the "module" level in programming, a DAG-based model like our product example may also be overkill. The smaller the scale, the simpler the classification model we tend to use. I observe a rough pattern to the kind of classification system/model used, based on the scope or complexity of the classification needs. From most general/flexible down to the simplest, the levels go something like this:

 * Weighted graphs (the "organic" model that opened this page)
 * Un-weighted graphs
 * DAGs (DirectedAcyclicGraph)
 * Trees
 * Simple lists (a single selection from a small, flat set)

The simplest of the classification systems used in software are based on a single selection from a basic list. A typical list may contain Text, Boolean, Integer, Real, and Date (a "list" of five categories). Languages with more complicated typing, including user-defined types, may use Trees or DAGs. These two approaches make substitutability both possible and predictable, trees more so than DAGs.
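
As a sketch of the simplest, pick-one-from-a-flat-list style, a field's classification might be a single choice from a fixed enumeration mirroring the five-item list above (the field names here are invented).

 # Sketch of the simplest scheme: each field gets exactly one
 # classification chosen from a small, fixed list.
 from enum import Enum
 
 class FieldType(Enum):
     TEXT = 1
     BOOLEAN = 2
     INTEGER = 3
     REAL = 4
     DATE = 5
 
 schema = {"name": FieldType.TEXT, "price": FieldType.REAL,
           "inStock": FieldType.BOOLEAN}
 print(schema["price"])   # FieldType.REAL -- one category, no hierarchy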

They tend to use the "is-a" viewpoint, whereby if something belongs to a given classification (or "type" perhaps), then it inherits the traits of its parent classification(s). This allows the treatment of an item to "grab defaults" automatically. If X is-a Y, then X automatically inherits all the behavior of Y, except for the sub-parts that we explicitly alter (override) or define. (See DeltaIsolation.) This is similar in concept to the inference that the fish did with rocks in our earlier example.
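
A minimal sketch of the "is-a" grab-defaults idea in Python classes: the subclass inherits the parent's traits automatically and overrides only the differences. The class names and traits are illustrative.

 # Minimal sketch of the "is-a" / grab-defaults idea: a subclass inherits
 # its parent's traits automatically and overrides only the differences.
 class Animal:
     legs = 4
     def sound(self):
         return "generic animal noise"
 
 class Fox(Animal):          # Fox is-a Animal
     def sound(self):        # override just this one trait
         return "yip"
 
 f = Fox()
 print(f.legs)     # 4 -- default inherited from Animal
 print(f.sound())  # 'yip' -- the explicitly altered part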

Inference is also part of our original model, so Trees and DAGs have no monopoly on generalization-based inference. But the original model is too unwieldy if we only want inference abilities. It may produce multiple answers, and may have cycles (loops). Our brains probably handle these by limiting the time spent investigating triggerings of relationship links and taking the best fit available at the time of decision. Or perhaps each generation of the "loop" gets weaker and weaker until there is not enough energy in the trigger path to keep going, and the last remaining signal is the one used. It is a form of attrition.
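
Here is a speculative sketch of the attrition idea, not a claim about actual brain mechanisms: activation spreads through a cyclic weighted graph but is damped at each hop, so the loop fades out and the strongest non-trait node left standing is taken as the answer. The graph, weights, damping factor, and cutoff are all invented.

 # Speculative sketch of "attrition": activation spreads through a cyclic
 # weighted graph but is damped at each hop, so the loop fades out instead
 # of running forever.  The graph, weights, and constants are all invented.
 links = {                      # node -> {neighbor: weight}; note the cycle
     "pointy nose": {"fox": 0.8},
     "pointy ears": {"fox": 0.7, "rabbit": 0.8},
     "fox": {"animal": 0.5},
     "rabbit": {"animal": 0.5},
     "animal": {"fox": 0.3, "rabbit": 0.3},   # cycles back into fox/rabbit
 }
 
 observed = {"pointy nose", "pointy ears"}    # the traits that fired
 activation = {trait: 1.0 for trait in observed}
 frontier = dict(activation)                  # energy still travelling
 DAMPING = 0.5                                # energy lost per hop
 CUTOFF = 0.01                                # too weak to keep going
 
 while frontier:
     next_frontier = {}
     for node, energy in frontier.items():
         for neighbor, weight in links.get(node, {}).items():
             passed = energy * weight * DAMPING
             if passed >= CUTOFF:
                 next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
     for node, energy in next_frontier.items():
         activation[node] = activation.get(node, 0.0) + energy
     frontier = next_frontier                 # each generation gets weaker
 
 # Best fit among the non-trait nodes once the signal has died out.
 candidates = {n: e for n, e in activation.items() if n not in observed}
 best = max(candidates, key=candidates.get)
 print(best, round(candidates[best], 2))      # fox 0.79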

Either way, the brain's exact mechanism is not fully known and may involve a combination of culling techniques. Thus, our software tends to avoid them and uses un-weighted trees, DAGs, or lists instead. There is either one "answer", or the multiple answers are easy to determine during the writing of the software by both programmer and compiler such that run-time surprises or compiler errors are far less likely. We know how those work and know how to process and debug them without deep AI knowledge and complex interpreters/compilers.

But, there are limits to Trees and DAGs. The real world[1] tends toward a graph, not Trees and DAGs. Trees and DAGs are merely UsefulLies that work at the small and medium scale or scope. (See LimitsOfHierarchies)

The bigger the scope, context, and complexity, the further up the classification complexity scale (above) we go. AI systems that try to model common knowledge of the real world almost always have to go above Trees and DAGs. Even tools like the Google search engine more or less resemble the opening model, including weights: they are building a giant knowledge machine that spans multiple domains (topics) and locations. Yahoo once tried to make a giant tree of web categories, but it failed in the marketplace. It was too large to comprehend, and dealing with overlapping categories became an ever-growing problem. After a few hundred or so nodes, human modelers (including programmers) have difficulty dealing with trees.

The above list of levels may not be exhaustive. These are simply classification abstraction models that humans have found useful. There are other possible techniques, both discovered and undiscovered, that may someday catch on. It would thus be premature to conclude that these classification abstractions are somehow a "magic set" written into the fabric of the universe. They are simply UsefulLies.

It may also be possible to create a larger-scale classification system by gluing lots of small ones together somehow. Software engineering attempts to tackle these kinds of problems, and arguments often break out about whether, and to what degree, it should be based on one centralized model versus lots of smaller models hooked together to make one big model.

[1] Or at least the most general way humans interpret it.

[2] Old-style markets were better equipped to deal with such fuzziness via bargaining and negotiation. We tossed some of the organic approach as a trade for efficiency and model simplicity. Haggling does not scale well.
