Classification Is Tough

A lot of problems in my domain (custom biz apps), in programming-code classification, and in security classification seem to point to a common theme: classification is tough. Reasons for this include, but are not limited to:

--top

I agree. ClassificationIsTough. Not only in your domain(s) but in any sufficiently complex domain. Kind of an IncompletenessTheorem. There are also results from the social sciences suggesting that you should never let a group determine classifications for items. At best, classification along simple, measurable indicators (e.g. time, priority) may work. Everything else inevitably eats up lots of time and yields no results.

Often the group has to determine classifications because it needs them for its work. I generally try to give a power user of each group a way to manage classifications for that group's needs. There may still be a master or default classification outside of specific departments, though. Almost any non-trivial business system is going to need a CMS (Classification Management System) of some sort. Trying to be a central classification cop who keeps every department and user happy can be a daunting task, so there comes a time when it's easier to let departments create their own classifications, as long as they don't clobber others'.

The beauty of set theory is that being a member of one set can be independent of membership in another set. (The sets themselves may need to be members of other sets, such as for indicating which department manages a given set.) The downside is that sets are difficult for most users to grok. A power user or "business analyst" is hopefully available to help with that, so that programmers are not spending time shuffling instances between sets all day. -t
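To make the per-group idea concrete, here is a minimal Python sketch under stated assumptions. `ClassificationRegistry` is a hypothetical class, not an existing product or library. It stores each department's named sets separately and lets an item belong to any number of them independently. It also refuses edits to sets that the acting department doesn't own. The department names and item ids are made up.

```python
# Hypothetical sketch of a tiny "Classification Management System":
# each department owns its own named sets, and an item's membership in
# one set is independent of its membership in any other set.


class ClassificationRegistry:
    def __init__(self):
        # (owner department, set name) -> set of item ids
        self._sets = {}

    def _check_owner(self, acting_dept, owner):
        # A department may edit only the sets it owns, so it cannot clobber others'.
        if acting_dept != owner:
            raise PermissionError(f"{acting_dept!r} may not edit sets owned by {owner!r}")

    def add(self, acting_dept, owner, set_name, item_id):
        self._check_owner(acting_dept, owner)
        self._sets.setdefault((owner, set_name), set()).add(item_id)

    def remove(self, acting_dept, owner, set_name, item_id):
        self._check_owner(acting_dept, owner)
        self._sets.get((owner, set_name), set()).discard(item_id)

    def sets_containing(self, item_id):
        # An item may sit in many sets, owned by many departments.
        return sorted(key for key, members in self._sets.items() if item_id in members)


registry = ClassificationRegistry()
registry.add("sales", "sales", "key-accounts", "cust-17")
registry.add("billing", "billing", "late-payers", "cust-17")
print(registry.sets_containing("cust-17"))
# [('billing', 'late-payers'), ('sales', 'key-accounts')]
```

Keeping the owner in the key is one way to get the "sets of sets" idea (which department manages a given set) without a separate structure.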

Methinks you misuse the phrase 'set theory', which focuses far more upon the structure, description, and properties of sets than upon membership. A proper rephrasing would be "The beauty of sets is that being a member [...]".

As far as being 'difficult to grok', well, while I disfavor inventing statistics about what 'most' users have trouble with, I cannot name anyone who has had difficulty distinguishing 'red things' from 'pencils' or grokking that some things are 'red pencils' and fall into both sets.

Rather than difficulty grokking sets, I suspect there would be more trouble with using them to drive code. If you're supposed to perform behavior X when you see a red thing, and behavior Y when you see a pencil, what are you going to do when you see a red pencil? In general, X and Y won't even be compatible behaviors; e.g. 'X' might be 'turn left' and 'Y' might be 'turn right'.
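Here is a tiny Python sketch of the red-pencil problem. It assumes behaviors are simply keyed by classification predicates; `is_red`, `is_pencil`, and `RULES` are illustrative names. Nothing in the rules says what to do when both fire.

```python
# Illustrative sketch: behaviors keyed by classification predicates.
# A red pencil satisfies both predicates, so two incompatible rules fire.


def is_red(thing):
    return thing.get("color") == "red"


def is_pencil(thing):
    return thing.get("kind") == "pencil"


RULES = [
    (is_red, "turn left"),      # behavior X
    (is_pencil, "turn right"),  # behavior Y
]


def matching_behaviors(thing):
    """Return every behavior whose predicate accepts the thing."""
    return [behavior for predicate, behavior in RULES if predicate(thing)]


print(matching_behaviors({"color": "red", "kind": "apple"}))    # ['turn left']
print(matching_behaviors({"color": "blue", "kind": "pencil"}))  # ['turn right']
print(matching_behaviors({"color": "red", "kind": "pencil"}))   # ['turn left', 'turn right']
```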

Specialization of behavior based on flexible classifications is also a problem:

Consider that predicates (functions that return true|false for their argument, usually without any SideEffects) are a powerful vehicle to express arbitrary sets, including infinite sets and even non-enumerable infinite sets (there are many infinities in set theory).

 ; A set is represented by its membership predicate.
 (define (member? X Set) (Set X))
 (define (union A B) (lambda (X) (or (member? X A) (member? X B))))
 (define (setdiff A B) (lambda (X) (and (member? X A) (not (member? X B)))))
 ; Finite set from a list; standard 'member' returns a sublist or #f.
 (define (list->set L) (lambda (X) (if (member X L) #t #f)))
 ;...
 (define Naturals (lambda (X) (and (integer? X) (>= X 0))))
 (define MyFavoriteNumbers (list->set '(6 7 12 42 108)))
 ; (member? RussellParadox RussellParadox) recurses without end.
 (define RussellParadox (lambda (X) (not (member? X X))))

By RicesTheorem, we know that we cannot, in general, decide whether one set expressed by such a flexible mechanism is a subset of another. That is, it is impossible to write a 'subset?' function that is correct for every pair of sets and always terminates. While a sufficiently smart human may be able to tell that (subset? MyFavoriteNumbers Naturals) is 'true' by destructuring the above definitions, even that relatively trivial case is a challenge to automate.
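One practical response is a subset check that is sound but incomplete. The Python sketch below loosely mirrors the Scheme definitions above; `PredSet`, `list_to_set`, and `subset` are hypothetical names. It only settles easy cases, such as a finite list checked against a predicate. Otherwise it answers `None` ("unknown"), because by RicesTheorem no always-terminating checker can answer every case correctly.

```python
# Hypothetical sketch: predicate-based sets plus a subset check that always
# terminates (given terminating predicates) but is incomplete. It answers
# True/False only in cases it can settle, and None ("unknown") otherwise.


class PredSet:
    def __init__(self, predicate, elements=None):
        self.predicate = predicate   # membership test: any value -> bool
        self.elements = elements     # finite enumeration, if one is known

    def __contains__(self, x):
        return self.predicate(x)


def list_to_set(items):
    items = tuple(items)
    return PredSet(lambda x: x in items, elements=items)


naturals = PredSet(lambda x: isinstance(x, int) and x >= 0)
my_favorite_numbers = list_to_set([6, 7, 12, 42, 108])


def subset(a, b):
    """True/False when these simple rules can decide, None for 'unknown'."""
    if a is b:
        return True                              # every set is a subset of itself
    if a.elements is not None:
        return all(x in b for x in a.elements)   # enumerate the finite set
    return None                                  # cannot enumerate a: give up


print(subset(my_favorite_numbers, naturals))   # True
print(subset(naturals, my_favorite_numbers))   # None
```

The second call is obviously 'false' to a human, but this checker can't see that. Each extra special case makes it smarter, yet none makes it complete.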

If one cannot automate a 'subset?' function, then one cannot, in general, automatically compose two or more modules (with their new rules and classifications) in a manner that respects dispatch to the 'most-specialized' code. This adds to the problem mentioned earlier where incompatible rules might apply in the event of overlap. In general, a human will need to be involved every time two or more modules are brought together. This modularity issue has undermined the promise of such powerful classification-driven-behavior mechanisms as PredicateDispatching.
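Here is a minimal Python sketch of what that human involvement looks like. It assumes a hypothetical `PredicateDispatcher` in which the programmer must explicitly declare which rule is narrower than which. It is not the API of any published PredicateDispatching system. It only shows dispatch failing on the red pencil until someone orders the overlapping rules.

```python
# Hypothetical sketch of predicate dispatching in which a human, not the
# machine, supplies the "more specific than" ordering between rules.


class AmbiguousDispatch(Exception):
    """Raised when several matching rules are not ordered by specificity."""


class PredicateDispatcher:
    def __init__(self):
        self.rules = []          # list of (name, predicate, action)
        self.narrower = set()    # (a, b): rule a was declared narrower than rule b

    def rule(self, name, predicate, action):
        self.rules.append((name, predicate, action))

    def declare_narrower(self, narrow, broad):
        # This is the step that cannot be automated in general (subset? is undecidable).
        self.narrower.add((narrow, broad))

    def __call__(self, x):
        matches = [(name, action) for name, pred, action in self.rules if pred(x)]
        if not matches:
            raise LookupError("no applicable rule")
        names = [name for name, _ in matches]
        # Keep a matching rule only if no other matching rule is declared narrower.
        # Only direct declarations are consulted; there is no transitive closure.
        best = [(name, action) for name, action in matches
                if not any((other, name) in self.narrower for other in names if other != name)]
        if len(best) != 1:
            raise AmbiguousDispatch(f"unordered rules apply: {[name for name, _ in best]}")
        return best[0][1](x)


behave = PredicateDispatcher()
behave.rule("red", lambda t: t["color"] == "red", lambda t: "turn left")
behave.rule("pencil", lambda t: t["kind"] == "pencil", lambda t: "turn right")
behave.rule("red-pencil",
            lambda t: t["color"] == "red" and t["kind"] == "pencil",
            lambda t: "stop")

red_pencil = {"color": "red", "kind": "pencil"}
try:
    behave(red_pencil)
except AmbiguousDispatch as err:
    print(err)   # unordered rules apply: ['red', 'pencil', 'red-pencil']

# A human resolves the overlap by declaring the combined rule narrower.
behave.declare_narrower("red-pencil", "red")
behave.declare_narrower("red-pencil", "pencil")
print(behave(red_pencil))   # stop
```

Every time a new module adds rules, someone has to revisit these declarations. That is the modularity cost described above.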

That said, the flexibility of rules and code driven by domain data and domain classifications can be very nice. Perusing some examples of rules-driven programming might be of interest; many can be found on the InformLanguage websites, e.g. http://inform7.com/learn/eg/bronze/source_2.html or http://inform7.com/learn/eg/bronze/source_28.html . Inform uses MultipleDispatch (restricted to at most ternary verbs, the highest arity used in English) and a SingleInheritance taxonomy for the domain (simulation/game) elements. So it isn't as flexible as TopMind might favor, but it remains impressive. Still, MultipleDispatch is probably the most powerful practical (modular, high-performance) mechanism for classification-driven behavior available today.
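For contrast, here is a minimal Python sketch of MultipleDispatch over a SingleInheritance taxonomy. The classes, the `insert` verb, and the `defmethod`/`dispatch` helpers are hypothetical. They are not a model of Inform's actual rule ordering. Specificity is just each argument's distance up its class chain, compared left to right (roughly the CLOS default). Because that ordering is computable, independently written methods compose without a human adjudicating overlaps.

```python
# Hypothetical sketch: multiple dispatch over a single-inheritance taxonomy.
# Class names and the "insert" verb are illustrative only.


class Thing: pass
class Container(Thing): pass
class Box(Container): pass
class Person(Thing): pass


_methods = {}   # (verb, type of 1st arg, type of 2nd arg) -> implementation


def defmethod(verb, t1, t2):
    """Decorator registering fn as the implementation for (verb, t1, t2)."""
    def register(fn):
        _methods[(verb, t1, t2)] = fn
        return fn
    return register


def dispatch(verb, a, b):
    """Call the most specific applicable method for (verb, a, b)."""
    best_fn, best_key = None, None
    for (v, t1, t2), fn in _methods.items():
        if v == verb and isinstance(a, t1) and isinstance(b, t2):
            # Distance from each argument's own class up its inheritance chain.
            # Tuples compare left to right, so the first argument takes priority.
            key = (type(a).__mro__.index(t1), type(b).__mro__.index(t2))
            if best_key is None or key < best_key:
                best_fn, best_key = fn, key
    if best_fn is None:
        raise LookupError(f"no method for {verb!r}")
    return best_fn(a, b)


@defmethod("insert", Thing, Container)
def _(a, b):
    return "generic insert"


@defmethod("insert", Thing, Box)
def _(a, b):
    return "put it in the box"


@defmethod("insert", Person, Container)
def _(a, b):
    return "people don't fit"


print(dispatch("insert", Thing(), Box()))    # put it in the box
print(dispatch("insert", Person(), Box()))   # people don't fit
```

The price is the rigidity noted above. Things can only be classified along the one inheritance chain, so a "red pencil" needs its own class.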


CategoryClassification, CategoryAbstraction


Last edited May 14, 2013