How Many Things Is It

An excerpt from DataAndReality.

A single physical unit often functions in several roles, each of which is to be represented as a separate thing in the information system. Consider a database maintaining scoring statistics for a soccer team, both on a position basis and on an individual basis. The database might have representatives for 36 things: 11 positions and 25 players. When Joe Smith, playing halfback, scores a goal, the data about two things is modified: the number of goals by Joe Smith, and the number of goals by a halfback. That human figure standing on the field is represented as (and is) two things: Joe Smith and a halfback.

Consider the question of "sameness". Suppose Joe switches to fullback, and scores another goal. Did the same thing make those two goals? Yes: Joe Smith made both. No: one was made by a halfback, the other by a fullback.
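A minimal sketch of why both answers are right, assuming each goal is recorded with both the player and the position at the time. The names Goal and same_thing are made up for illustration and are not from the book. Sameness only has an answer once you pick the key to compare on:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    player: str    # who the human figure is
    position: str  # which role the figure was playing at the time


# Joe scores once as halfback, then once more after switching to fullback.
goals = [Goal("Joe Smith", "halfback"), Goal("Joe Smith", "fullback")]


def same_thing(a: Goal, b: Goal, key: str) -> bool:
    """Two goals were made by the 'same thing' only relative to a chosen key."""
    return getattr(a, key) == getattr(b, key)


made_by_same_player = same_thing(goals[0], goals[1], "player")      # the "Yes" answer
made_by_same_position = same_thing(goals[0], goals[1], "position")  # the "No" answer

# The same two events counted along each dimension.
by_player = Counter(g.player for g in goals)
by_position = Counter(g.position for g in goals)
```

Neither count is more correct than the other; each one answers a different question about the same two events.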

Why is that human figure perceived and treated as two things, rather than one or three or ninety-eight? Not by any natural law, but by the arbitrary decision of some human beings, because the perception was useful to them, and corresponded to the kinds of information they were interested in maintaining in the system.

If the file only had data about player positions, then the same physical object would be treated as being different things at different times. Joe is sometimes a halfback and sometimes a fullback. From the perspective of this file, his activities are being performed by two different entities.

Also consider two related people (e.g. husband and wife) who work for the same company. When considering medical benefits, each of these people has to be considered twice; once as an employee, and once as a dependent of an employee. How many people are involved?

Or suppose a person held two jobs with the company, on two different shifts. Does that signify one or two employees? Shipping clerk John Jones and third-shift computer operator John Jones might be the same person. Does it matter? Sometimes.

The notion is also applicable to warehouses. From the point of view of another application, the thing involved is not a warehouse at all, but a building or property on the assessment rolls.

It is plausible (bizarre, perhaps, but plausible) to view a certain employee and a certain stockholder as two different things, between which there happens to exist the relationship that they are embodied in the same person. There would then exist two representatives in the system, one for the employee and one for the stockholder. It's perfectly all right, so long as users understand the implications of this convention (e.g., deleting one might not delete the other).
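One way to picture that convention, as a rough sketch only: two separate representatives plus an explicit "embodied in the same person" link. The ids E1 and S1, the dictionaries, and delete_employee are all hypothetical. Deleting the employee removes the link but leaves the stockholder alone:

```python
# Two separate representatives for what happens to be one person.
employees = {"E1": {"name": "John Jones"}}
stockholders = {"S1": {"name": "John Jones"}}

# The relationship "embodied in the same person", stored as (employee_id, stockholder_id).
same_person = {("E1", "S1")}


def delete_employee(emp_id: str) -> None:
    """Remove the employee and any links to it; stockholders are not touched."""
    employees.pop(emp_id, None)
    stale = {link for link in same_person if link[0] == emp_id}
    same_person.difference_update(stale)


delete_employee("E1")
```

After the call the stockholder representative still exists, which is exactly the implication users would have to understand.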

Transportation schedules and vehicles offer other examples of ambiguities, in the use of such terms as "flight" and "plane" (even if we ignore the other definitions of "plane" having nothing to do with flying machines). What does "catching the same plane every Friday" really mean? It may or may not be the same physical airplane. But if a mechanic is scheduled to service the same plane every Friday, it had better be the same physical airplane. And another thing; if two passengers board a plane together in San Francisco, with one holding a ticket to New York and the other a ticket to Amsterdam, are they on the same flight?

Classification, e.g., of skills, impacts the notion of "sameness" as much as the notion of "how many". The way we partition skills determines both how many different things we recognize in this category, and when we will judge two things to be the same. Consider a group of people who know how to do such things as paint signs on doors, paint portraits, paint houses, draw building blueprints, draw wiring diagrams, etc. One classifier might judge that there is just one skill represented by all of these capabilities, namely "artist", and that every person in this group had the same skill. Another classifier might claim there are two skills here, namely painting and drawing. Then the sign painter has the same skill as the portrait painter, but not the blueprint drawer. And so on.

The same game can be played with colors. Two red things are the same color. What if one is crimson and the other scarlet?

The perceptive reader will have noticed that two kinds of "how many" questions have been intermixed in this section. At first we were exploring how many kinds of things something might be perceived to be. But occasionally we were trying to determine whether we were dealing with one or several things of a given kind. If you can't apply that distinction to the preceding discussions, then please don't become a database administrator. I fear your database may well become a minefield of semantic traps.

For another example of the latter kind, consider program problem reports (known as APARs in IBM). Considerable effort is often expended in determining that the symptoms reported in two APARs are caused by the same programming error; thereafter, the two APARs are considered to be the "same". (The correctness of this view depends on whether you think the entity involved is the programming error or the problem report.)

And analogously, much of the fuss in many insurance claims and court battles revolves around determining whether several things relate to the "same" illness or injury.

Any suggestions for how to model something like this? -- JeffGrigg

Like, does the player object, Joe Smith, "have" a position object? When the player object "scores a goal", it forwards the method call to its current position object; both objects keep statistics.
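Something like the following, perhaps. This is only a sketch of the forwarding idea in the question, with made-up class and method names (Player, Position, score_goal), not a recommendation:

```python
class Position:
    """A role on the field that keeps its own goal count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.goals = 0

    def score_goal(self) -> None:
        self.goals += 1


class Player:
    """A person who 'has' a current position and forwards goals to it."""

    def __init__(self, name: str, position: Position) -> None:
        self.name = name
        self.position = position
        self.goals = 0

    def score_goal(self) -> None:
        self.goals += 1             # the individual statistic
        self.position.score_goal()  # forwarded to whatever position is current


halfback = Position("halfback")
fullback = Position("fullback")

joe = Player("Joe Smith", halfback)
joe.score_goal()         # counted for Joe and for halfback
joe.position = fullback  # Joe switches positions
joe.score_goal()         # counted for Joe and for fullback
```

Note that this makes the player the primary object and the position something it holds, which is a modeling decision in its own right.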

You might want to model each pair in the player/position example as a tuple (person/position). Each of these possible combinations is unique. You keep the scores for these tuples, not for a person or a position. If you want to find out how many goals Joe Smith scored, you aggregate along the person dimension. (This probably performs very poorly.) The point I'm getting at is that a person doesn't have a position and a position doesn't have a person. The relationship is symmetric. I would use explicit relationship objects that update the respective scores for positions/persons.

-- ThomasMaeder
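A minimal sketch of the tuple variant described above, assuming goals are counted per (person, position) pair and totals are computed on demand. The function names and the second player, Ann Lee, are invented for the example. It does not show the alternative of relationship objects that push updates into separate person and position scores:

```python
from collections import Counter

# Goal counts keyed by (person, position); neither side owns the other.
goals: Counter = Counter()


def record_goal(person: str, position: str) -> None:
    goals[(person, position)] += 1


def goals_by_person(person: str) -> int:
    """Aggregate along the person dimension."""
    return sum(n for (p, _), n in goals.items() if p == person)


def goals_by_position(position: str) -> int:
    """Aggregate along the position dimension."""
    return sum(n for (_, pos), n in goals.items() if pos == position)


record_goal("Joe Smith", "halfback")
record_goal("Joe Smith", "fullback")
record_goal("Ann Lee", "halfback")  # hypothetical second player

joe_total = goals_by_person("Joe Smith")
halfback_total = goals_by_position("halfback")
```

Each aggregate scans every pair, which is where the performance worry comes from; keeping running totals per person and per position would trade that scan for extra bookkeeping on every goal.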

If your main BusinessObject is the individual person (i.e. your customers always talk about what Joe Smith scored, Joe Smith the halfback, Joe Smith being reassigned, ...), then I think the RoleObjectPattern is the right choice. This is because ObjectIdentity is relevant only for the person, not for their role (e.g. halfback). This pattern allows you to handle the different roles separately, without losing the identity properties that seem to be so important to you. -- GunnarZarncke
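A loose sketch in the spirit of the RoleObjectPattern, not a faithful rendering of the published pattern: one Person object carries the identity, and role objects are attached to it as needed. Person, Role, play and goals are illustrative names only:

```python
from typing import Dict, Optional


class Role:
    """Statistics for one role played by one person."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.goals = 0


class Person:
    """The core object: identity lives here, roles hang off it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._roles: Dict[str, Role] = {}
        self.current: Optional[Role] = None

    def play(self, role_name: str) -> Role:
        """Switch to a role, creating it the first time it is used."""
        if role_name not in self._roles:
            self._roles[role_name] = Role(role_name)
        self.current = self._roles[role_name]
        return self.current

    def score_goal(self) -> None:
        if self.current is None:
            raise ValueError(f"{self.name} is not playing any role")
        self.current.goals += 1

    def goals(self, role_name: Optional[str] = None) -> int:
        """Goals in one role, or across all roles when no role is given."""
        if role_name is None:
            return sum(role.goals for role in self._roles.values())
        role = self._roles.get(role_name)
        return role.goals if role else 0


joe = Person("Joe Smith")
joe.play("halfback")
joe.score_goal()
joe.play("fullback")
joe.score_goal()
```

In this sketch the roles belong to one person, so a team-wide "goals by a halfback" figure would still have to be summed over all persons.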

A related concept: let's say you own a computer. You decide to swap the hard disk for another one. Is it still the 'same' computer? Are the licenses still valid? Then you change the case. The motherboard. The cooling fan. The wires. Everything. Is it still the 'same' computer? For licensing purposes? Then, since you kept all the parts you swapped out, you put those together to form another computer. It looks exactly like the computer you started with. Actually, in a way, it is. Now what? Which is which? -- AalbertTorsius