Object Identity Examples

Refactored from RelationalHasNoObjectIdentity.

The first one is an example purported to show that it would be useful to have a data/object model where ObjectIdentity is just there, and the fact that the relational data model really has to suffer for lack of support of ObjectIdentity. The example was provided by RandyStafford,and I thought I convincingly shown to him why this example disproves the theory that ObjectIdentity is an useful feature in a data model.

Other may contribute examples as well.

First example

Consider the following DomainModel:

 import java.util.Iterator;
 import java.util.Vector;

 class Taxpayer
 {
 private Vector events = new Vector();
 private String name;

 public Taxpayer(String name) { this.name = name; }

 public void addEvent(TaxEvent event) { events.add(event); }

 private boolean eventsContainsAuditEvent() {
 for (Iterator i = events.iterator(); i.hasNext(); )
 if (((TaxEvent)i.next()).isAuditEvent())
 return true;
 return false; }

 private boolean eventsContainsFilingEvent(){
 for (Iterator i = events.iterator(); i.hasNext(); )
 if (((TaxEvent)i.next()).isFilingEvent())
 return true;
 return false;}

 public boolean hasBeenAudited() { return eventsContainsAuditEvent(); }

 public boolean hasFiled() { eventsContainsFilingEvent();}
 }

 class TaxEvent {
 public boolean isAuditEvent() { return false; }
 public boolean isFilingEvent() { return false; }
 }

 class AuditEvent extends TaxEvent {
 public boolean isAuditEvent() { return true; }
 }

 class FilingEvent extends TaxEvent {
 public boolean isFilingEvent() { return true; }
 }

Note that there are no instance variables declared in the TaxEvent hierarchy. I could build a working application with this DomainModel in GemStonej with no further modifications. This is the SimplestThingThatCouldPossiblyWork. I could build a UI that displays all Taxpayers, allows the user to add TaxEvents, and allows the user to see which Taxpayers have filed, which have been audited, and which have done neither.

But I could not map this DomainModel to a relational database without modifications, because there are no instance variables in the TaxEvent hierarchy to translate to columns in a relational table. There is nothing to store other than the basic intrinsic identity of the objects, which includes the identity of their class, which is what allows the UI to display which Taxpayers have suffered which events.

I'm not interested in critiques of whether this is "good design" or not, and I'm not interested in other ways of designing it so that relational persistence could be used. That's not the point.

The point is that this simple example shows the power of object identity, which is leverageable with an object database, but not (in this case) with ObjectRelationalMapping. This difference in power is one of the elements of the ObjectRelationalImpedanceMismatch.

-- RandyStafford (sometime in 2001)

Response to this example.

Doing some modeling work before I have the full specification of the problem is a waste of time. What exactly is the functionality those classes are supposed to accomplish ?

If the functionality is only to show which person filed taxes and which has filed audits, and which did both and so on, then the biggest fault of his model is that it is redundant - no need to maintain a collection of events for that. If the application is allowed to delete events from one Person's record (as SHOULD be the case, since a business system should always be able to deal at least with human errors in data input), than the biggest fallacy of what he proposed is that an Event has no logical identity. -- CostinCozianu

An event does not need a logical identity because its intrinsic object identity is completely sufficient - that's the whole point. Adding delete functionality would be trivial - it would require only the addition of a removeEvent(TaxEvent taxEvent) method to class TaxPayer?, which delegates to Vector.remove(Object object), which operates on the intrinsic identity of the collection members. -- RandyStafford 2001/07/23

And how do you identify that object identity to the user of your system ? I guess you forgot this argument. I guess you really need to delegate to Vector.removeAll(Collection c) and not even that would be enough. Dr. Codd discussed all the issues that need to be discussed about object identity way back in 1968. I guess we haven't made that much of a progress, if we are still discussing it now. See AnIntroductionToDatabaseSystems, FundamentalsOfObjectOrientedDatabases. -- CostinCozianu 2001/07/23

Therefore the user of the hypothetical application will have NO means to identify which exact information is to be deleted.

Yes, he will - see above.

So you expect your user to see an FileEvent 0xaabbccdd label in a list and identify that this is really the information to be deleted ? -CostinCozianu 2001/07/23

his is a general and fundamental problem in dealing with objects that have "internal identity", while they don't have any logical identity (such as account no, and so on). The 'object identity' as is defined and used in some ObjectOrientedProgramming models (not all OO models support object identity) has no business in a valid domain model whatsoever. Therefore RelationalHasNoObjectIdentity is not a problem but a virtue of the relational model, it may be a psychological problem for the people who think in OO only terms, and this will lead them to serious logical mistakes in data modelling and therefore in domain modelling .

If he will clarify his position, I'll be able to offer him one of the many possible solutions for the problem at hand. -- CostinCozianu

Judo time :). Perhaps it would be more accurate if we say that objects have no intrinsic identity? It is impossible to say if an object is a copy or not unless it contains within it some logical identifier. Simple physical identity cannot do that. This is a heavy limitation and confuses physical(implicit) with logical(explicit) identity. If you want to separate these concerns, and if you want your data to be internally consistent rather than coincidentally consistent, then each object requires a logical key. Logical associations cannot rely on a memory location, so they must also use these logical keys. This is the relational model, a model where identities are explicitly defined as internal logical concepts and thus may be verifiably consistent and complete. -- RichardHenderson.

Exactly right Richard. The above model is not a model at all, if you put it like that in a database, be that an object database, you are in flagrant contradiction with DatabaseIsRepresenterOfFacts. You represented no facts at all, nothing identifiable to the end user. Let's try to see what the correct solutions are. The risk is that RandyStafford will come back and tell us that the intended use was larger and we might have needed a richer schema, but his objects with no data can't be of much use anyway. So under the reserve that a good schema can be designed only after you know its intended usage, and storing objects SHOULD NOT BE the intended usage of any schema (not even for an OODB schema !), we'll see how easy it is to model the case presented above (pretty lousy presentation).

Whether we want to find out only if a tax-payer has filed or if a tax-payer has been audited , we can very easily use two columns HAS_FILED, HAS_BEEN_AUDITED- both booleans, and we can attach that to an hypothetical TAXPAYERS table, of course we have to assume that the database will contain information for at most one year and many other assumptions that belong to the fallacy that Randy Stafford wants me to do modeling without at least summary information on what he wants to do.

If we are really interested to know how many times has filed or how many times has been audited (suppose I can file more than once, or I can be audited more than once), I'll have instead two columns NR_OF_FILED_EVENTS, NR_OF_AUDIT_EVENTS,which would be enough to show me if a tax payer has filed at all ( NR_OF_FILED_EVENTS>0), and even how many times (essentially this seems to be the ONLY purpose for which RandyStafford wants to keep a vector of events, otherwise two events of the same type are essentially unidentifiable).

And finally, if I have more info on each tax file and on each tax audit, I won't have ANY extra column at all because the whole information can be deducted from where I store the primary information on files and audits and I DON'T WANT a denormalized schema.

In this way I am within the principle of DatabaseIsRepresenterOfFacts, while Randy Stafford is not - even if he pretends he presented a domain model. As it can be seen the facts that I intend to represent in the database are not exactly what RandyStafford really wanted (the image of his precious objects stored on the disk), but are the essential facts and are understandable and are biased towards the users of my system and not towards the compiler and Java runtime.

This is the difference between an information model (domain model if it so pleases Randy Stafford) and an object model. Not all object models are proper domain models. ---CostinCozianu

The above model is not a model at all, if you put it like that in a database, be that an object database, you are in flagrant contradiction with DatabaseIsRepresenterOfFacts

I don't see that. Just because the Randy's implementation does not represent facts as columns in a table does not mean that a model doesn't exist nor does it mean that information isn't represented. As a matter of fact, I believe his model is 'normalized' in the sense that no proposition is stated multiple times and can be represented as a tuple (given the proper mapping), thus satisfying the definition of a database in DatabaseIsRepresenterOfFacts.

-- MarkAddleman

Randy's model above is denormalized because events of the same type are logically unidentifiable. How do you expect the end-user to understand the following as facts: Mark Addleman had 5 audit events , such as: 0xaabbccdd, 0xaabbccff, 0xaaeeccdd, 0xaabbc1dd, 0x2abbccdd. How is the user suppose to correct the fact that an event has been erroneously introduced twice in the system, choose an address at random?

The key here, is that in case we decide to store the above objects as such in an OODB, then we don't represent facts about the underlying reality that we are suppose to model, we only represent facts about our internal OO runtime. Those facts have no logical identity , they do not exist in what we try to model, because object identity ultimately doesn't model anything outside a runtime, be it even runtime extended through persistence to images on disk.

The minute you want to pass the barrier of pure example (you can't do that in relational) and you want to actually represent something about what happens to the taxpayer, you'll see that you will need some logically identifiable information that will make your logical identity about your objects. Then it would constitute a design flaw if you give up the logical identity and base your design on the instance physical identity. -- CostinCozianu

Maybe the above is a special case of the more general "logging events". When events are logged, there may not be a unique domain key; they are simply events. (For the sake of argument, let's not assume we create a sequential key.) There is no guarantee that two events cannot happen at the same thing, so a time-stamp is insufficient. Still, it seems there needs to be some way for the user, at least, to uniquely identify a given event. Suppose it is a web application and the user clicks on a line in a event listing to do some operation on that event. The web browser has to send information back to the server telling it which event was clicked. If we don't have a unique identifier, then how is the server going to know for sure what was clicked? Practical issues generally "want" a unique way to identify things exposed to the outside world. Whether this violates some paradigm rules somewhere or not is moot when you just plain need a feature.

Seems to me the big difference here is that the relational people are arguing from a different perspective than that OO people. Logically, at the domain level, I believe Costin is correct, object identity is unnecessary, only information that a human could use to distinguish the object, is necessary. There must exist a logical candidate key. However, this is only at the logical level. At the implementation level, object identity seems to be necessary for OO. Given this argument, logically, a database should only contain keys consisting of actual domain data, however, current databases punish us severely when we actually implement this, both in performance and maintenance, especially when the candidate key is multi column. OO people solve this by simply creating a OID, and using it for implementation purposes like easily joining tables, and allowing the natural key to change and evolve as the business changes an evolves without breaking the OID. This of course creates a divide between the logical key and the physical key which can get out of synch, but its benefits seem to OO people, myself included, to be worth the cost. Object identity seems to be a compromise between the ideal implementations of relational, and the reality that none of the current databases actually implement it very well. You cannot have agreement when one side is arguing from an implementation perspective while the other is arguing from the abstract ideal of a theory that isn't actually properly implemented. Of course, that's just my take.

First of all, as I noted in AutoKeysVersusDomainKeys the swiping assertion that using an OID for primary key provides implementation advantages (like speed, size, etc) is altogether unwarranted. It may be so in your average case, but the exceptions are so many and so common, that you are better off pretending that you don't know what's the average case and analysing each table/domain object on a case by case basis.

I assume you meant sweeping assertion, if so, I didn't say it applies in every case. Certainly, it would be wise to analyse on an object/table basis, and apply either a IOD, or a natural domain key, if a really good one exists. However, if everything is a special case, it can actually make implementation more difficult than picking one side or the other, and just being consistent. The OO side of the issue is that good natural keys are rather rare, because they change too much at the whim of the business. It seems easier, to us as least, to have two keys, a logical key, and a physical key, the system is implemented with the physical key, but the UI is presented to the user with the logical key. Lookups are based on the logical key, and when it happens to change for business reasons, nothing breaks. You have to admit, it's a tough issue, there is no single correct answer, either way you go, there will be trouble.

If we set aside the question of implementation advantages - where I agree that there may be cases where you want to use an OID for physical optimization reasons, the important principle remains that the logical level (in Fowler's terms called conceptual modeling) applies to object models as well. The example object model shown above is not a very good design just because it suffers the logical defects of not having value identifiable objects. It can be substantially improved by not using the inherent object identity available as java pointers, and doing a better job of object modeling.

Possibly, but it was a toy sample meant only to highlight the issue that that particular model could not be stored relationally, which it does, you can't really critique it as a good domain model, it wasn't meant to be one. In truth, Randy's model does contain data that could be stored relationally. We only need to look at the public interface of the class to see it's data, and the private fields to see what we need to store.

    class Taxpayer {
        private string name;
        public boolean hasBeenAudited();
        public boolean hasFiled();
    }

can be stored as

    Table Taxpayer {
        name varchar
        hasBeenAudited bit
        hasFiled bit
    }

The fact that he implemented hasBeenAudited with a collection of objects each representing either true or false, just complicates the fact it's a boolean being stored. Adding a tax audit event is equal to setting hasBeenAudited to true, same applies to filing events. It doesn't matter if you've been audited 10 times, since the model represents them all as a single bool. However, the object model has some hidden information that the relational one can't store, you can add 4 audit events, remove two, and the object knows it still has audit events, but the relational model doesn't know there were 4, you'd have to add a column numAuditEvents and numFiledEvents to keep track of how many had happened, if this were actually important. You could then make hasBeenAudited and hasFiled computed fields, rather than mutable fields. The tax objects are basically being used for counting, thus a collection of them can be represented as an integer. The following should store that model sufficiently. However, since object identity is being used for the add and remove methods, I'm not sure that this model could be resurrected properly back into objects, without changing the model and making the tax events into value objects.

    Table Taxpayer {
        name varchar
        numAuditEvents int
        numFiledEvents int
    }

In this particular case, I don't see the value of using the IOD, I'd probably just make the class

    class Taxpayer {
        private string name;
        int numAuditEvents;
        int numFiledEvents;
        public hasBeenAudited(){return numAuditEvents > 0;}
        public hasBeenFiled(){return numFileEvents > 0;}
    }

and only worry about making a tax hierarchy, when I knew for sure I'd have many tax events, and they had some behavior. It also occurs to me that his model is actually using the inherent existence of the object, to polymorphically count events, their type is the information he's using, as well as their IOD. This is the problem with toy samples, there are too many unknowns to use them for analysis. Maybe someone should attempt a more real world sample of how object identity is important.

And just having re-read the page again, I see that I have actually repeated Costins earlier argument against the model, and even used the same names doing it, we definitely need a better sample to demonstrate the OO side. I'll have to see if I can think one up.

The irony of software bugs: the QuickChanges as of 2004-09-09 19:45 showed ObjectIdentityExamples twice in the listing :) Of course there's a rare bug in Ward's script that pops up once in a blue while, I wonder however what was the probability that it would choose this subject. Arguably a relational backed wiki would not have this bug, and we have a few wiki implementations storing data in databases, but to root out such a bug entirely we'd have to AlwaysUseSelectDistinct (yet another "controversy" that needs a funny example).