Object Identity Crisis

We have an elderly client-server product, first released in 1996. Although not truly OO, some of it is close. It has a thick client that does a certain amount of processing on the user's desktop machine. In this product I see two common styles (patterns?) of client-server application:

The client loads an object via the application server (or creates a new object) and modifies the object's attributes. The object is then returned to the database via the application server, which performs various business logic checks.
An object is loaded or created by a stateful application server, and the client then remotely changes attributes or calls methods on it.

The first style is used for maintaining small objects where no business logic is present. The second is used when the object contains business logic.

In this scheme objects are identified by their database primary key (EJB seems to suggest something similar). If the key for a new object is not unique when it reaches the database it is bounced. However, our keys typically incorporate the name of the object as entered by the user. These names cannot be changed once references to them exist, or the references will break.

During some preliminary work for a re-write in Java (later canned), we re-examined this and tried to come up with a way of uniquely identifying these objects which would allow them to be renamed by the user. Using a GUID was suggested. An object could be created in the client or the server and assigned a GUID locally and this would be used as the primary key once it reached the database. As we tried out a prototype a number of flaws appeared:

We have always exposed an API for our customers to write their own clients that access our business logic. Not all languages or platforms they might use would provide the means to create GUIDs.
A GUID is 128 bits, which is bigger than some databases can store as a number.
The prototype code we wrote to maintain and use our first objects uncovered an identity crisis. It is probably there in the original product but the GUID had made it more acute. Was an object a copy of a validated object in existence on the database, was it completely new, or did it have the same ID as an existing object but contained changes? We tried to track these by use of state in the object but it got messy.

I soon left that project, and it was canned soon after in favour of a thin client solution(?). However, I have continued to ponder this. Here is my new ideal way to deal with object identity: clearly separate the idea of the business object from the OO language object that represents it, and give the business object one home - on the database.

An object (in the business sense) should only exist once it has been validated and has hit the database where it will be persisted. It can then be "represented" within code by an instance of a class (immutable preferably). Before it hits the DB there is only a request to create an object (which may fail) and this request should be represented as such as an instance of a different class. Maintenance can be achieved by similar requests.
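A minimal Java sketch of that separation might look like the following. All class names here (Customer, CreateCustomerRequest, RenameCustomerRequest, CustomerStore) are hypothetical, the validation rule is invented for illustration, and a HashMap stands in for the database. The point is only that a request carries no identity, while a Customer instance can be obtained only from the store after validation has succeeded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Immutable representation of a business object that already exists in the store. */
final class Customer {
    private final long id;
    private final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long id() { return id; }
    String name() { return name; }
}

/** A request to create a customer. It has no ID and may be rejected. */
final class CreateCustomerRequest {
    private final String name;

    CreateCustomerRequest(String name) { this.name = name; }

    String name() { return name; }
}

/** A maintenance request: rename an existing customer, addressed by its ID. */
final class RenameCustomerRequest {
    private final long id;
    private final String newName;

    RenameCustomerRequest(long id, String newName) {
        this.id = id;
        this.newName = newName;
    }

    long id() { return id; }
    String newName() { return newName; }
}

/** Stand-in for the server plus database (assumption: an in-memory map). */
final class CustomerStore {
    private final Map<Long, Customer> rows = new HashMap<>();
    private long nextId = 1;

    private static boolean isValidName(String name) {
        // Invented rule for the sketch: a name must be non-blank.
        return name != null && !name.trim().isEmpty();
    }

    /** The business object comes into existence here, or not at all. */
    Optional<Customer> create(CreateCustomerRequest request) {
        if (!isValidName(request.name())) {
            return Optional.empty();
        }
        Customer created = new Customer(nextId++, request.name());
        rows.put(created.id(), created);
        return Optional.of(created);
    }

    /** Renaming replaces the stored representation; the ID never changes. */
    Optional<Customer> rename(RenameCustomerRequest request) {
        if (!rows.containsKey(request.id()) || !isValidName(request.newName())) {
            return Optional.empty();
        }
        Customer renamed = new Customer(request.id(), request.newName());
        rows.put(renamed.id(), renamed);
        return Optional.of(renamed);
    }
}
```

Because a Customer is immutable and only the store hands one out, the client never holds an instance whose state is ambiguous: it holds either a request that has not been accepted yet, or a snapshot of something that exists.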

Once the point of creation becomes the writing of an object to the database, a unique ID can be assigned as it is written. This can be done by the server business logic using a clever bit of SQL, or even by the database using a serial column.
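For the serial column variant, one way this could look from Java is sketched below using the standard JDBC generated-keys calls. The table and column names (customer, name) and the class CustomerWriter are assumptions made for the example, and whether a given driver returns generated keys this way depends on the database in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class CustomerWriter {
    // Assumed schema: customer(id <auto-generated key>, name VARCHAR).
    static final String INSERT_SQL = "INSERT INTO customer (name) VALUES (?)";

    /** Inserts the row and returns the ID the database assigned to it. */
    static long insert(Connection connection, String name) throws SQLException {
        try (PreparedStatement statement =
                 connection.prepareStatement(INSERT_SQL, Statement.RETURN_GENERATED_KEYS)) {
            statement.setString(1, name);
            statement.executeUpdate();
            // Ask the driver for the key generated by this insert.
            try (ResultSet keys = statement.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("No generated key returned for customer insert");
                }
                return keys.getLong(1);
            }
        }
    }
}
```

The caller never supplies an ID, so there is no window in which an object has an identity but does not yet exist on the database.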

This probably isn't a new idea. It may be totally off beam. Any comments? -- Reg Whitton.

What I've done in the past with some success is to assign every object a unique ID. The ID generally comes from some service that guarantees uniqueness. Most modern databases can store 64-bit numbers -- at least the ones we used did -- and that's the integer size that we used. We called this an ObjectId and I believe that is the common usage. We used it as the primary key in the database and provided a service that would convert an ObjectId to a pointer to a live object on demand -- loading the object from the database if needed. Of course some business objects didn't live in the database, but that didn't matter; we still gave them ObjectIds so everything worked consistently. Every instance got its own ObjectId (you could actually have two physical instances living in two different transaction caches, but the transaction scheme is a whole different story), so if you made a copy of an object you had two objects that contained the same data but had two different ObjectIds.
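As a rough illustration of the two services described above, here is a Java sketch. It is not the original implementation: the names ObjectIdService and ObjectResolver are hypothetical, the ID source is a simple in-process counter (a real one would have to guarantee uniqueness across processes), and the database load is represented by a function passed in by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

/** Hands out unique 64-bit ObjectIds (assumption: a single in-process counter). */
final class ObjectIdService {
    private final AtomicLong next = new AtomicLong(1);

    long newObjectId() {
        return next.getAndIncrement();
    }
}

/** Converts an ObjectId to a live object, loading it on demand. */
final class ObjectResolver<T> {
    private final Map<Long, T> live = new HashMap<>();
    private final LongFunction<T> loader; // stands in for the database read

    ObjectResolver(LongFunction<T> loader) {
        this.loader = loader;
    }

    /** Registers an object that does not live in the database. */
    void register(long objectId, T object) {
        live.put(objectId, object);
    }

    /** Returns the live instance, calling the loader only on a cache miss. */
    T resolve(long objectId) {
        T object = live.get(objectId);
        if (object == null) {
            object = loader.apply(objectId);
            if (object != null) {
                live.put(objectId, object);
            }
        }
        return object;
    }
}
```

Resolving the same ObjectId twice returns the same instance, and objects that never touch the database can still be registered and found through the same interface.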

If you are working on a project that uses object/relational mapping you basically have to use ObjectIds as your primary keys. Some DBAs will push back on that and insist on using some piece of domain data as the primary key. Don't let them do that: object identity is an important piece of domain data and it is the logical choice for a primary key. -- PhilGoodwin

The problem is navigating to the object without having a pre-existing root. Nobody knows the IDs, but people do know names and similar attributes, which is why having keys based on attributes is also a good idea. You will still need the object ID for internal efficiency.
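One possible way to reconcile the two views, sketched in Java under the assumption that names must stay unique, is to keep the name as a unique lookup key that maps to the ObjectId rather than making it the primary key. The class NameIndex is hypothetical, and in a real system this would more likely be a unique index on a database column than an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Unique name-to-ObjectId lookup; references elsewhere hold only the ID. */
final class NameIndex {
    private final Map<String, Long> idByName = new HashMap<>();

    /** Returns false if the name is already taken. */
    boolean add(String name, long objectId) {
        return idByName.putIfAbsent(name, objectId) == null;
    }

    /** Navigation entry point: a user knows the name, not the ID. */
    OptionalLong find(String name) {
        Long id = idByName.get(name);
        return id == null ? OptionalLong.empty() : OptionalLong.of(id);
    }

    /** Renaming moves the index entry; the ID, and so every reference, is untouched. */
    boolean rename(String oldName, String newName) {
        Long id = idByName.get(oldName);
        if (id == null || idByName.containsKey(newName)) {
            return false;
        }
        idByName.remove(oldName);
        idByName.put(newName, id);
        return true;
    }
}
```

With this arrangement users still navigate by name, while a rename no longer breaks references, because nothing other than the index stores the name as a key.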