Inferring Object State

There are times when you need to know something about some sort of an entity, and although that data hasn't been explicitly stored, you're pretty sure you can get it from other data that has been stored. So you end up using rules like:

If a catalog order has a tracking number, then it's been shipped -- otherwise, the warehouse has yet to ship it.
If a text in a content-management system has a status of 6, that means it's live and visible.

Sometimes these little rules seem really convenient, and storing two types of info in the same field feels all right. I'd rather just have the trackingNumber field, and not have another field named hasBeenShipped, if you can always infer the second from the first.

But then, it also seems at times that relying on such logic can be brittle and overly complicated. Can anybody really remember what all those CMS status codes mean? Is it ever valid for the warehouse to ship something out without recording a tracking number?

Does anybody have rules of thumb to guide them when they're trying to figure how much delineation is enough, and how much is too much?

I disagree. I never like InferringObjectState. I would much rather keep an explicit hasBeenShipped flag. The issue is clarity of code and data. Now, having said that, I suspect that both YAGNI and OOAO would conspire to get rid of hasBeenShipped and use the rule that you described.

See DualityBetweenStateAndClass. It doesn't answer the question, but places it in context. There is a spectrum of discriminations running from separate classes and subclasses through explicit state and implicit state to simple property. It's just a matter of degree. One rule of thumb would be to incline towards separate classes and away from simple properties, hence to incline towards explicit state and away from implicit state. But YouArentGoingToNeedIt, so don't. RefactorMercilessly implicit states into explicit states (or subclasses) when that makes the system simpler. Of course, if you're not OO, implicit states can become a form of pathological coupling. So if your persistent data ends up on a traditional database that is referenced by non-OO applications (or people), err on the side of explicit states. [Any thoughts on databases as user interfaces, by the way?]

Well, I'm more careful about this stuff when you're talking about DBs and persistent data, because of possible snarls with migrating data between different versions of the same table.

At any rate, I'm not interested in what's correct, just what's useful. Seems to me that you could do it one way, or the other way. I'm just trying to tease out the forces that might make one or the other a better option.

At least we agree that ItDepends. Maybe we agree that there is no right or wrong answer, just different implications of different approaches. If an implicit state is 'naturally' and simply implied by the data, it is useful to treat it like any other derived property. Store it if that makes the system simpler or usefully more efficient than deriving it. If the inferred state is artificial, or based on a more complex combination of data, or depends on non-obvious business rules, it's probably better to go for an explicit state.

In the example of the tracking number, it is quite possible that we might want to track things that haven't been shipped, or perhaps to ship things that we don't need to track, so the connection between shipping and tracking may not be strong enough to base the state inference on. Conversely, if we had a shipping date or shipment reference, say, as mandatory properties when in the 'shipped' state and prohibited properties when in any 'unshipped' state, then the connection may be strong enough to support state inference. In such cases, the properties in question tend to be part and parcel of the meaning (or definition) of the state, in the real world at least. If 'shipped' means was part of a shipment, then it is logical to infer the state of 'shipped' from information about which shipment. But if we cannot guarantee that the system will always have the detailed information, that logic is ultimately valueless, and we would be better off sticking with an explicit state.

I find myself thinking about this most when I'm dealing with business rules, since business rules change all the time. Today my client has a tracking number for every shipped order, tomorrow they may change their mind. (Maybe they'll be shipping free premiums via USPS, for example.) So maybe my question has to do with CostOfChange, too. If it's painless to change it later, then it's sort of unimportant to wonder about it today -- just do what works at the time, and change it later.

But if I end scattering lines like

 ordersToShip << catalogOrder unless catalogOrder.trackingNumber

everywhere, then it worries me that when business logic changes I'll have a heck of a time tracking down all my necessary one-line changes. But maybe I just worry too much.

Why wouldn't the state inference be coded OnceAndOnlyOnce, like any derived property? Then if the business rules change, the derivation can change (including conversion to an explicit property) without any one-line changes being required.