Object Vs Model

Encapsulation and data-hiding (as encouraged in ObjectOriented programming) seem to be precisely inappropriate for modeling, i.e. they do the exact opposite of what is necessary. I'll attempt to explain myself succinctly. (Being succinct is not one of my strengths.)

I feel I need to give a bit of minimal background from WhatIsData and DataManipulation:

A datum is, most generically, a proposition held to have some truth value in some world. Being 'held', a datum must be stored within a cell as a value with a representation. Being a proposition, it must state something about a world. The nature of the proposition is usually implicit (e.g. 'label:value' within an object, or 'table:entry' in a Relational database) since the most generic propositions are both incredibly expensive to utilize and often of limited value. The nature of the cell can vary between hidden/encapsulated and globally accessible if you have the right permissions (e.g. in-a-table).

Regardless of representation and storage, data may either be reflective or projective.

By reflective I mean that the data reflects some truth about the outside world. It is, thus, subject to error and, for worlds that vary over time, to falling out of synchronization unless each datum has an 'at time T' as part of the proposition.
With projective data, the opposite holds true: the world is defined by the data. Projective data includes the data defining fictional worlds as held by the author (and in video-games as held to be true by the system), the data and axioms held to be true in mathematical universes, and (in a sense) even the physical reality we all experience.
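
The distinction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names ReflectiveDatum, ProjectiveDatum and observe are hypothetical and do not come from WhatIsData or DataManipulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflectiveDatum:
    """A claim about an outside world; it may be wrong or stale."""
    label: str
    value: float
    observed_at: float  # the 'at time T' part of the proposition

    def age(self, now: float) -> float:
        return now - self.observed_at


@dataclass
class ProjectiveDatum:
    """A definition of a world; true by construction."""
    label: str
    value: float


def observe(world: dict, label: str, now: float) -> ReflectiveDatum:
    """Take a timestamped reflection of one fact in the outside world."""
    return ReflectiveDatum(label, world[label], observed_at=now)


# The outside world changes on its own, so an earlier reflection goes stale.
outside_world = {"temperature": 20.0}
reading = observe(outside_world, "temperature", now=0.0)
outside_world["temperature"] = 25.0
stale = reading.value != outside_world["temperature"]

# A projective datum IS the world: changing it changes what is true,
# so there is nothing for it to fall out of synchronization with.
gravity = ProjectiveDatum("gravity", 9.8)
gravity.value = 1.6
```

The reflective datum is frozen on purpose: one never edits an observation, one replaces it with a newer one. The projective datum is mutable because editing it is the whole point.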

Reflective and Projective are categorical duals of one another. (That is, duals in the CategoryTheory sense.)

DataManipulation occurs for exactly four different reasons.

The open world reasons are:
- projective manipulation -- manipulate the data in order to change the world
- reflective manipulation -- manipulate the data in order to better reflect the external world (e.g. to remove error, to update due to a world changing over time, etc.)
The closed world reasons are:
- inference -- obtain more value from existing data - e.g. to obtain valuable information from less-valuable data (e.g. in response to a query), or to shift computational costs from a time when they are more expensive to a time when they are less expensive (e.g. in anticipation of a query).
- maintenance -- forget data and information that is no longer necessary because it can either be inferred from other data and information or will not be relevant to any future query. This is done to reduce computational costs for present data (space) and future searches (time).

Most important above is the dual between reflective and projective data.

Object - The idea of 'object' is too fundamental to explain in English. However, objects can be described by some common properties. Objects are projective, meaning they carry their own reality and inflict it upon any observers. Most objects can be manipulated in order to change their realities, usually to make those realities more valuable (or for maintenance or inference, of course...). Objects can also be used to manipulate other objects (e.g. one uses hands to manipulate a tool, and the tool manipulates the device). Ultimately, manipulations are performed by actors... which don't really exist anywhere. (Who manipulates the arm? The brain. Who manipulates the brain? The actor. Where is the actor? Nowhere. The actor is an abstraction.) The sorts of manipulations an actor is expected to perform depend on the nature of the object. Objects are classified by the sorts of properties they have - in particular: how you can manipulate them, and how they manipulate other objects in response to your manipulations. (Note that this is behavioral substitution, and should be considered separate from the sort of property substitution that exists among values.)

Where objects of a more-specific class can be substituted for objects of a more general class, you have sub-classing. Where this can be done blindly with regard to some actor, you have polymorphism. If you abstract the concern about how manipulations occur, you have virtualization.

Model - A 'model' in common terminology is inherently reflective; that is, to say you have a model is to say there exists something that you are modeling. Models exist to aid in making predictions about the outside reality. They carry no reality of their own. Models can be adjusted so they better predict reality based on the data to which they have access, and better infer information from the data they possess. Models, like Objects, can interact in the sense that predictions from one model might affect the predictions made by another... but note that the direction is opposite: model predictions 'draw upon' other models, while object manipulations 'push' manipulations of other objects. Models are necessarily constructed of objects. E.g. a toy soldier is a model of a real soldier, and a terrain map is a model of a battleground. However, the object that constitutes the model is, very fundamentally, not the modeled object. Predictions made utilizing models, and adjustments to models, are necessarily performed by actors. Models can be classified by the predictions they can make from the available data... and virtualized over a set of predictors.

Ultimately, Objects are the exact opposite - the categorical dual - of Models.

I believe that many ObjectOrientedLanguages (including my *ahem* favorite) suffer from a major conceptual disparity in attempting to treat models AS objects. This is because the source of reality is different between the two. Manipulations to models are intended to make the models better reflect reality, while manipulations to objects are intended to make reality more useful. Models require correction and updates and are used to make predictions. Objects are intrinsically correct (albeit possibly broken for their purpose, but a broken object is still an object). Manipulations to objects are used to make real changes.

However, the sorts of manipulations on the two vary significantly -- with models, you tweak the small parts (subcomponents and attached models) until the whole model is giving valuable predictions (and, thus, allowing you to make good decisions). When the thing you're modeling changes, you must update the model (possibly just the data associated with the model). Model-adjustments are bottom-up. With objects, however, you manipulate the whole object, which in turn properly manipulates its own subcomponents and attached objects... and, as a result of the command, objects change. Objects are handled by an imperative language, Models with an inquisitive language.
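
The two directions can be sketched in Python. Everything here (Robot, Wheel, RobotModel, WheelModel) is a hypothetical example of my own, assuming a two-wheeled robot; it is not taken from any particular system.

```python
class Wheel:
    def __init__(self):
        self.speed = 0.0

    def spin(self, speed):
        self.speed = speed


class Robot:
    """Object: commanded as a whole; it pushes manipulations to its parts."""

    def __init__(self):
        self._left = Wheel()
        self._right = Wheel()

    def drive(self, speed):
        # Top-down: the caller never touches the wheels directly.
        self._left.spin(speed)
        self._right.spin(speed)

    def wheel_speeds(self):
        return (self._left.speed, self._right.speed)


class WheelModel:
    """Model part: holds an estimate that observers correct directly."""

    def __init__(self, estimated_speed=0.0):
        self.estimated_speed = estimated_speed


class RobotModel:
    """Model: its prediction draws upon the sub-models when asked."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def predicted_speed(self):
        return (self.left.estimated_speed + self.right.estimated_speed) / 2


# Imperative: issue a command to the whole object.
robot = Robot()
robot.drive(2.0)

# Inquisitive: correct the small parts from observations, then ask the whole.
left, right = WheelModel(), WheelModel()
model = RobotModel(left, right)
left.estimated_speed = 1.5
right.estimated_speed = 2.5
prediction = model.predicted_speed()
```

Note that Robot hides its wheels while RobotModel exposes its sub-models. That asymmetry is exactly the point being argued: the model is useless unless an observer can reach in and correct the parts.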

The object-oriented approach is well adapted to simulation (where each object is 'real' in the simulated world), but is not well adapted to modeling. Object-orientation gains virtualization by encapsulating data, but Models gain virtualization by globalizing the data in order that any predictor have access to all the data it might need. Data Hiding simply doesn't make sense with regards to a reflective system where the data must regularly be updated by observers of reality (i.e. by one or more actors) and where the data inherently comes from the outside.
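
A rough Python sketch of "virtualization by globalizing the data": observations sit in one openly readable table, and the set of predictors is what varies. The table layout, the rover attributes and the predictor names are all assumptions of mine, made up for illustration.

```python
# A shared, openly readable table of observations: (entity, attribute) -> value.
facts = {}


def record(entity, attribute, value):
    """Observers of reality write here; nothing is hidden from predictors."""
    facts[(entity, attribute)] = value


# Predictors are plain functions over the whole table. Each may draw on
# whatever data it needs, and new ones can be added without touching the data.
def predict_range_km(table, entity):
    return table[(entity, "battery_wh")] / table[(entity, "wh_per_km")]


def predict_needs_service(table, entity):
    return table[(entity, "hours_run")] > 500


predictors = {
    "range_km": predict_range_km,
    "needs_service": predict_needs_service,
}

# Data arrives from the outside, via observers.
record("rover1", "battery_wh", 300.0)
record("rover1", "wh_per_km", 60.0)
record("rover1", "hours_run", 620)

# Virtualize over the set of predictors rather than over hidden state.
results = {name: predict(facts, "rover1") for name, predict in predictors.items()}
```

Had the battery level been a private field of a Rover object, the range predictor could only exist if the Rover's author had anticipated it and exposed a method for it.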

This is, perhaps, one source of ObjectRelationalImpedanceMismatch. Relational is designed for modeling data that came from an outside world whilst object-oriented is designed to... well... create and manipulate objects. That's pretty darn fundamental. You can make them work together until you try to add virtualization - abstract objects for which the associated data isn't known. That's where ObjectRelationalImpedanceMismatch will be at its worst.

I bring this up primarily because I believe it is an interesting observation. However, I'm also looking for ways to unify objects and models into a single, coherent system - such that objects can exist as normal, but predictors can directly utilize the object data. This is proving to be... eh... very difficult to do if I at all wish to maintain virtualization. I'm playing with the concept of abstracting 'data' access itself - separating it from the 'cell', and allowing information-processing to occur with each 'data' access. That is, I'll need a globalized table of abstracted data accessors in order that abstracted objects AND data from outside can be placed in the same set of tables, while objects can simultaneously encapsulate and hide the nature of their data. The alternatives I best know are to set an observer on the local objects such that it continually or periodically obtains useful data by some abstracted means (i.e. polling), or to force every object to send data changes to a channel to which a global observer will subscribe (i.e. publish-subscribe)... either of which may be better at providing historical data... unless, of course, history is kept at the cell-level. Finding an elegant solution to this, ideally a distributed one, would be highly useful to my current tasking at work... which involves using models to give human and artificial-intelligence operators the information they require to make intelligent decisions, and then immediately following the reverse direction - constructing objects (commands) and issuing them to the robots being remotely operated. (I work with robots. I love my job. ;)
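
To make the accessor-table idea a little more concrete, here is a minimal Python sketch. It is a guess at one possible shape, not a worked-out design: AccessorTable, publish, put_value and the Thermostat example are all hypothetical names, and it covers neither the distributed case nor history.

```python
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, str]  # (entity, attribute)


class AccessorTable:
    """Global table of abstracted data accessors (hypothetical design)."""

    def __init__(self) -> None:
        self._accessors: Dict[Key, Callable[[], Any]] = {}

    def publish(self, entity: str, attribute: str, accessor: Callable[[], Any]) -> None:
        """An object exposes a datum without exposing its cell."""
        self._accessors[(entity, attribute)] = accessor

    def put_value(self, entity: str, attribute: str, value: Any) -> None:
        """Data from outside goes into the same table as a constant accessor."""
        self.publish(entity, attribute, lambda: value)

    def get(self, entity: str, attribute: str) -> Any:
        # Information-processing may happen on each access.
        return self._accessors[(entity, attribute)]()


class Thermostat:
    """An ordinary object; its storage (tenths of a degree) stays hidden."""

    def __init__(self, name: str, table: AccessorTable) -> None:
        self._tenths = 200
        table.publish(name, "celsius", lambda: self._tenths / 10)

    def heat(self, delta_celsius: float) -> None:
        self._tenths += round(delta_celsius * 10)


table = AccessorTable()
stat = Thermostat("hall", table)
table.put_value("outdoors", "celsius", 4.0)  # observed from the outside world


def predict_temperature_gap(t: AccessorTable) -> float:
    """A predictor that reads object data and outside data the same way."""
    return t.get("hall", "celsius") - t.get("outdoors", "celsius")


before = predict_temperature_gap(table)
stat.heat(1.5)  # the object is still manipulated imperatively
after = predict_temperature_gap(table)
```

The predictor cannot tell which entries are backed by a live object and which are plain observations, which is the property being sought. What this sketch does not answer is how to keep history, or how to avoid recomputing an expensive accessor on every read.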

-- DavidBarbour, 2006 December 15

Edited in response to comment in http://groups.google.com/group/pilud/t/587ab3f0a3c87148?hl=en

-- DavidBarbour, 2009 June 28