Relational Database Table Rows Have No Intrinsic Object Identity

<refactoring>

(DeleteMe: this page is intended as a more clearly-named refactoring of the discussion whose raw thread is on RelationalHasNoObjectIdentity)

A row in a table in a relational database is not an object in the sense of ObjectOrientedProgramming, because it represents only state and not behavior. Therefore a row in a table in a relational database cannot have intrinsic object identity as implemented in numerous ObjectOrientedProgrammingLanguage's (typically by the "==" comparator). To an application programmer using an ObjectOrientedProgrammingLanguage, the true identity of an instance of a class in the address space of the process running the program is derived from the memory location where that instance's state is stored. See the JDK 1.3 JavaDoc for java.lang.System.identityHashCode(Object object) and java.lang.Object.hashCode(), for example. This identity is not a function of the values of any instance variables of the instance. A row in a relational table is identified by the values of its columns and has no other intrinsic identity to the application programmer.
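The distinction drawn above can be sketched in a few lines. This is only an illustration in Python (where identity comparison is spelled "is" rather than "=="); the Employee class and its fields are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """A mutable object: its identity is independent of its field values."""
    employee_id: int
    salary: int


a = Employee(1001, 100_000)
b = Employee(1001, 100_000)

same_state = a == b      # dataclass-generated __eq__ compares field values
same_object = a is b     # identity comparison: two separate instances

# A "row" modelled as a plain tuple is identified only by its values.
row1 = (1001, 100_000)
row2 = (1001, 100_000)
rows = {row1, row2}      # equal tuples collapse to a single set element
```

Two objects with identical state remain distinguishable by identity, while two rows with identical column values are simply the same row.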

A row (tuple) is an immutable thing; a ValueObject.

If rows are immutable then why can I change the values in them? Sleight of hand. What you are really doing (if ChrisDate is to be believed) is creating a new relation - which is the same as the old relation, except with a modified copy of the row in question. In practice, this mutation can be done in place as all references to the relation in the RelationalModel are via the RelationalVariable; but it's useful to think of RelationalVariables as the only mutable things in the database. See Costin's better explanation below.

But what of the pair (RelationalVariable (table name), CandidateKey)? That does describe a unique entity within a database that a) can change over time (as the RelationalVariable points to new relations as a result of transactions), b) may have a limited lifespan as tuples are inserted into/deleted from relations, and c) is unique within a database.

In the OO world, consider the difference between a variable pointing to a ValueObject (such as an integer), and the ValueObject itself. The value object itself has no identity; but the variable certainly does. The complication occurs when variables are themselves contained within objects - ReferenceObjects; this is where ObjectIdentity comes into play.
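A rough sketch of that difference, assuming Python semantics (integers are immutable values; the Counter class is a hypothetical ReferenceObject made up for the example):

```python
# A variable holding a value: rebinding the variable does not alter the value.
x = 3
y = x
x = x + 1            # x now names the value 4; the value 3 is untouched


class Counter:
    """A reference object containing a variable (count)."""

    def __init__(self):
        self.count = 0


c1 = Counter()
c2 = c1              # both names refer to the same object
c1.count += 1        # the change is visible through either reference
```

Here y still holds 3 because values have no identity to share, whereas c2 observes the change made through c1 because both refer to one identified object.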

Some feel that a database table row's lack of intrinsic object identity is one of the problems that make up the ObjectRelationalImpedanceMismatch, because they feel that intrinsic object identity is sufficient to leverage in constructing useful applications, without requiring the artifice of adding additional attributes to facilitate ObjectRelationalMapping. Others contend that this lack is inconsequential, because intrinsic object identity is not sufficient to identify a "real-world" object or entity to a software application or its users.

Contributors: MarkAddleman, CostinCozianu, RichardHenderson, RandyStafford

</refactoring>

Relational primary keys usually have a "presentable" form; something that can be printed on paper. This makes it easier to discuss and communicate, especially across platforms or programming languages. How is this a bad thing?


A ValueObject can only exist where there is referential transparency - having a reference (any reference) to that object is the same as possessing that value. Table rows are referenced by (RelationalVariable, CandidateKey). There is no referential transparency as to the remainder of the cells in that row. Therefore rows are not ValueObjects, and attempting to categorize them as such is a bad abstraction... at least not without historical databases where the rows are referenced by (RelationalVariable, CandidateKey, Time).

The STATE of a row IS a value. (Not a ValueObject. A Value.) However, the state of ANY cell is always a value. (A cell is the abstraction of a service to store one value, and is fundamental to assignment.) Attempting to categorize rows as ValueObjects would be akin to saying: "My C++ program is functional. Every time I perform an assignment, I'm logically taking the entire program and replacing it with one that is the exact same as the previous program, except with the mutation in the value I assigned and in the 'next action' pointer."

The argument above states that "If rows are immutable then why can I change the values in them?" is "Sleight of hand." But the whole argument is sleight of mind. The mutation aspects of DataManipulation fall outside the RelationalModel, but do and must exist. What ChrisDate presents is a useful lie: saying that the RelationalVariable is the only thing that can be mutated is a valuable paradigm when it comes to designing table-grain security and concurrent transaction semantics. However, what is -logically- happening in the end is that the cells associated with a row-reference are, in fact, changing.


Database rows are *relationships* between objects, not objects. Therefore, they don't have object identity, because a relationship describes the *public* properties of a collection of interacting objects. In contrast, an object describes a single abstract *private* piece of data in a storage resource, where the resource itself must be identifiable despite the abstractness, thus object identity is needed. Rows do not need identity, since they rely on the identity of every object referred to by the key fields of the table. (Example: a 'person' would be a good class of objects. It would not be a good database table, since you cannot store people in a database(!) {Although I think Hitler tried it}. You could store information about 'friendship' between various people (say, customers) in a database, so "customer_friends" would be a good table name. The 'customer_id' key in the table would be a reference to one person. The table would also have a 'friend_id' key, which would refer to the customer's friend, who is another person. But neither 'customer_id' nor 'friend_id' describes the actual person. Friendship is not something you can describe with objects, because there is no natural object identity that you could associate with friendship. However, an object could easily represent a person. In this example, the closest relational analogy to a person object would be a table 'personal_data', which would describe those aspects of (one) person's data that are available for public use. But there is no way that table could include the information needed to support the person in his own private tasks (say "password for his bank account"), because those are not intended to be available to the general public; the person himself must control their use.) There can be more than one key field in a relational table, thus each row refers to more than one object.
Objects cannot be usefully stored in relational databases, since object data is private [only accessible by the methods of the object], whereas all data in relational databases is public, accessible by the world at large (anybody using sql tools). Relational databases only describe public aspects of the stored data, whereas objects describe the (internal) resource.

I disagree with the characterization that tables are more "global". See DatabaseNotMoreGlobalThanClasses and GateKeeper.

There is no globality at issue here. The distinction is between interfaces and internal implementation, not between global interface and local interface. The private state of an object is not about its interface (object contains conversion from private representation to public interface). The reason why object identity in OO is important is because it creates a distinction between 'private representation' and 'public interface'. The private representation is visible in the interface as just the object identity property, nothing else.


Q: If rows are immutable then why can I change the values in them?

A: You actually do not. In the RelationalModel (and in its approximation, the SQL implementation) the UPDATE operator changes the value of the table (more specifically, of the RelationalVariable) by removing the tuples that hold the old values and inserting the tuples that the specified transformation produces from them. From a logical point of view, a new table is created, and the RelationalVariable is updated to point to it. The old table, depending on the semantics of the RDBMS, may be discarded at some point (see below).

Let's look at a practical example:

  UPDATE employees SET salary= salary*2 WHERE employee_id=1001
And say that the database contained the tuple (1001, 100K$), then the execution is logically defined to mean:

 EMPLOYEES := EMPLOYEES  MINUS {(1001,100K$) } PLUS {(1001,200K$)}
Some databases obviously optimize this operation to acquire a row lock, then update the physical image of the old tuple "in place". This is an optimization/implementation detail not very relevant for the logical model; the relational model only needs to check that such optimizations are available and reasonable for implementors. Other databases (most notably PostgreSQL and Oracle) may create a new version of the physical row used to store the tuple value while possibly allowing concurrent transactions to access the old physical row with the previous version.

All these implementation aspects have no relevance in the relational model.
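The logical reading of the UPDATE above can be sketched as follows. This is a minimal illustration, not how any RDBMS implements it: the relvar is modelled as a Python name bound to a frozenset of (employee_id, salary) tuples, and update_salary is a hypothetical helper.

```python
# The relvar EMPLOYEES, modelled as a name bound to an immutable set of tuples.
employees = frozenset({(1001, 100_000), (1002, 80_000)})
before = employees   # keep a handle on the old relation value


def update_salary(relation, employee_id, factor):
    """Return a new relation value; the input relation is never modified."""
    old = frozenset(t for t in relation if t[0] == employee_id)
    new = frozenset((eid, salary * factor) for eid, salary in old)
    return (relation - old) | new    # EMPLOYEES MINUS old PLUS new


# UPDATE employees SET salary = salary*2 WHERE employee_id = 1001
employees = update_salary(employees, 1001, 2)
```

After the assignment, employees names a relation containing (1001, 200000), while the relation value still reachable through before is unchanged. Only the variable was updated.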

Essentially a FunctionalProgramming model.

Under pure functional, when a new speck of dust lands on the Earth from space, the old earth is deleted and a new one that weighs 0.000000000000001 pounds more is created? Weird, but then again quantum probabilities probably still take the cake in that category. -- AnonymousDonor

I see the attraction of having a functional core to the relational model. I also think it seems silly. There is no difference between mutating a row, and creating a new table with a mutated row, and replacing the old table. It has the exact same implications. So why not accept that databases are about state, and state changes? This is like drawing a distinction between integer addition in C, and the 3 lines of assembly required to perform the addition. From the C abstraction level, there is no distinction. Can anyone identify a benefit to modeling state mutation in this roundabout way? If there was versioning or something, you would save the old state, but there is no reason those details should be in the basic logical model. I'd like to be enlightened :) Thanks. - Steve Wedig

The relational model does include state - relation variables (or, speaking loosely, "tables"), which have a relation value. The rows are part of the relation value; there are no separately distinguishable row variables in the database. You could introduce them, but it would complicate the model, and constrain implementations of it, unnecessarily. Row variables are not part of the relational model for some of the same reasons that the C language specification doesn't talk about register allocation in addition expressions. -- DanMuller

Oh, you have row variables, Dan. A simple tuple of (Relation Variable, Candidate Key) is a reference to a row that may vary over time (including in presence). They aren't independent variables (as in 'independent' of the relation variables), but they certainly exist.

I'm not clear what you mean here. In the definitive description of the RelationalModel, TheThirdManifesto, there are no persistent tuple-valued variables. There may be transient tuple-valued variables. The only persistent variables are relation-valued variables. A database is an identifiable collection of relation-valued variables. -- DaveVoorhis

Perhaps you are confused on what constitutes cell (a service that stores a value) as opposed to what constitutes variable (any time-varying quantity or value). The RelationalModel specifies variables, not the underlying implementation as 'cells', and it is quite possible to use one cell to store everything (e.g. with a record containing all the relations). The relation variables would, then, be referred to roughly as myBigCell.RelationName. Row variables would be (Relation Variable, Candidate Key). The variant properties of a row variable would be: (a) whether the row exists (i.e. whether the key is found), (b) the state of the row if it exists. The variant properties of the relation variable would include its size and the set of relationships it contains. Both of those are persistent variables (or at least as persistent as myBigCell). Neither sort of variable is a cell. And the RelationalModel is not compromised.

I am using the term "variable" as used in TheThirdManifesto -- where the term "cell" does not appear -- and in much of the literature. You seem to be referring to your own system and terminology, hence the confusion. Dan is implicitly referring to the commonly-known descriptions of the RelationalModel and related implementations. Therefore, you are incorrect in saying that he has row variables. It would be correct to say, however, that you have row "variables." -- DaveVoorhis

Variable is variable. The version I use is correct in TheThirdManifesto and most of the literature. The fact that variables very often directly reference cells (thus creating a tight coupling in normal use) doesn't mean that you're correct to confuse the two concepts or insist one is the same as the other. TheThirdManifesto doesn't mention "cell" and, indeed, should not mention cell because the storage service for the Database (including level of persistence, and whether there is one big cell or a million tiny cells - one for every component of every row) is an implementation detail. The notion that TheThirdManifesto is insisting you use one "variable" (whereby you apparently mean cell) for each relation is absurd. Variable is variable. Cell is cell. They are fundamentally different abstractions, related by the fact that the contents of a mutable cell are, in fact, variable. Any time you have an independent variable, you necessarily will have an arbitrary number of dependent variables that exist in reference to that variable and others. They are no less variable for being dependent. However, they aren't the sort of variables you can twiddle. (Of course, when dealing with constraints for consistency, you don't even have full twiddle-powers on variables in the relational model.)

I think we're circling around some sort of ViolentAgreement here. I was specifically -- though perhaps awkwardly -- trying to disambiguate the use of "variable" in the way Dan used it and as it appears in TheThirdManifesto (i.e., an explicitly-declared identifier-named slot or cell) vs "variable" as you used it in your reply to Dan, for which the term "mutable object" (or some equivalent) is often used in order to prevent confusion over the use of the term "variable".

In effect, what was written in your exchange with Dan was something like this: Replace "citrus fruit" with "row variables", and "fruit orchard" with "model".

Given the context, it would appear that what you actually meant was the following:

Replace "oranges" with persistent tuple variables, and replace "lemons" with persistent variant tuples. Dan was referring to "variables" as explicitly-declared, identifier-named slots or cells within the database. You were referring to "variables" as non-identifier-named, non-explicitly-declared variant entities within the database.

However... In the RelationalModel, there are in fact no persistent tuple "variables" using either definition of "variable." Persistent tuples exist only by virtue of being part of the structure of relations stored in persistent relation-valued variables. You can only change a tuple in a relation stored in a relation-valued variable by assigning a new relation to that variable. Implementations of the RelationalModel may optimize the process by internally replacing one tuple with another, or even by changing attributes within a given tuple. However, this is purely internal, and as invisible to the user as the in situ bit-twiddling that may go on to optimize integer arithmetic in, say, C. Conceptually, an integer is an immutable value. Likewise, in the RelationalModel, a relation is an immutable value. No relation has slots, variables, cells, or any other mutable or variant structure - any more than the integer value "3" has slots, variables, cells, or any other mutable or variant structure.

Of course, relational languages usually provide convenient short-hand notations for assigning a new relation to a relation-valued variable -- such as the INSERT, UPDATE, and DELETE commands in SqlLanguage and TutorialDee -- that give the appearance of mutating relations at a tuple or attribute-of-a-tuple level. As I've mentioned, they may even be implemented that way for optimization reasons. However, the remainder of the system must behave as if the cell/slot/variable containing an immutable relation was assigned a new immutable relation.

-- DaveVoorhis

The type for a row-variable isn't a persistent tuple, true. It's a persistent 'Maybe Tuple'. (I.e. Just Tuple | Nothing). After all, if the row isn't in the relation, then that case must be handled. Logically, setting that row-variable to 'Nothing' would remove it from the relation, while setting it to 'Just Tuple' would set it to said tuple (with the constraint that you can't change the candidate-key components, and must maintain any other database consistency requirements).
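A sketch of that 'Maybe Tuple' reading, assuming a relation modelled as a frozenset of (employee_id, salary) tuples with employee_id as the candidate key. The names lookup and assign are hypothetical, and Python's Optional stands in for Just Tuple | Nothing.

```python
from typing import FrozenSet, Optional, Tuple

Row = Tuple[int, int]            # (employee_id, salary); employee_id is the key
Relation = FrozenSet[Row]


def lookup(relation: Relation, key: int) -> Optional[Row]:
    """Dereference (relation, key): the row if present, else None ('Nothing')."""
    for row in relation:
        if row[0] == key:
            return row
    return None


def assign(relation: Relation, key: int, value: Optional[Row]) -> Relation:
    """'Assign' to the row reference by producing a new relation value."""
    if value is not None and value[0] != key:
        raise ValueError("candidate key components may not change")
    without = frozenset(r for r in relation if r[0] != key)
    return without if value is None else without | {value}


employees: Relation = frozenset({(1001, 100_000)})
present = lookup(employees, 1001)
employees = assign(employees, 1001, (1001, 200_000))   # set to 'Just Tuple'
updated = lookup(employees, 1001)
employees = assign(employees, 1001, None)              # set to 'Nothing'
absent = lookup(employees, 1001)
```

Note that even in this sketch the only thing actually reassigned is the name employees; whether the (relation, key) pair deserves to be called a variable is exactly what is being debated here.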

Ultimately, what looks like a 'cell' depends on how you manipulate the variables. Consider that it's possible to implement a virtualization of addressable shared memory (e.g. RAM) atop a Relation (very easily, as I'm sure you know). From that perspective, every row contains a mutable cell and can be referenced by (Relation Variable, Address). The operations to change that mutable cell just happen to utilize a DML. If that address isn't in the relation (e.g. hasn't been added, or has since been removed) then any attempted access would return that the appropriate tuple doesn't exist and perform some reasonable action like generating a memory-access exception. It doesn't really matter whether the entire relation is, itself, stored in 'one named slot' or otherwise. That doesn't change the logical access pattern, or the logical results of such access. Logically, the row-items are both logically mutable cells and variables.
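The virtualization described above might look roughly like this. It is a toy sketch under stated assumptions: the relation is a frozenset of (address, value) tuples held in one attribute, and RelationalMemory and MemoryAccessError are hypothetical names, not part of any real library.

```python
class MemoryAccessError(Exception):
    """Raised when an address has no tuple in the relation (hypothetical name)."""


class RelationalMemory:
    """Addressable memory virtualized atop one relation of (address, value) tuples."""

    def __init__(self):
        # The single relation-valued variable; its value is an immutable frozenset.
        self.cells = frozenset()

    def read(self, address):
        for addr, value in self.cells:
            if addr == address:
                return value
        raise MemoryAccessError(address)

    def write(self, address, value):
        # "DML": assign a new relation with the tuple for this address replaced.
        kept = frozenset(t for t in self.cells if t[0] != address)
        self.cells = kept | {(address, value)}

    def free(self, address):
        # Remove the tuple; later reads of this address will fail.
        self.cells = frozenset(t for t in self.cells if t[0] != address)


ram = RelationalMemory()
ram.write(0x10, 42)
ram.write(0x10, 43)          # to the caller this looks like mutating a cell
value_at_0x10 = ram.read(0x10)
ram.free(0x10)
```

From the caller's side each address behaves like a mutable cell, even though every write is implemented as assigning a whole new relation to one slot.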

Every referenceable property of a cell constitutes a 'variable', including the value-state of the cell. Every independently mutable property of a cell can, itself, be used as a cell (because 'cell' is just an abstraction of a service to store and recover a value). Even if you insist that DanMuller's definition of variable is the correct one - a named slot that stores a value (the most common unity of 'variable' and 'cell'), you still have named slots that store values at the tuple-level of a consistent relation.

If all that was said is that the RelationalModel doesn't have tuple-variables independent of the relations, I'd wholly agree. You can't add rows to a relation by creating a variable 'r' and assigning to it a tuple-value... well, you can't do that and still have the RelationalModel.

Argh. Can I suggest, most politely and humbly, that you read TheThirdManifesto? -- DaveVoorhis

As pointed out, one can talk about the parts of a relation being "dependent variables". Adding the qualifier "independent" to my original statement is a valid suggestion. My point might better have been that, although they may be implied by the model, they are not an explicit part of it, and really add nothing to an understanding of it. Their implied existence is acknowledged by the common forms of the data modification operators. The danger in focusing attention on them lies only in that many programmers have a fuzzy concept of "value" and "variable", engendered by common (mis)usage in daily work with currently popular languages - leading to the sort of misconceptions that this page seems intended to address. It's rather a pity to waste this much verbiage on the concept of tuples-as-variables. Would anyone object if I reworded my statement slightly and removed this long digression? -- DanMuller

No objection from me. -- DV

I also do not object to refactoring the discussion. However, I disagree with the premise of this page; as such, I don't consider this so much a 'digression' as a useful conflict.

Viewing relations, tuples, even whole database-states as 'values' is... doable. One may, logically, view the whole database-state as being replaced every time one relation is replaced (whereby 'relation' is identified by some unique name in the database). One may logically view the whole relation-state as being replaced every time one 'tuple' is replaced/removed/added (whereby 'tuple' is identified by some candidate key... possibly the whole tuple). One may logically view a single value of a non-key cell of a tuple to be replaced every time one bit is twiddled in its representation. Etc. In the reverse direction, one can consider the entire world to be replaced with another the moment the database is updated. Etc.

These views, however, are not relevant. They are misleading. They do not change what is fundamentally happening, or how 'things' may be fundamentally referenced (giving them object-identity). They do not provide any value. If you can uniquely reference something, you have object identity. If that object is mutable (i.e. has properties that can change), you have a variable. Period. That's true for physical reality, and it's also true for variable relations and tuples within databases. Further, references to things, in any language, may be done by the properties they possess (e.g. "my computer"), including the possibility of uniquely attached names (e.g. SSN, cell phone number, etc.) The same is true for both physical reality and for relations and tuples in databases. Often, one object can be identified by many different things... as is the case with a tuple that may be identified by many candidate keys.

Don't get carried away with promoting TheThirdManifesto like some sort of gospel. I don't have the book, but I did read the paper a while back. The authors promote a more 'pure' Relational approach and a paradigm that is, like most paradigms, a UsefulLie. Paradigms are views, not reality. The paradigm that a cell associated with the RelationVariable is the only thing that is changing is not more valid than the paradigm that the whole universe is being updated, or the paradigm that individual tuples referenced by candidate keys are being altered. No paradigms change the fact that a tuple that can be referenced by its candidate key is now present/gone/updated/etc. Relational database table rows DO have object identity. And, like all objects, they have properties, such as being present or absent... and, if present, then potentially having other related values. Date's paradigm won't change that.

"These views, ... do not change what is fundamentally happening, ..." Say rather, "this way of describing a model does not change how the model may be implemented". There is, after all, nothing "happening", fundamentally or otherwise, in a model. :) (A similar comment applies to "Paradigms are views, not reality". We're talking about a model, which is arguably different from a paradigm and is absolutely distinct from reality or implementation.) A functional description is not a gratuitous lie, but a way of describing the model that simplifies many types of reasoning, omits irrelevancies, and can, with only a simple conceptual addition, be used as a basis to describe stateful behavior when needed. (I agree that it's a UsefulLie prior to adding stateful behavior. It's not nearly so useful afterwards. As far as 'happening', I was referring specifically to the state update of the database in response to a DataManipulation request.)

Individual tuples do not have object identity in the usual sense of that phrase. "Object identity" in most object models is a concept independent of the type of the object, and closely tied to the ability to reference an object independently of any other knowledge of the object. To be more specific, (one sort of) mutable object is a cell, its identity is its canonical name and can itself usually be treated as a value. A tuple's "identity" might be said to be synonymous with a particular value of a (not "the"!) candidate key. But a key value hardly seems synonymous with (or even a representation of) an object identity. A key value can have a very different domain in different tuples (being composed of any number of values of any domain/type), and two tuples in entirely different relations could have identical key values, while clearly not being "the same object". So although you might say that a tuple's key values resemble an object reference in some ways, they are not the same thing.

I suppose you could try to define a "tuple object reference" consisting of both a candidate key value and the name of a relation variable that contains the tuple. But that opens another rat's nest. One could easily come up with object references that look perfectly valid, but that do not reference a tuple in actual existence - which seems to indicate that this is really the name of a dependent cell that, if it exists, might contain a ... what? Tuple value? Or a tuple object reference, which we still haven't defined? Or you'd have to expand the usual usage of "object reference" to include references to non-existent objects. Attempts to shoehorn relational theory into an object-identity point of view seem ill-advised to me, when doing so easily leads to such complexities and misunderstandings. A functional approach to the relational model is simple to understand and, so far as I've seen to date, sufficient. OTOH, it is useful to talk about the similarities between candidate keys and object references when discussing systems that use objects to represent relational data, or that use relational data to persist objects. Just don't forget that there are differences. -- DanMuller

"Don't get carried away with promoting TheThirdManifesto like some sort of gospel."

I'm not, and there are certainly portions that I question or disagree with. However, as it is the de facto reference for the RelationalModel since DrCodd's original "A Relational Model for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html), it is worth reading as a foundation for these discussions even if you disagree with some or all of its precepts. We're covering some well-trodden ground here, so at the very least using it as a common reference might save us some effort in typing up explanations. -- DaveVoorhis

The RelationalModel was pretty darn easy to understand... named sets of records with a DML is pretty simple. I've studied it in at least four different texts, including DrCodd's original, FoundationsOfDatabases (very nice), and FundamentalsOfDatabaseSystems (mediocre), plus a refresher from the Wikipedia page and this C2. They're all quite consistent. I've also put a great deal of effort and mathematical proof into understanding such things as data itself (WhatIsData), data manipulation (DataManipulation), perception, behavior, actors, communication, cells, emergent behaviors, models vs objects (ObjectVsModel), knowledge (KnowLedge), understanding, etc. Forgive me if I'm unwilling to buy another book to learn a very simple concept that I'm quite confident I already understand. (continued)

Indeed, on the surface and for most purposes, the RelationalModel is very simple, which is one of its strengths. However, the devil is in the details -- this debate is about one of the details -- and TheThirdManifesto in particular deals with these at some length. AnIntroductionToDatabaseSystems does as well, though with somewhat less emphasis on the relationship between types, values, and the RelationalModel; but more on the fundamentals which are assumed to be understood in TheThirdManifesto. I agree that FundamentalsOfDatabaseSystems is mediocre (though I vaguely recall it has a nice section on query optimization, but maybe I'm thinking of something else), and I haven't read FoundationsOfDatabases. But that isn't the point. The point is that some of what we're discussing here is either dealt with explicitly, or well implied, in TheThirdManifesto. I'm not sure it's covered elsewhere in as much detail, or with as much justification. For that reason, I recommend it. -- DV

The issue isn't that I don't understand the RelationalModel; it's that the particular paradigm promoted for understanding a state update to a relational variable is not useful to an understanding of either the event [??] or how it relates to the world in which the database resides. It's a UselessLie. If you believe it to be a UsefulLie, then, please, inform me of its utility. Tell me what you gain from it as opposed to viewing the entire Database as being replaced to update one relation, or viewing the entire world as being replaced to update one database. If you cannot, then why do you propose there is utility in that paradigm over that of manipulating tuple-objects in data space? Tell me from the perspective of an observer of the world. Tell me from the perspective of the database manager - the actor sending the low-level signals to physically update the database. Tell me from the perspective of the world and physical environment. Where does one gain value from this paradigm? I think nowhere. I currently believe that this paradigm ought to be rejected, not embraced. It doesn't help people understand, and shouldn't be part of explanations except in explaining the paradigm itself.

Good question. This is one that I feel is answered well by Chapter 1 of TheThirdManifesto, plus a bit of reading between the lines, but I'll attempt an answer. First:

In other words, from a purely pragmatic point of view we can accept the perception that tuples are mutable, and even implement it that way, as long as the implementation doesn't violate the underlying theoretical model.

Since the RelationalModel is a theoretical model, we can insist on greater rigour than would normally be considered on a pragmatic basis, and thereby predictably and formally prove the behaviour of the model under all reasonable circumstances. That theoretical rigour has an important impact on our pragmatic users, data managers, and casual readers whether they know it or not - it means they can trust correct implementations of the model to be consistent and behave predictably even under novel conditions. However, in order to make provability manageable, we have to constrain the model. Hopefully, the theoretical constraints will not unduly limit pragmatic implementations of that model.

A constraint that the RelationalModel imposes is that attribute values are immutable. We could conceivably extend the model to handle mutable objects, but this would unduly complicate the model. Even though the RelationalModel is orthogonal to persistence, it is typically used in contexts where persistence is significant, i.e., to implement database systems where the values are "frozen", so to speak, when stored in the database. Therefore, there is no reason - at least in the usual interpretations of the model - to consider mutable objects in a database context.

However, the RelationalModel specifically does not constrain what types of values are stored in the database. Any value can be an attribute of a tuple. (There may be other constraints on the value, e.g., that it can be tested for equality against another value of the same type, or that the values belonging to a type must be ordinal, but these are not germane to this discussion.)

Because a goal of the RelationalModel is that it not unreasonably constrain the types of value that an attribute may contain, it makes sense that a tuple or even a relation can be an attribute value. This allows us to have nested tuples and relations. Although it is rare to require these, there are circumstances where they are appropriate, and the alternatives would be awkward or inelegant.

Since we've already established that attribute values must be immutable, it implies that tuples and relations must be immutable. Otherwise, tuples and relations could not be attribute values. If that were true:

Therefore, tuples and relations in the RelationalModel are immutable. The only mutable persistent object in a theoretical true relational database is a relvar, aka a relation-valued variable. The only thing we can do to it is assign it a new value, i.e., replace its current value -- an immutable relation -- with another immutable relation. However, as we've seen, this does not constrain pragmatic implementations or casual interpretations of the model, but it does sufficiently constrain the theoretical model itself so that it is not unduly complex or inelegant, and therefore its behaviour is easily provable. That means pragmatic implementations which correctly implement the model will be provably consistent, and demonstrate behaviour that can be predicted by the model. Therefore, our users and data managers can reasonably trust the system.

-- DaveVoorhis
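
A minimal Python sketch of the distinction drawn above, offered as an analogy rather than as TutorialDee or any real DBMS. The names row, Relvar and assign are illustrative assumptions; frozenset stands in for an immutable relation value, and Python's hashability rule is only a rough stand-in for "attribute values must be immutable".

```python
def row(**attrs):
    # A tuple value: an immutable set of (attribute, value) pairs.
    return frozenset(attrs.items())

class Relvar:
    """Hypothetical relation-valued variable: the only mutable thing here."""
    def __init__(self, value):
        self.value = value  # a frozenset of rows, i.e. a relation value

    def assign(self, new_value):
        self.value = new_value

old = frozenset({row(CustName="Ann", Phone="555-0100")})
customers = Relvar(old)

# An "update" builds a new relation value and assigns it to the relvar.
customers.assign(frozenset({row(CustName="Ann", Phone="555-0199")}))
relation_unchanged = old == frozenset({row(CustName="Ann", Phone="555-0100")})

# Immutable relations nest as attribute values; a mutable set does not.
nested = row(Name="MyDatabase", Customers=old)
try:
    row(Name="bad", Customers={1, 2})
    mutable_nested_ok = True
except TypeError:
    mutable_nested_ok = False
```

In this sketch the old relation value is untouched by the assignment; only the variable's binding changes.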

The state of the whole database at a given instant is also a value. You can store a whole database-state in a single attribute value. I understand this well enough. So why did you not extend your argument just a little bit further? Why did you not say: "Well, I could store the whole state of the database as a single attribute value, so 'relation variables' aren't really mutable, either... the whole database state is an immutable value. Instead, we should use only one variable that carries the whole database-value." I think you had a bit of bias to stop where you did.

It wasn't a matter of bias to stop where I did. I was simply explaining the RelationalModel as it is, rather than how it could be. My impression is that the keepers of the RelationalModel decided what is mutable or not based on the argument I've given above, rather than (say) some notion of functional purity. That said, you're not the first to note that a database like the following (using TutorialDee syntax)...

 VAR Customers REAL RELATION {CustName CHAR, Address CHAR, Phone CHAR} KEY {CustName};
 VAR Orders REAL RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE} KEY {OrderNumber};
...could be represented as follows:

 VAR MyDatabase REAL RELATION {Customers RELATION {CustName CHAR, Address CHAR, Phone CHAR},
                               Orders RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE}};
Of course, why stop there? Databases (as defined above) can be attributes, too:

 VAR MyUniverse REAL RELATION {
      MyDatabase RELATION {
           Customers RELATION {CustName CHAR, Address CHAR, Phone CHAR},
           Orders RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE}},
      YourDatabase RELATION {
           Fish RELATION {Species CHAR, FavouriteRecipe CHAR, BestBait CHAR},
           Bait RELATION {Name CHAR, Refrigerate BOOLEAN, UseHook BOOLEAN}}};
And so on. However, no matter how much we continue this hierarchy, the RelationalModel defines that the "root", as it were, is going to be relation-valued variables -- even if there is only one and its value is only set once. For all intents and purposes, since the above structure is fully encompassed by the RelationalModel, we might as well consider that collection of (one!) relvars to be "the database" and so we're effectively back where we started. -- DaveVoorhis

Err... that wouldn't be quite correct. As you've defined it, the 'MyDatabase' will be a relation that is a set of tuples containing two relations {(Customers:<Relation>, Orders:<Relation>),...}. However, only ONE such tuple is actually a database. If you have more than one, you won't be able to identify the actual 'MyDatabase'. What you actually needed at the top level was not a relation variable, but rather a cell carrying a record-value (a single 'tuple' in the relational terminology). Further, 'MyUniverse' is intended to be a universe of Databases. Instead, what you gave was a relation containing a large set of (MyDatabase, YourDatabase) pairs, as though every one of MyDatabases must, inherently, be paired with one of YourDatabases. Better would be to support a universe with: MyUniverse is Relation of {Id: Database-Identifier (KEY), meta: Database-Metadata, value: (Record of name->Relation)}. And, ultimately, 'MyUniverse' uses a relation only because what it actually needs is a set, and relations are sets. Are you still confident that the top-level here ought to be a relation-valued variable? Even for MyDatabase?

No, it's correct. MyDatabase is presumed to be a relation that contains a single tuple. We could supply a constraint to enforce it, if we wished. TutorialDee provides a TUPLE FROM operator to obtain the tuple value from a relation of cardinality one, because these occur frequently as a rough analogue to the notion of a singleton. The FROM operator is used to obtain an attribute of a tuple. Thus, we can obtain the Customers relation via the following expression:

 Customers FROM (TUPLE FROM MyDatabase)

Or...

 Customers FROM (TUPLE FROM (MyDatabase FROM (TUPLE FROM MyUniverse)))

Obviously, this is awkward syntax. Should such a schema be desirable, the associated RelationalLanguage would presumably provide a clean short-hand for the above -- something like:

 MyUniverse.MyDatabase.Customers

Of course, some other schema design might warrant multiple tuples for MyUniverse or MyDatabase -- say, to implement multiple versions -- for which we would have to provide an appropriate attribute for selection purposes.

-- DaveVoorhis
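
For readers without a TutorialDee implementation to hand, here is a rough Python analogue of the nested-relation expressions above. The helper names tuple_from and attr_from are hypothetical stand-ins for TUPLE FROM and FROM, the sample data is made up, and frozenset is assumed as the representation of a relation value.

```python
def row(**attrs):
    # A tuple value: an immutable set of (attribute, value) pairs.
    return frozenset(attrs.items())

def tuple_from(relation):
    # Rough analogue of TUPLE FROM: defined only for cardinality one.
    if len(relation) != 1:
        raise ValueError("TUPLE FROM requires exactly one tuple")
    (only,) = relation
    return only

def attr_from(name, tup):
    # Rough analogue of "<attribute> FROM <tuple>".
    return dict(tup)[name]

customers = frozenset({row(CustName="Ann", Address="1 Main St", Phone="555-0100")})
orders = frozenset({row(OrderNumber=1, CustName="Ann", Date="2024-01-01")})
my_database = frozenset({row(Customers=customers, Orders=orders)})
my_universe = frozenset({row(MyDatabase=my_database)})

# Customers FROM (TUPLE FROM (MyDatabase FROM (TUPLE FROM MyUniverse)))
result = attr_from(
    "Customers",
    tuple_from(attr_from("MyDatabase", tuple_from(my_universe))),
)
```

If MyDatabase held more than one tuple, tuple_from would raise here, which is where a selection attribute (a version number, say) would be needed.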

Time would be a good component for historical databases. However, the MyUniverse really should allow for the introduction and elimination of databases over time.

Absolutely. That's the subject of temporal databases, which are another topic. HughDarwen is arguably the expert on these. -- DaveVoorhis

What is the difference between a relation constrained to have exactly one tuple and a cell that has a value that is a tuple? Really? And if there isn't one, then are you insisting that a 'relvar' must be at the top level only because you can... eh... make it fit?

I am insisting that a relvar must be at the top level because it is required by Relational Model Prescription 16, that a database shall be a named container for relvars, which is a direct result of DrCodd's "Information Principle", that all information [in a database] is represented by data values in relations. Feel free to violate this, but the result would, by definition, not be the RelationalModel. -- DaveVoorhis

The fact is that, despite the argument you provided that this is for rigor and constraint on the model, you gained no rigor, and you gained no constraints. You haven't made it any easier to prove any qualities about the behavior of actors utilizing the relational database - those mutating it, those reading it, etc. You haven't made it any easier to prove qualities of the database under such manipulations - e.g. consistency, accuracy. The one place you can make use of the argument is in treating relation-variables as the basic conceptual 'cells' manipulated for purposes of concurrency control involving multiple writers... but concurrency control is not part of the RelationalModel. For that, you need some model of concurrency and actors, and an approach or theory for concurrency-control... and once you have a full theory for concurrency and transactions, you won't actually need to distinguish anything at the arbitrarily chosen relational-variable level.

The behaviour of actors utilising the relational database is outside the domain of the RelationalModel, and therefore irrelevant to the RelationalModel. I can, in theory, prove the RelationalModel to be reasonably self-consistent, which is as it should be. Feel free to define a meta-model that incorporates both the RelationalDatabase and its clients. As for concurrency control and the like, that is usually defined at the level of a storage engine, below (and effectively outside) the RelationalModel. There are models that endeavour to model the system from the highest abstract level down to the bits on disk platters -- ExtendedSetTheory, for example -- but I'm not aware of any of these that define concurrency, either. That generally seems to be considered implementation-specific. -- DaveVoorhis

Can you give me a statement of what you've actually gained, for real? I.e. a postulate or behavior you can deductively prove under your position that cannot be proven without it? I doubt it (because I know that, from an actor's viewpoint, this is just a paradigm, and paradigms cannot affect proofs), but you (and by extension Date) certainly deserve the opportunity. If there is no such postulate or behavior, then you've literally gained nothing - no rigor, no constraint, no utility.

I'm not clear what your point is here. Is this a specific criticism of the RelationalModel itself, or is it a criticism of the RelationalModel for not being an all-encompassing model of computation in general, or is it a prelude to a ParadigmPissingMatch? -- DaveVoorhis

I like the RelationalModel itself; it's not sufficient for all my own purposes (e.g. knowledge databases) but that isn't relevant to my position here.

What I've presented is more a criticism of the RelationalModel as it is being presented with regards to immutability and object-identity. The arguments you've provided are invalid and useless, and shouldn't have any support. You're presenting them by proxy from Date and Darwen, but if you've accurately presented your understanding of their arguments, then you shouldn't be supporting them. The entire 'it's not mutable because we said it is not' is very much an 'emperor has no clothes' situation; smart people are ignoring the very obvious - that non-key attributes of referenceable tuples are perfectly capable of operating as mutable cells from the perspective of ALL actors, including the DBMS. They are, instead, saying these tuples are 'logically immutable'... which isn't true... because Date and Darwen said so. Actually, the argument is more akin to: "Relation values ARE immutable, so Relations are immutable, so Tuples must be immutable." That's a correct argument. The problem is that "Cells ('variables' in common vernacular) are NOT immutable, so relations as referenced through cells by actors are not immutable, so tuples as referenced through those relations as referenced through those cells by actors are not immutable." This is also a correct argument. The emperor has no clothes. The argument that attempts to conceptually separate tuples from the relation-variable has no validity. Please, can we have some people speaking up about it rather than accepting Date and Darwen at their word and presenting/teaching their arguments as though they are counters to some rather basic truths?


Attempting to be a bit clearer: if something can be used as a mutable cell in every way, and 'cell' is an abstraction of a service to store and retrieve a value, then that something is (by definition) a cell. A cell is mutable. The value in a cell is not mutable (because values are not mutable), but properties of the cell (including the property of 'the value in the cell') are mutable. If you have a relation value, it is not mutable. If you have a cell containing a relation, it IS mutable. Further, other properties of a cell containing a relation are independently mutable... e.g. by placing a new relation in the cell with an extra tuple, one has changed the properties associated with the cell. Because you can reference (i.e. 'name') properties of a cell with a relation-variable by use of a candidate-key, those also constitute variables. Because you can manipulate the relation-variable cell in order to set those properties independently of each other, you also have (by definition) cells within the relation-variable. Thus, deductively, you have mutable tuples. These mutable tuples have properties of presence (Just <value> | Nothing) and properties associated with the '<value>' when present. These truths are independent of the RelationalModel; they are true because you have a mutable cell carrying a set/relation. These truths are also independent of implementation; if the act of setting a cell associated with a tuple requires sending a DataManipulationLanguage message to the whole 'MyUniverse', then so be it - that's an implementation issue. These truths are also independent of the nature of the 'cell' associated with the relation variable, which itself may be just an independently mutable property of a single, larger 'cell' associated with the database, or even with a 'universe'.
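
The cell argument can be made concrete with a small Python sketch. It is not taken from any DBMS, and the names Relvar and TupleCell are hypothetical. It assumes a relation value is a frozenset and that the only primitive mutation is rebinding the relvar, then layers a (relvar, candidate-key value) pair on top as a get/set cell whose absence is reported as None (the 'Nothing' case).

```python
def row(**attrs):
    # A tuple value: an immutable set of (attribute, value) pairs.
    return frozenset(attrs.items())

class Relvar:
    def __init__(self, value):
        self.value = value  # always an immutable relation value

class TupleCell:
    """Hypothetical view: (relvar, candidate-key value) used as a cell."""
    def __init__(self, relvar, key_attr, key_value):
        self.relvar, self.key_attr, self.key_value = relvar, key_attr, key_value

    def get(self):
        # Returns the matching row as a dict, or None ("Nothing") if absent.
        for r in self.relvar.value:
            if dict(r)[self.key_attr] == self.key_value:
                return dict(r)
        return None

    def set(self, **attrs):
        # Done purely by relvar assignment; no relation value is mutated.
        others = frozenset(r for r in self.relvar.value
                           if dict(r)[self.key_attr] != self.key_value)
        new_row = row(**{**attrs, self.key_attr: self.key_value})
        self.relvar.value = others | {new_row}

customers = Relvar(frozenset({row(CustName="Ann", Phone="555-0100"),
                              row(CustName="Bob", Phone="555-0101")}))
snapshot = customers.value
ann = TupleCell(customers, "CustName", "Ann")
bob = TupleCell(customers, "CustName", "Bob")
ann.set(Phone="555-0199")
```

Both readings of the thread are visible in the sketch: ann behaves as an independently settable cell, while snapshot, the earlier relation value, is never altered.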

Date and Darwen and others cannot escape these truths by shutting their eyes and saying differently - that relvars are the only (<-- emphasis!) mutable variables. Further, they gain no rigor, no constraint, and no utility from even attempting to do so. Pointing out that relations are values isn't valid as an argument for their position - it's true, yes; it just isn't relevant. But, even if they've tricked you into believing them, utility and gain from their position must still be measured in real terms... e.g. a single theorem that can be deductively proven from their position that cannot be proven without it. You certainly have not presented such a theorem... only alluded that one must exist. Until you are presented such a theorem, you should know that an extra helping of skepticism is deserved.

As I see it, attempting to apply this restriction to a RelationalDatabase is a logical impossibility without making the entire database immutable (and thus providing true referential transparency). Saying that this is part of the RelationalModel would make the RelationalModel internally inconsistent. If introducing such inconsistency is what the keepers of the RelationalModel have been doing with their spare time, we need to fire them and hire some new ones. If it's just a common misunderstanding of what the keepers have actually been saying, then they need to work on clarification.

The RelationalModel is a model, not an implementation. As such, the notion of immutability has no impact on the users of implementations of the model, or even on broader models that might incorporate the RelationalModel -- these may harmlessly regard the model's immutables as mutable. The restriction, therefore, is only on the RelationalModel, not on a RelationalDatabase, as long as its behaviour is consistent with the predictions of the model. As I've stated above, within the RelationalModel, this UsefulLie affords simplicity and clarity. This is not even an issue of hypothetical theorems, merely one of simplicity and convenience, but these certainly facilitate provability. If you wish to define a RelationalModel that incorporates mutable objects, feel free to do so. I'd certainly be interested to see it, but I can almost guarantee that it will be more complex than the existing model without any increase in utility. -- DaveVoorhis

You say: "As I've stated above, within the RelationalModel, this UsefulLie affords simplicity and clarity. This is not even an issue of hypothetical theorems, merely one of simplicity and convenience." The problem is that it does not afford simplicity or clarity. It is NOT useful. If you insist it is, then prove it rather than just say it! And proving that it is useful damn well is an issue of theorems... real proofs, not hypothetical proofs - models such as the RelationalModel are designed to allow more rigorous reasoning, so restrictions on mutability and such must provide simplicity and clarity to this rigorous reasoning if they are to be of any utility at all. But the restriction as you've stated it is not of any utility. It's a UselessLie. Worse, if it were an official part of the model, then the model would be logically inconsistent because you'd have immutable mutable things. The fact remains that the RelationalModel automatically incorporates mutable tuple objects the moment it has mutable RelationalVariables. You cannot escape this; it's a fact inherent to the variables. And because this is already there, there is no increase in complexity - it's there to live with whether you wish it or not. Or, more accurately, there is exactly one means to escape it: make the relvars immutable. That will do the job. If relvars are referentially mutable then so are table rows (tuples) associated with those relvars. Period. I've presented deductive proofs for this more than once. Saying it ain't so won't change it. If you have an issue with the proof, then tackle the proof rather than insisting I must not be referring to the real RelationalModel. Either that, or insist that the RelationalModel must not have mutable relvars (and, thus, that the model is not relevant to any discussion of mutability in a RelationalDatabase). Is logical consistency too much to ask?

If you're defining a variable as a tuple object, that's fine. It should be clear, however, that the immutable tuples are those within relations, not those "special" tuples that appear -- through dissection of the terminology -- external to relations as variables.

However, your challenge to tackle this issue at the level of proofs is an interesting one. I shall work on this. Please stay tuned...

-- DaveVoorhis

I'm not defining a variable as a tuple object. Objects are things that can be uniquely referenced, and that have properties. Mutable objects can change from one observation to another. I'm not defining a variable as a tuple object, but the existence of a variable containing a set of tuples referenceable with a candidate key necessitates the existence of tuple objects in that data space. The existence of tuple objects follows directly from the existence of a relation variable. There is no circular logic involved, and nothing "special" going on.

I'm interested to see what you come up with on the proofs. However, that challenge goes out to anyone else who is interested, too: ChrisDate, DanMuller, etc.

How do you address the fact that a relation can have multiple candidate keys, thus each tuple can have multiple 'identities', each of which can change independently of the other? Isn't this stretching the analogy between tuple identity and typical notions of object identity quite a bit? -- DanMuller (continued)

More importantly, what do these concepts add to an understanding of the relational model? I think the length and complexity of the discussion on this page is its own argument against such an addition. -- DanMuller

Regarding object identity: We are concerned here with object identity in the context of a data model, not a more general philosophical discussion of object identity in the physical world. A relational database attempts to model some portion of a domain, portions of which might be physical, much of which is often conceptual (as are, for instance, many accounting concepts). For the physical portions of a specific modelled domain, the digression into philosophical notions of object-ness might be relevant, but applications of the relational model in general are not limited to such things.

Regarding the question of mutability of tuples: I'm afraid I don't see why the question should be reversed. In fact, to me your reasoning seems in all respects backwards; by Occam's razor, if you will, it seems that keeping the model simple is preferred unless adding complexity addresses specific problems or limitations of the model. (Remember that we are talking about an artificially constructed model, the purpose of which is to simplify the organizing and manipulation of data. It's not a general theory on the essential nature of data in general.) I guess you see a proscription against considering tuples as objects, where I see an omission of the concept of tuple identity. You see more complexity, where I (and DaveVoorhis) see less. So far I've seen only vague assertions about the value of the additional complexity discussed here.

The relational model addresses concurrency only indirectly, by specifying that a database state, at all times that it can be observed, is consistent with all of its constraints. Transitions between one consistent state and another must thus appear to be atomic to users of the database. At this level, that seems to be all that needs to be said regarding mutability. This simplification indirectly affects many aspects of the model.

The implementor of a relational database must obviously delve deeper than this, and will probably deal with some notion of row identity - a notion which will typically differ from object identity as discussed above, I might add. The author of an OO data layer implemented on top of a relational database will likely also struggle with issues of object identity for objects that represent domain entities - thus only indirectly with row or tuple identity. Personally, I think the objects in such OO layers are most usefully thought of as caches of database data, because there's already a large body of techniques to help reason about and implement data caches. Correspondence between these caches and tuples in the database can be a difficult problem that does revolve around notions of identity.

I think there is room for more discussion on concurrency issues at some level just above that of the relational model. I am, for instance, uncomfortable with Date & Darwen's treatment of atomic, multi-part database updates in Tutorial D, which attempts to avoid the notion of transactions by defining a multiple-assignment operator. I can't help but think that this would be very awkward in practice, and doesn't seem significantly different from an explicit transaction insofar as user-defined functions would be involved in such expressions, thus fairly arbitrary code can still run within a partial-update context. But a) these are considerations of programming language syntax and semantics, at least one step removed from the underlying model, and b) I don't yet see how discussions of tuple or row identity help to address these problems, which are essentially the same whether you're talking about changing relvars or "tuple vars".

-- DanM
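
A hedged Python sketch of the idea behind multiple assignment as described above: all targets are replaced in one step and constraints are checked only against the resulting state, so no intermediate state is observable. The function multiple_assign, the constraint orders_reference_customers and the sample data are illustrative assumptions, not Tutorial D semantics in detail.

```python
def row(**attrs):
    # A tuple value: an immutable set of (attribute, value) pairs.
    return frozenset(attrs.items())

def column(relation, name):
    return {dict(r)[name] for r in relation}

def orders_reference_customers(db):
    # Hypothetical constraint: every order names an existing customer.
    return column(db["Orders"], "CustName") <= column(db["Customers"], "CustName")

def multiple_assign(db, new_values, constraints):
    # Build the candidate state, check every constraint against it,
    # and only then return it; the caller never sees a partial update.
    candidate = {**db, **new_values}
    for check in constraints:
        if not check(candidate):
            raise ValueError(f"constraint violated: {check.__name__}")
    return candidate

db = {"Customers": frozenset(), "Orders": frozenset()}
rules = [orders_reference_customers]
ann = frozenset({row(CustName="Ann")})
first_order = frozenset({row(OrderNumber=1, CustName="Ann")})

# Assigning Orders alone would violate the constraint...
try:
    multiple_assign(db, {"Orders": first_order}, rules)
    order_alone_ok = True
except ValueError:
    order_alone_ok = False

# ...but assigning both relvars in one step is accepted.
db2 = multiple_assign(db, {"Customers": ann, "Orders": first_order}, rules)
```

Whether this is meaningfully different from an explicit transaction is exactly the question raised above; the sketch only shows the single-step shape of the operation.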

We are concerned here with object identity in the context of a data model, not a more general philosophical discussion of object identity in the physical world. -- We are concerned with whether relational database table rows have object identity... which does require some study as to what constitutes "object identity", especially in the context of mathematical objects within a mathematical space (e.g. tuples in a tuple-space, logical rows in a logical table, etc.) Do not concern yourself with whether a particular database is modelling the physical world. Do not concern yourself with what data the relational database is carrying or modelling. That's a domain issue; as such, it is completely irrelevant to the more basic question of whether the rows or tuples, themselves, have object identity.

to me your reasoning seems in all respects backwards; by Occam's razor, if you will, it seems that keeping the model simple is preferred unless adding complexity addresses specific problems or limitations of the model -- Occam's razor traditionally applies to hypotheses in science and logic; if the "Spaghetti Monster" hypothesis isn't required for a particular theory (or isn't relevant), then the "Spaghetti Monster" hypothesis should be removed. Fewer hypotheses result in a simpler theory. In the context of models, 'hypotheses' aren't strictly relevant, but 'requirements' and 'postulates' certainly are. Requirements constrain the model and postulates describe them, and the model is simpler if it requires fewer constraints and less description. Occam's razor would apply here in the sense that you should remove requirements and postulates (call them logical 'sentences') that aren't relevant to the model, including those that can be proven from other postulates and requirements, and you'll have a simpler model. Occam's razor, taken to its fullest, would encourage you to find a minimal set of sentences that fully describe the model, though there are often many possible minimal sets (where no sentence may be removed without changing the model). That is, every single 'sentence' describing the model must be justified by pointing out a theorem that cannot be proven without that sentence. Which minimal set of sentences to choose depends on which is easier (simpler) to explain or use. The best shall be as simple as possible, but no simpler... i.e. complex enough to be correct, but no more complex than necessary. When applied to an "artificial model", like Relational, "correctness" is defined in terms of the set of theorems you could prove with some "authentic" specification... but that specification isn't always as simple as possible.
Anyhow, Dan Muller, my reasoning is not 'backwards'; I know what I'm talking about when I say that it is the removal of an unnecessary requirement that leads to a simpler model.

I guess you see a proscription against considering tuples as objects, where I see an omission of the concept of tuple identity. -- If it is only an omission, then it is quite incorrect to state that they, therefore, "have no intrinsic object identity". After all, the object identity exists whether you omit its description or not. There is no reduction in inherent complexity by omission of a sentence that follows from other sentences already in the model. Where you are adding complexity is in attempting to take a perfectly good model and add a sentence that states relational variable tuples or database table rows "have no intrinsic object identity". "Relational Database Table Rows Have No Intrinsic Object Identity" is generally the sort of statement you must prove. So far, you've attempted to justify it by saying it is an inherent part of the model. You effectively proscribe this fact to the model by saying "it's part of the model, therefore it's true, and if you don't like it then complain to the designers". If the model only omits mention of object identity, arguing that rows have no intrinsic object identity requires you prove it using some reasonable and rigorous definition of object identity. Takes your pick and makes your choice, Dan; I'm tired of seeing people falling back to one argument when the other fails -- the model can only allow one line of argument to be relevant. If it's omission, then I can assure you that relational database table rows DO have intrinsic object identity as mathematical objects in a mathematical space. If it's proscription, then I can prove that the model is logically inconsistent with, at least, many of the more rigorous definitions of 'object identity'.

Another point of contention has been as to whether 'relational variables' are the only 'mutable' units in the model, as apparently 'proscribed' by the model... when doing so is inconsistent with any reasonable definition of 'mutable', and doesn't actually gain you anything outside of certain concurrency guarantees.

The relational model addresses concurrency only indirectly, by specifying that a database state, at all times that it can be observed, is consistent with all of its constraints. -- actually, the constraints limitation only requires that all observations of database state be consistent with all constraints. (This is a fine distinction, but it allows that the database state be inconsistent whenever it isn't observed... even if it can be observed.) However, a requirement that actors perform manipulations to the database through the relational variable would also have an effect in concurrent situations, as it would require that actors on the database read and write at the 'relation' level in a logically atomic manner. (This is orthogonal to the constraints requirement, and would be a major blow to the possibility of distributed relational databases.) If concurrency isn't directly addressed in the relational model (and I know it is not), then this requirement should be eliminated as irrelevant to the model, allowing implementors to seek better theories for concurrent operation. As this would be the only real effect of enforcing 'relation variables' as the only directly mutable 'cells' in a relational database, such a ruling is of no significant utility.

Transitions between one consistent state and another must thus appear to be atomic to users of the database. At this level, that seems to be all that needs to be said regarding mutability. This simplification indirectly affects many aspects of the model. -- With this, I agree.

Personally, I think the objects in such OO layers are most usefully thought of as caches of database data, because there's already a large body of techniques to help reason about and implement data caches. Correspondence between these caches and tuples in the database can be a difficult problem that does revolve around notions of identity. -- You discuss OO layers atop relational. In this case, 'identity' for domain objects is orthogonal to 'identity' for tuple objects. I can't agree with your proposition regarding caches and object data; a domain object represented in any data set will either be projective (carrying the definitive properties of the object) or reflective (carrying some known facts about an object that exists external to the database). The former requires no cache (except for optimization) and the latter is no cache (an actor cannot 'cache' external reality... only observations and inductions on that reality.)

avoid the notion of transactions by defining a multiple-assignment operator -- heh. I need to open that book (just received TTM from Amazon). That sounds like a rather poor attempt to avoid transactions.

I'm struggling a bit to figure out how to say this concisely. Speaking only for myself, I don't see a constraint or proscription, but rather an omission. I think I've been consistent about this, even starting with my first addition to this page. If tuple or row identity can be usefully identified as an emergent phenomenon of the model, have at it. But as you've pointed out: "There is no "typical notion of object identity" that possesses any rigor ...", so to reiterate, I don't see the general value, although I can imagine specific contexts in which the pursuit might be worthwhile. That, however, is different from saying that I agree there is an "intrinsic" object identity that pertains to tuples. (I'll assume that "relational database rows" can be rendered as "tuple", if "relational" here is meant literally and not just approximately.) How can anyone assert that they do have an "intrinsic" ("belonging to a thing by its very nature") object identity when there isn't even an "intrinsic" understanding of what "object identity" is? This mystifies me. And this page doesn't seem to have shed much light on what definition of object identity would falsify the title. -- DanM

This page doesn't seem to have shed much light on what definition of object identity would falsify the title. -- hell, it's worse than that, Dan. This page doesn't shed any light on what definition of object identity would potentially support the title. If you make a claim like the title above, I want proof! And I'm sure you want the same. A simple definition of 'object identity' that casually falsifies the title is: "object identity is any value that can be guaranteed to uniquely identify an object if that object exists." There. I'm done. A 'tuple' in a relation space can be identified by a candidate key and a relation variable. Tuples are objects, the database is an object space, relation variable is an address to a particular segment of object space, and object identity is a candidate key. This was stated at the top of the page! Now, Dan, why do I see so much support for the notion purported by the title when I haven't yet seen even ONE sound argument that supports the title? All I see, over and over again, is: "it must be true 'cuz Date or DrCodd said so." But if you believe that they didn't say so, why must I bother falsifying the title? You, like me, should demand proof of the titular statement in the first place.

Anyhow, there are many notions of 'object identity' that are both very rigorous and correct in their object space, and there are many more that are both rigorous and correct in ANY object space. They just aren't typical. The typical notion of object identity is based on human intuition and built upon human perception... which is dreadfully inadequate - you can't even prove deductively that there exist objects for you to perceive! To find a rigorous definition, you need to consider mathematical objects in mathematical spaces, for which objects provably (deductively) exist and therefore a rigorous definition of object identity may also exist. I elaborated on this a bit to Top, above. I don't advocate trying to use an inadequate typical notion of 'object identity' based on naive typical human understanding of both 'object' and 'identity'. I advocate using an atypical but rigorous definition. Comprehend?

Yes, I understand just fine. But I don't think your simple definition is useful. I return always to the case of a relation variable with two or more candidate keys. If I change one, then by your definition, a tuple "object" has both changed a characteristic (but not its identity), and has been destroyed (and another one created), depending on which candidate key(s) you focus attention on. This hardly sounds useful, and I still wait to see what concepts, theorems, or deductions it clarifies or enables. -- DanMuller
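To make the two-candidate-key objection concrete, here is a minimal Python sketch. The relation, its attributes (emp_id, email, name), and the helper names are all hypothetical, chosen only for illustration. It compares what an observer sees when one candidate key or the other is treated as the identity.

```python
# Hypothetical relation with two candidate keys: position 0 (emp_id) and position 1 (email).
# Each tuple is (emp_id, email, name).

def index_by(relation, key_pos):
    """Map each candidate-key value to the tuple that carries it."""
    return {row[key_pos]: row for row in relation}

def diff(before, after, key_pos):
    """Describe an update as seen by an observer who treats one candidate key as identity."""
    old, new = index_by(before, key_pos), index_by(after, key_pos)
    return {
        "destroyed": sorted(set(old) - set(new)),
        "created": sorted(set(new) - set(old)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }

before = frozenset({(1, "ann@example.com", "Ann"), (2, "bob@example.com", "Bob")})
# The update changes Ann's email, which is itself a candidate key.
after = frozenset({(1, "ann@corp.example", "Ann"), (2, "bob@example.com", "Bob")})

by_id = diff(before, after, 0)     # identity = emp_id
by_email = diff(before, after, 1)  # identity = email
```

Under the emp_id view the same "object" changed one attribute; under the email view one "object" was destroyed and another created. Whether that is a defect of the definition or simply two valid perspectives is exactly what is being argued here.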

Sigh. Again, with the typical understanding of object identity. I'm rather interested in hearing what you consider 'object identity', but I don't think you'd be able to put words to it.

Consider the fresh baseball. It is an object. But, if you identify it by "the fresh baseball, never been hit once", then is it untrue to state that this object is destroyed when someone hits it? I think not.

Object identity only matters to actors making observations on a variable object-space. Ultimately, the very concept of 'object' is simply a means of modelling that which an actor perceives... an 'object' is something one can identify, observe, and watch for changes. Objects can be people, the sky, the ground, a glass of water, etc. In science, all of them are merely conglomerations of atoms interacting with light... in various philosophies, it is pointed out that there is no deductive reason to consider one physical object as 'separate' from any other. If the claim was that intrinsic object identity doesn't exist for anything... that all object identity is extrinsic... then I believe a very well-reasoned argument could be made. However, if object identity is considered intrinsic for physical objects perceived by humans in physical space, then it's just as intrinsic for mathematical objects in mathematical spaces as observed by abstract agents. And it's only reasonable to consider 'object identity' to be intrinsic when you have a value that can be guaranteed to uniquely identify an object at any time it (the value) is used.

I follow your reasoning, but I'm still not seeing the point behind such a general definition of object identity in the context of programming. I think that two different 'object identity values' that sometimes refer to the same object, and sometimes to different objects, is fine when discussing human understanding of the physical world, but I doubt that many programmers would expect such a definition in the context of programming.

You're right in that I can't put a definition to 'object identity' for you, because I'm interested here in definitions that relate to programming only, and the definitions vary somewhat among programming languages. C++, for instance, is a little unusual and comes closer to your more general definition, in that you can have (in some specific circumstances) different pointer values that reference the same object - but it's still one object being referenced, and if it is deleted, then all of the reference values are invalid. In most OO programming languages, one object will correspond to one reference value.

I'd be mildly surprised if you can even find any academic computer science papers that use a working definition of object identity that is as general as yours. I still wait to see what concepts, theorems, or deductions it clarifies or enables.

-- DanMuller

In most OO programming languages, one object will correspond to one reference value. -- I agree. Most OO programming language implementations use some mechanism of reference as object identity; all other aspects of the object are considered subject to change, and any other qualities that can potentially identify objects are considered incidental to the object set rather than inherent to the object system. C++ runtimes internally use RAM addresses for object identity. That means if, say, someone moves the object to another address in memory, it's considered a different object by all actors who happen to use address as object identity - the original object was destroyed, and a new one exists. Since C++ allows objects to be destroyed while the value referencing them still exists, and since it is possible to forge the value referencing an object (as it's merely an integer), it is also possible to refer to objects that do not exist, or to refer to an object that exists, is destroyed, and is later replaced with another object at the same location that has an entirely different type.

I still wait to see what concepts, theorems, or deductions it clarifies or enables. -- Clarifies or enables compared to what? I think you mistake definition for model. Definitions merely are. They don't do anything. You need to have at least one rigorous definition for a word before you can use it in any proofs... but any rigorous definition will do the job. The proofs you can make will depend on which definition you choose, true, but I don't have any other definitions sitting in front of me. Even the authors of this page haven't provided one. The definition I provided is simply one that is both rigorous and, for most object spaces (including physical space), very reasonable. I'm quite open to alternative rigorous definitions of 'object identity', which is why I stated a simple definition of 'object identity' instead of the definition. I do have some objection to simplistic definitions that cannot generally handle mathematical objects in mathematical spaces... but such definitions may be good as operational definitions for specific sorts of objects in specific sorts of object spaces.

One theorem I can prove with the definition I provided is: Relational Database Table Rows do, indeed, Have Intrinsic Object Identity. I'm confident that this can be proven with any rigorous definition of 'object identity' that is broad enough to generally handle mathematical and physical objects... supposing the definition doesn't reject intrinsic object identity as existing at all.

Anyhow, objects in programming aren't so different from physical objects that they don't make use of multiple identities even in your work today. Consider a filesystem from which web pages may be displayed... each file (an object in the filesystem) can be identified by its canonical file name, by its URL, by its ComputerID:filename, by its hashcode, etc. These can change independently of one another (e.g. a URL can move from one file to another, filenames can be switched, files can be renamed, files can be changed which will alter the hashcode, etc.). However, the file is the object, existing courtesy of the filesystem. Is there any better identity for these files than, say, the name... or the URL... or the content? Not really. Actors need to make do with whatever identity they choose to use and live with the fact that under that perspective, manipulations by one actor (e.g. renaming the content) will appear to another as something entirely different (deleting an object with one name, and creating an object that incidentally has the same content with another name).

More complex object-oriented languages will, indeed, migrate towards a more flexible approach to object identity than exists in current runtimes... especially for distributed, resilient (fault-tolerant and attack-tolerant) programming. The current system of using addresses as the only reference mechanism is far too fragile and far too easy to tamper with in distributed systems... even those that do not expect regular cracking attempts or the destruction of valuable hardware at inconvenient moments.

I disagree with the file system analogy. File names are very much like a primary relational key (within a folder). However, that is not something that most implementations of OOP inherently share. Unique file names would have to be an added constraint in an OOP model. The only comparable thing most common implementations of OOPL's share out-of-the-box is to see if two object references are referencing the same RAM address. This is more analogous to a system-generated key instead of a domain-attribute-based key. Most OOPL's don't compare equivalence on attributes, let alone give you a choice of which attributes "count". This also reflects the real-world, more or less. The only way to know for sure if two people are not in fact the same is to put them in the same room. In other words, see whether or not they occupy the same "space". In computers, this space is RAM. -- BlackHat
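The claim that common OO runtimes compare references rather than attributes can be sketched in Python, used here only as one example of such a language. The File class is hypothetical. Python's `is` operator tests reference identity, and a class that does not define `__eq__` falls back to that same identity test for `==`.

```python
class File:
    """Hypothetical class with no __eq__, so == falls back to reference identity."""
    def __init__(self, file_name, content):
        self.file_name = file_name
        self.content = content

a = File("foo", b"same bytes")
b = File("foo", b"same bytes")   # identical attribute values, separate object
alias = a                        # a second reference to the first object

same_reference = a is alias      # both names refer to one object
distinct_objects = a is not b    # equal state does not make them one object
default_equal = (a == b)         # attributes are not consulted by default
```

Nothing in the runtime objects to two File instances named "foo"; making attribute values "count" requires the programmer to define `__eq__` (and usually `__hash__`) explicitly.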

Seeing if two people are the same by observing whether they occupy the same 'space' only works due to a constraint that is understood to be part of the physical domain: no two objects can occupy the same space at the same time. Thus using time and space coordinates is certainly one means of identifying physical objects. It works well enough for people... with only slight hiccups when dealing with pregnancy, chimerism, transplants, implants, artificial limbs, and siamese twins. However, not all mathematical spaces possess a 'distance' dimension. Tables à la SQL, for example, do have an inherent 'distance' dimension: row number. It makes for an address-based object identity. Relations, being sets, do not have a 'distance' dimension; an object is either entirely within a set (distance=zero) or entirely outside of it (distance=infinity). Since all objects in a set are zero distance from all others, you cannot use addressing.
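The distinction drawn here between positional tables and set-based relations can be illustrated with a small Python sketch. A list stands in for the row-numbered table and a frozenset for the relation. This is an analogy under those assumptions, not a model of any particular DBMS.

```python
# A list models a row-numbered table: duplicates are allowed and positions are addresses.
table = [("foo", 1), ("bar", 2), ("foo", 1)]

# A frozenset models a relation: no duplicates and no positions.
relation = frozenset(table)

second_row = table[1]            # addressing by row number works for the list
table_size = len(table)          # the duplicate row is counted
relation_size = len(relation)    # the duplicate collapses into one tuple

try:
    relation[1]                  # a set offers no positional addressing
    set_is_addressable = True
except TypeError:
    set_is_addressable = False
```

With no position to point at, the only way to single out a member of the set is by its values, which is where candidate keys come in.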

Of course, you can't count on space identifiers for all things even within the physical realm. When dealing with bosons, the Pauli exclusion principle does not apply. Bosons can occupy the same 'space' at the same time. Attempting to use address-based 'object identity' for such objects is doomed to failure.

I meant for practical, common stuff. Bringing the realm of quantum physics into the mix is not necessary at this point and complicates an already tricky topic in my opinion. -- BlackHat

Canonical (aka absolute) filenames do include their complete path. A canonical filename is a complete address, and is necessarily unique within a filesystem. (...) (...) Strings from a finite character-set are isomorphic to integers and filenames (in any modern system) can be represented as such strings... so they are not significantly distinct from RAM addresses. The important component of my analogy above was not the filename, but the presence of two other sorts of agents viewing the filesystem differently... one through a translation (URL to file) and one through an inversion (associating identity with the content block, so a 'name change' is meaningful... whereas 'renaming' under the view that filename is identity would be considered a destruction followed instantly by a creation). These additional views are also common to filesystems in programming. It's a mistake to consider RAM as the most natural space for objects in programming simply because it's the one to which you are most accustomed due to modern OOPLs. Identity as a RAM address allows for rapid access to representations of objects in a local system because it is very close-to-the-metal. However, RAM addressing is very poor for object-oriented code in distributed systems, for objects shared between different sorts of non-persistent processes, and for code allowing for transparently persistent objects in general. Address-based identifiers of any sort are awful for code that must resist forgery of object identifiers, for code that acquires security through the capability model, for code allowing the migration and mobility of objects between and during runtimes, and for code that incorporates distributed caching and synchronization models for speed and resilience (allowing one object to take over if another fails). Identifiers utilizing a RAM address get the worst of both worlds.

You should view a RAM address as being a very simplistic form of object identity... one that trades for speed on a local machine at a rather significant cost to flexibility, security, abstraction, and resilience.

It would be interesting to look into how distributed OO systems deal with object identity. It may tell us something about how the implementors view object identity, being that I am looking at the usage frequency to shape my working description of ObjectIdentity. -- BlackHat


I moved this down here because there was an editing conflict that tangled what I was replying to.

Let's simplify things and assume a file system without sub-folders (like early PC's):

   class File {
     attribute fileName: string private;
     attribute content: bytes private;
     method new(nm, con) {   // initiator
        fileName = nm;
        content = con;
     }
   }
   ...
   file1 = File.new("foo","asdfasdfasd");
   file2 = File.new("foo","xx234987");
The object engine would not crash. We have two "files" (2 file objects) with the same name ("foo"), a no-no in filedom, but native OO does not understand this. To prevent this, we would have to implement some kind of search of existing objects to make sure one with the same name is not already there. Contrast with a RDBMS where by saying the "fileName" column is the primary key, we prevent duplicates. The RDBMS natively "understands" domain attribute based identity. By "understand", I mean it has built-in abstractions that support the concept.

What you don't have is a filesystem -- the object-space. Files cannot exist without a filesystem. Your attempt to place the files into the C++ object space is where you err. A correct implementation would be: deftype Filesystem = [Filename => File], deftype Filename = [Char], deftype File = [Byte]. The RDBMS logical equivalent would be to have a set of (Name,Content) pairs, with a constraint that no two names in the relation are identical. These are rather different approaches.
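Both approaches described above can be sketched in Python, assuming a dict as the Filesystem map and an in-memory SQLite table as the relation. The helper create_file and the table and column names are hypothetical. In each case the second "foo" is rejected, by the map's key in one and by the declared key constraint in the other.

```python
import sqlite3

# Approach 1: Filesystem = [Filename => File]. The map is the object space, the name is the key.
filesystem = {}

def create_file(fs, name, content):
    """Add a file to the filesystem map, refusing a name that is already taken."""
    if name in fs:
        raise FileExistsError(name)
    fs[name] = content

create_file(filesystem, "foo", b"asdfasdfasd")
try:
    create_file(filesystem, "foo", b"xx234987")
    map_rejected_duplicate = False
except FileExistsError:
    map_rejected_duplicate = True

# Approach 2: a set of (name, content) pairs with a uniqueness constraint on name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (file_name TEXT PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO file VALUES (?, ?)", ("foo", b"asdfasdfasd"))
try:
    conn.execute("INSERT INTO file VALUES (?, ?)", ("foo", b"xx234987"))
    db_rejected_duplicate = False
except sqlite3.IntegrityError:
    db_rejected_duplicate = True

row_count = conn.execute("SELECT COUNT(*) FROM file").fetchone()[0]
conn.close()
```

The earlier File.new example lacked the container that owns the names. Once an object space exists, uniqueness of names is enforced by that space, whether it is a map or a relation with a key.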

Are you saying the C++ approach is not "true OOP"? Perhaps this is an issue of the definition of OOP?

Not at all. I'm saying that your C++ approach is not "true files and filesystem". You created objects, but you did not create file objects... at least not in the sense of files in a filesystem.

I am not familiar with the syntax of your proposed alternative. It does seem to have maps built into its syntax, but that is not a requirement of OOP by most definitions.

Objects need to exist in an object-space. It is true of OOP under any definition. For most implementations of OOPLs, the object-space is an addressable memory space. For files, the object space is a filesystem. That's pretty much true by definition. Oh, and the syntax [T] is a logical array of type T (indexed by position), while [K=>T] is a logical map. It's syntax from a MyFavoriteLanguage.

That is an addition to OO. Again, I am basing my characterization of OO on common usage, and C++ style OO includes that. I am not giving the common definition a value judgment at this point. Note a primary key makes it unnecessary to have to apply explicit constraints in RDBMS. Primary keys are required by the relational model (because sets by definition have no duplicates). Constraints are not. -- bh

Primary keys are not necessary. Candidate keys are. But if there is more than one candidate key, there is no requirement to name one of them "primary". Oh, and having a primary key smaller than the whole tuple is a constraint.

How about we put it this way: it needs at least one unique key (compound or singular). But it does not change my original point.

I'll agree that relations necessarily possess at least one unique potentially composite key, that being the entire tuple. Your original point is incorrect for other reasons.
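The point that the entire tuple is always a unique key can be checked with a short Python sketch. The relation and the helper is_unique_key are hypothetical. Note the helper only tests uniqueness in one particular relation value; a declared candidate key is a constraint on every value the relation variable may ever hold, which no single snapshot can prove.

```python
def is_unique_key(relation, positions):
    """True if projecting the relation onto the given attribute positions yields no duplicates."""
    projected = [tuple(row[i] for i in positions) for row in relation]
    return len(set(projected)) == len(projected)

# Hypothetical relation of (name, extension, size) tuples. Being a set, it holds no duplicate tuples.
relation = {("foo", "txt", 3), ("foo", "bak", 3), ("bar", "txt", 3)}

whole_tuple_is_key = is_unique_key(relation, (0, 1, 2))   # holds for any set of tuples
name_alone_is_key = is_unique_key(relation, (0,))          # "foo" appears twice
name_and_ext_is_key = is_unique_key(relation, (0, 1))      # unique in this value
```

Declaring a key smaller than the whole tuple, such as (name, extension) here, is an added constraint, which is the distinction being made above.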


Re: "Did I not say that "rename" doesn't make sense if you consider the "identity" to be the filename?"

Please explain. From an interface/user perspective, it could mean "create a new copy with identical attributes, but with the new given name and discard the original". Whether that captures the "essence" of "rename" or not, that is kind of a fuzzy psychological issue.

This would be the equivalent of a deity (or quantum threads), at the moment someone is punched in the face, suddenly grabbing the pre-punched person and replacing them with a new (post-punched) person who is identical to the old person in every way except for a bashed-up face. From the observer (user) perspective, they cannot tell the difference and normally don't care. The file name is equivalent to the person's face: it changes but nothing else does (from the observer's perspective at least).

If you view filename as that which identifies a file object, then the concept of "renaming" simply doesn't make sense. If you view filename as file-identity, then you can watch the 'content' of a file change over time, and you can watch as file-objects are added to and removed from the filesystem, but you can't watch a renaming... the closest you'd ever come is seeing a file with one name disappear while another file simultaneously appears with a different name but with the exact same content as the file that disappeared. You might have some psychological issues with it, but you're a human and thus subject to psychology; there isn't anything fuzzy happening here in the logical sense. For any actor, human or otherwise, observing a logical renaming requires that one consider the file to be identified by something other than its name.
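A small Python sketch of the two viewpoints, using a real temporary directory. It assumes a typical POSIX filesystem, where os.rename within one directory keeps the file's inode number; other platforms and filesystems may behave differently, and the file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    old_path = os.path.join(d, "my_foo")
    new_path = os.path.join(d, "your_foo")
    with open(old_path, "wb") as f:
        f.write(b"meow")

    # Second identity: the inode number (assumption: POSIX-style filesystem).
    inode_before = os.stat(old_path).st_ino

    os.rename(old_path, new_path)

    # Observer who identifies files by name: one name vanished, another appeared.
    old_name_exists = os.path.exists(old_path)
    new_name_exists = os.path.exists(new_path)

    # Observer who identifies files by inode: the same file, with a changed name property.
    inode_after = os.stat(new_path).st_ino
    with open(new_path, "rb") as f:
        content_after = f.read()
```

The name-only observer can report a disappearance and an appearance with equal content. Only the observer holding some other identity, here the inode number, can say that one file was renamed.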

My point is that one cannot tell the other apart. If there is no way to tell them apart, then for practical purposes, they are equivalent (or if the differences don't contradict the definition). If the process is thread-locked during the copying, there is no way a user could tell the difference by seeing two files at the same time. I suppose you could argue that it uses more temporary space than an attribute-only rename, but the use of work space is not forbidden by most system utilities.

If you define, for practical purposes, "renaming" as a copy followed by a delete, then it would be so. However, that is not how humans think about "renaming" objects. Don't kid yourself about it. You can't "rename" your cat by cloning it precisely, naming the new one, then killing the old one. It doesn't even work if the process is done within a Schroedinger box into which no observations may be made until the process is complete. To a human, "renaming" means that the object is the same... except that the name associated with it is altered. To consider "the object is the same" absolutely requires that you identify the object by something other than its name. For a cat, it might be its body or history or the memories associated with it. For documents, humans tend to identify a file based on some combination of its content and origin. E.g. 'the document I wrote about how renamed cats have only eight lives left'. A file containing such can be given a name, then I can change its content, then I can change its name again... and it remains the same 'object'. Renaming the document file changed some property related to the document (the filename associated with it); it emphatically did not destroy the object and create a new one simultaneously. Changing the content changed another property. Neither change affected how I identify the object.

How you or any actor views the world or any other object system always depends on what you consider to be 'object identity'. If you consider the filename to be the identifier, then you can't see a rename; you see a copy followed by destruction... a pairing of events you might identify as a common pattern if you see it often; you could call such a pattern "rename" but you wouldn't mean what humans usually mean when they use the word. If you view 'origin' to be identity, then a rename is fairly natural (as rename doesn't change origin). 'Copies' also make sense, and would be considered different files because they have different origins. However, you'd see a 'copy followed by a delete' as exactly that: creating a copy and destroying the 'original'; you would scoff at calling this a "rename" even if rename is implemented by exactly that. Are you the same person you were yesterday, or is the old you gone forever, replaced by a new you with slightly different properties? Is FredsAxe the same one he started with? Etc. This sort of paradigm issue exists in any object system where multiple identities may reasonably coexist for an object. I originally brought up filesystems precisely because there are many valid paradigms utilized by actors interacting normally with a filesystem... because of multiple 'identities' for files, there is no one identity that can be declared "most correct".

Well, I think it is relative. We think about things different ways depending on the moment and need. It is not cut-and-dry. At this point I think we just have to AgreeToDisagree. -- bh

You can certainly change what you consider to be object identity depending on the need at the moment. That was my point: there are plenty of systems for which it's perfectly reasonable to view the same objects in different ways based on need, and nobody can tell you that any view is wrong so long as that view is logically consistent. However, certain actions are impossible to perform or perceive under certain paradigms... saying they're possible (without butchering the essential meaning) would introduce a logical inconsistency. Among these is the concept "rename" when object identity is name. Of course you're free to butcher the essential meaning (the assignment of a new name to an object that once had another name) and replace it with something that fits your object identity (copy + delete); I won't accept such a definition as "correct", but I'd agree that it's "practical".

I believe issues of logic and math like this one are cut and dry, and to AgreeToDisagree would require that neither of us respect the answers that come from cold, hard logic and deductive proof. I will agree, however, that you're entitled to your own beliefs, be they inconsistent or not.

If you can produce a "solid" model or math that fits common or consensus definitions that proves it "wrong", I may change my mind. At this point your views seem to have a partial foot in psychology rather than pure math or hard science. It is technically possible to "rename" a primary key like a file name in a usable sense. That is not disputable that I can see even if it does create some philosophical puzzles not too different from the kind found in quantum physics. -- bh

Digging in your heels and calling your argument "not disputable" is somewhat dishonest when dispute clearly exists.

You cannot technically 'rename' or otherwise 'change' the object identity you are using to identify objects. To say otherwise is to introduce a logical inconsistency - a contradiction. The very notion of observing change with regard to an object requires that it be possible to identify the changing object both before and after the change, allowing one to observe differences. However, one identifies an object by an object identity. Thus, if it were possible for 'object identity' of an object to change or be changed, one would by definition be unable to identify the object both before and after the change with the same object identity and therefore observe the change. Thus, said change would be impossible to observe.

Now, if the notion of 'object' were truly independent of observers, this wouldn't be a problem... but this is not the case (see next paragraphs). Relative to all observers of that 'object identity', the object simply no longer exists; the most an observer can do is note this non-existence... a 'destruction' of the object. Of course, with a little search, one might learn that another object exists with similar or even equal properties... but without identifying objects by something other than that first identity, you'll never be able to deduce that the destroyed object is somehow the "same" as the created one. The notion of observing a change in identity leads to a contradiction... and because objects are, by nature, a paradigm on observation, so is the idea of change in objects. Change of object identity isn't observable, therefore it isn't logically possible... unless you use one object identity to observe a change of another, but doing this requires accepting that objects have more than one identity - a notion that you have explicitly rejected for files... and therefore for all address-identified objects by virtue of isomorphism. All this is pretty darn fundamental. You literally, logically, and technically cannot rename a file unless you allow object identity for a file to be something other than its name... or you seriously butcher the essential meaning of "rename".

There is a very pure mathematical foundation underlying my views on objects, and there is no direct grounding in any sort of psychology excepting that which originates "object" as a meaningful word. The concept of object is founded in humans observing and creating a mental model of physical reality. The concept of object is necessarily tied to perspective (paradigm) because there is nothing about our observations on physical reality that can even deductively prove reality 'exists', much less objects within it: there is no inherent reason to consider me a "human" as opposed to a conglomeration of water and lipids, there is no inherent reason to consider a particular density of water vapor a "cloud", and so on. But our brains are pattern-matching machines, and so they identify patterns. "Object" identifies certain sorts of observed patterns... those distinct from space, time, masses, materials, and properties. "Objects" are things we can observe (watch, feel, taste, smell), manipulate, and predict. "Objects" have properties that may (in some domains) change over time. "Objects" are unique in that one identified object is necessarily distinct from another (or you couldn't call it "one" object). "Objects" are in some sense 'real' - they inflict their reality on us observers, not vice versa, because changing how we view the object does not change the object.

A mathematical object must also have all those properties... but the concept of a mathematical object need not have us humans and our five senses. Nor does it need exactly three dimensions of space or one dimension in time - they can even be timeless and spaceless, when abstracted, like value objects. All issues of psychology can (and must) be removed - a more abstract 'observer' can be utilized instead of a human observer. A more abstract 'actor' can be used in place of physical forces. To observe an object necessarily requires that one have somewhere to look, which necessitates the existence of some sort of 'object space'. (By definition, our senses observe the 'physical plane' space.) 'Objects' in an object space can be observed... potentially over "time" if the mathematical abstraction includes a time dimension (or more than one). Mathematical objects in this mathematical object-space are as "real" to the abstract, mathematical "observer" of this space as physical objects are to us. Like the objects we experience, mathematical objects are not properties... nor are they patterns of objects with emergent properties (which would associate more closely with "materials" or "masses"). Indeed, individual mathematical objects necessarily must be uniquely identifiable by the observer... or they would not be "individual" objects. A concept used by an observer for identification of an object can reasonably be called an object identity.

Object oriented programming reverses the abstraction; rather than merely observing objects and manipulating them, one builds complex systems by creating objects and designing how they interact. In this abstraction reversal, one will necessarily implement at least one object space, at least one object identity for every unique object, and a property representation. These things are minimal and necessary to the concept of object. One would generally add "methods" if an action on one object should propagate to another... because enacting complex object interactions isn't the actor's job. However, methods are a domain thing... if objects in a domain don't interact, then methods aren't necessary. These things will be true in any object-oriented system regardless of other features commonly associated with OOP (like support for classification of objects, polymorphism, prototyped objects, abstract objects, virtualization, encapsulation, dynamic dispatch, etc.). These other features are important to making object oriented design easier in complex domains, but are hardly essential.


I am not clear on why auto-number primary keys for database rows do not satisfy your requirements for "object identity". They are a lot like RAM addresses in utility: a "dumb" unique number not tied to domain attributes. They just happen to be more visible and more permanent than RAM addresses. Would you be happier if they were not visible? The visibility is for practical reference, so that one can, say, pick up the phone and say which record has a problem. But this nice feature can in theory be tossed to satisfy your definition if that is the stumbling block. -- bh

Either I've misstated or you've misunderstood. Candidate keys of any sort do, indeed, satisfy the requirements of object identity for the tuples in which they are found, and can also be used to further address individual cells within those relational tuples by name. Primary keys and auto-number keys are types of candidate keys, so they also qualify. However, it's quite possible that the object you're considering does not include this identity; that is, it is associated from the outside. This is the case with object addresses, filenames, etc. The location of the object is not necessarily "part of" the object; it's "part of" the world. You just happen to use it to identify the object. The same is true with filenames and files.


I never thought the term "rename" could get so controversial. I guess CommonSenseIsAnIllusion. Can we go back to DefinitionOfLife? Can life be renamed? :-) -- bh

Yeah. I'd think a definition you can get from a dictionary, and that people use every day, would be the obvious one. Then comes equivocation. "Rename" can only mean one thing for an argument.

I don't see any conflict with implementation choices. To me "rename" is a description of what you want, not how to get it. This topic seems to be growing a bit testy. I suggest we take a break for a few months and ponder it a while. -- bh

I've put together a little primer "false" conversation between the two of us. Please read it and comment; I'm hoping a different approach might better inform you of how I think, and why we're having troubles communicating. Most of your issues are addressed by your double, but there might still be some confusion. Let me know, and I'll insert it to this little 'dialogue'.

   BH: People use "rename" on files all the time.
   GH: Yes, they certainly do. So, what "thing" are they renaming?
   BH: The file. ...
   GH: Indeed, they rename the file. So, how do people identify files?
   BH: By name, of course.
   GH: Oh, but do they? Certainly I can tell you to fetch the file "/my/foo".
             Tell me, what does "rename" mean.
   BH: It's a results contract. If I rename "/my/foo" to "/your/foo", supposing
             "/your/foo" doesn't exist and all permissions are in order, then "/my/foo"
             will no longer exist and "/your/foo" will exist, and "/your/foo" will have
             the same content that "/my/foo" once had.
   GH: So when people talk about 'rename', they're merely talking about the end
             result? What if they were renaming a cat? If I copied the cat, gave it
             a different name, then disintegrated the original, would that be a renaming?
   BH: That's different. I'm talking about renaming files. For files, you could
             implement by copy+delete... maybe with some thread locks so nobody notices
             the interim state. Or you could re-link the file if the filesystem supports
             linking. I'd hope, at least, that renaming the cat would be more like the
             latter...
   GH: I asked what "rename" means in general, not for files, and I didn't ask for
             implementation details... though I don't disagree with those you offered. 
             Please humor me: what would you call copying and disintegrating a cat to
             rename it?
   BH: ... cruelty to animals.
   GH: Heh... fair enough. But are you sure you wouldn't call it renaming?
             After all, it meets all the requirements of your results contract. If
             you insist, I can automate it and do it in a Schroedinger's box so nobody
             notices the interim state where two cats exist.
   BH: I feel like this is a trick question. I've already said that "rename"
             is different for files and cats.
   GH: And why do you say that? Because you don't want me renaming cats with
             a disintegrator gun? Can you explain or elaborate as to why "rename"
             is different?
   BH: ... No. It just is.
   GH: I look in the dictionary (WordNet), and here's what it says:
                  rename: verb
                     1. assign a new name to; "Many streets in the former East Germany 
                        were renamed in 1990"
                     2. name again or anew; "He was renamed Minister of the Interior" 
             ... I see nothing about a results contract.
             To me, rename looks a lot like you have one object that has moved
             from one name to another. E.g. if you rename a street, it is
             the "same" street... except for the name. This works for cats, too,
             and it's a lot more intuitive than "a results contract".
   BH: Yeah, it works for cats and streets... but it won't work for files. Files
             are different. They can't use that definition.
   GH: Why are files different?
   BH: You're asking me that? You answered the question yourself, what, a
             half-dozen times already. Besides, you're playing both parts in this
             "conversation", so these are your words anyway. Basically, files are
             different because they're identified by name. You can identify streets
             by location and heading... so if you rename a street, everyone with a
             memory can look at it and say, "Hey! This street's been renamed!"
             They can't "remember" the new name, but they can remember the
             original location and heading, which allows them to make this conclusion.
             Similarly, cats can be identified by personality, appearance, behavior,
             general location, or even DNA. So, if someone who has memory of the
             cat's old name then looks at the cat's new collar, they can conclude,
             "Hey! This cat has been renamed!"
   GH: Memory, eh? What were all your earlier complaints about psychology for?
   BH: Hey, smartass, these are your words, not mine. Get it straight, and stop
              trying to confuse the readers.
   GH: Hey, no need to bite. Let's tone down the insults...
   BH: *glare*
   GH: So, if I disintegrated a street, then rebuilt it at the same location and
             heading, would it still be the same street? If I didn't rename it, would
             it have the same name? What if I disintegrated a cat then rebuilt it
             atom by atom with nanolithography, so it had the same DNA, same behavior,
             strutted the same streets, etc?
   BH: I don't like this philosophy stuff, and I'd rather not think about how
             ObjectIdentity would handle under StarTrek technology. Take it to FredsAxe.
   GH: *pout* Okay. So, you were about to explain why files aren't like cats
             and streets?
   BH: No. You were about to explain. I'm just your mouthpiece for this bit.
             I'd rather you not put words in my mouth, but I'll do it so long as it helps
             the dialogue. Anyhow, files are identified only by name. That only part is
             really important. They're like the primary key in RDBMS or the RAM address
             in C++ objects.
   GH: Oh? Why? Why not identify files by content, by authorship, by update
             history, by permissions, etc? How is content of a file different, fundamentally,
             from the appearance of a cat? ... well, other than the obvious. The state of
             a cat's appearance and the state of a file at a given time are both values.
             Files are digital, and composed of bits... but cats are furry, so I could have
             each hair in some position to represent data. I could read with a digital camera
             and write with a comb. Now, I'd need to take care to preserve this appearance
             lest it fall into disarray, so I'd need lots of hair gel, but (... ramble snipped ...)
   BH: Files can be copied, which means content of two files could be exactly equal.
              Also, files are logical objects. They're above the physical layer.
   GH: ... identical twin cats? or perfect copies? (heh. copycat...) Suppose I invent
              a machine to copy cats. Oh, and technically the appearance of a cat is
              also a logical object, as is the current state that appearance represents, and
              the value that current state represents. I just figured I'd need to discuss the
              physical layer or you wouldn't believe me.
   BH: A cat's appearance does not normally represent anything. A file's state
              usually does.
   GH: And your point is? I know my point: a file's state and a cat's appearance
               can both change without changing other identifiers, like name. If you consider
              it reasonable for agents, like humans, to identify cats by their appearance,
              then you must consider it equally reasonable to identify files by their content.
   BH: Argh! You're rather frustrating to talk to. You know that?
   GH: I know.
   BH: Okay, I guess you can use content to identify files... that's what Google
              does. And those other things would be nice, too. But I am talking about
              the most basic of filesystems, like the 1980s micros, maybe with folders. It
              can only go from Name to File. Anything above that you need to add on
              yourself. If you like abstraction so much, call it an abstract filesystem.
   GH:  ... Then I'd need to add abstract observers and actors, who start collecting
              information, reading, writing, indexing content, etc. You know, like humans and
               Google.
   BH: They aren't really part of the filesystem.
   GH: I dunno. There isn't much point to a filesystem without any users. In at least
             one sense, they're definitely part of the system.
   BH: ... okay. I'll grant that; I'm sure there's at least one sense in which it works.
             But if you keep distracting me, you'll never get your answer as to why files
             are different.
   GH: I don't actually believe that files are importantly different, but you may continue.
   BH: (:grumble: ... I really hope some deity smacks you around a bit ... :grumble:)
             Fine. They're your words. You say them. I'm done.
   GH: Certainly. Files are usually identified by name. They can often be identified
              by other things, such as content or creation time, but one cannot count on such
              things to always uniquely identify a file. One can always count on filename to
              identify a file within a filesystem. For the sake of BH's sanity, consider a system              
              in which files may only be uniquely identified by filename... where any identifier not
              including filename will either identify two or more files or zero files. In this case,
              rename is impossible. Why? Because it is impossible for any observer to have
              memories associated with a particular file by anything but its old name. So, if you
              supposedly "rename" an object, no observer can ever look at  that object and say,
              "Hey! This file has been renamed!" Instead, as far as the observer can tell, the
              object associated with the old name is gone, vanished, disappeared; it simply doesn't
              exist anymore. If the filesystem is small enough, and the observer is looking for
              it, the observer might recognize that a new file is in the filesystem and that this
              file has the same content as the one that disappeared. It might even be able to tell
              they're at the same time if the observer has a clock. But there'd never be a reason
              for this observer to remember this event as a "rename". It'd just record that one
              file is gone and another exists... a creation + deletion.
   BH: I keep telling you: renaming is possible. It's just different. Just because you
              can't see the "rename" doesn't mean it didn't happen.
   GH: That's quite some logic, there. It's true, but it's also of the exact same sort as:
               "Just because you don't see the Giant Spaghetti Monster controlling you like
               a puppet doesn't mean he isn't out there, doing so," or "Just because evidence
               of Dinosaurs exists doesn't mean God didn't just plant them there for humans to
               dig up." So, if you can prove that you'll never, ever observe a rename, then
               can a rename really "happen"?
   BH: You can observe a rename. Look at the filesystem. A new file has the same content
              as the old file, at about the same time... or even simultaneously.
   GH: Oh, so you're identifying files by content and time now? No, no, no... that's
              against rules I added for your own sanity, BH. You may only identify files by name. 
              If you try to identify them by content and creation/deletion time, you'll get at least two
              files back. You might get a million. How would you tell which file was the copy, then?

   BH: ... Now I'm really starting to feel insane. Okay, so you're essentially telling me that
             attempting to identify a file by content and creation time is to attempt a different
             ''object identity''?
   GH: Yep. That's exactly what you just tried. We could go on a lot longer, but every single
              time you'll bump into exactly that sort of wall. It is logically impossible to observe a
              rename if you can only identify objects by name. If you did, it'd be a paradox... one
              of those "Hey! I just proved one equals two!" situations.
   BH: I might challenge you on that later, but for now I'll believe you. I'll focus on another
             point: Just because I cannot observe a rename doesn't mean I cannot perform
             a rename.
   GH: Oh? Please elaborate.
   BH: Well, rename is essentially a results contract. When I decide I want a file renamed,
              what it means is that I want one file gone and for another to exist with the same
              content and a different name of my choice. That might be implemented by a
              "copy-and-delete", but, to me at least, the file was "renamed".
   GH: Okay, this seems reasonable. However, I'm sure you're aware that you're running the
              risk of equivocating and ending up in a LaynesLaw debate. The definition I use for
              rename is very different from the one you're promoting.... and it's better.
   BH: No it isn't.
   GH: Yes it is.
   BH: No, it isn't.
   GH: Yes, it is... for the following reasons: (a) the definition I provided works for cats,
              streets, files, and mathematical objects. (b) the definition I provided for "rename"
              corresponds directly with the dictionary meaning and what people mean when
              they say "rename". (c) the definition I provided for "rename"  allows both recognition
              that a  rename event has been performed (an object that, to your memory,
              had one name, now has another) and provides a description of the required
              results for completing a rename (a new name has been assigned to an object).
                   Of course, we must still define 'object', 'name', 'assign', and 'time' for this 'rename'
              event to be understood in depth. These get rather interesting. 
   BH: cats, streets, files... We just discussed this. It doesn't work for files. You said it was impossible.
   GH: Sure it does. It just doesn't work for files when you constrain files to only being identified
              by name. That hardly seems a natural constraint to place on observers of the filesystem in
              the more general sense. 
   BH: ... (burble, burble) ...
   GH: Sorry! I forgot about that sanity issue. It's alright, though; being insane isn't all that bad. 
              Beware, though, that you might develop a disconcerting habit of talking to yourself. Well,
              it isn't all that disconcerting for you, and the conversations can get pretty good, and...
              I'm sure you know what I mean...
   BH: I'll get better with a few beers, I'm sure. I have a question, and I want to avoid another
             debate on definition. Using what I consider to be rename... in that filesystem
             where objects can only be identified by name... what are your thoughts.
   GH: Basically that you'd be delusional, maybe a little kookoo on the choochoo, not quite all there
             in the head if you know what I mean.
   BH: ... okay, who was telling whom to cut back on insults? And I demand you explain yourself.
   GH: Sorry. Basically the issue would be thus: You "rename" a file. You tell someone else you
             "renamed" that file. They go look at the object system and say... "so, where is this file you
             renamed? I can't find it." Then you say, "That's because you're looking for the file under the
             old name, you nimrod. Look for it under the new name." Then they say, "Okay, I found the file.
             It seems new; I don't remember it being here before. What changed?" Then you say, "the name
             changed." Then they say, "How do you know? To me, it looks like one file is gone and this
             new one exists." Then you say, "I just know." Then they say, "Oh? How do you know? Does god
             speak to you?". Then you say, "No. I renamed the file myself." Then they say, "Oh. That makes
             some sense. Can you show me proof that you did this?" Then you either try renaming another file,
             which will look to the other observer exactly like a delete + create in close proximity, or you can say "No".
             Either way, you're certifiable.
   BH: Did you just have a conversation with yourself within a conversation with yourself? Really, though,
              I thought we were using my definition.
   GH: I'll admit to a bit of facetiousness, there. ;o) However, even your definition doesn't allow people to
              observe or otherwise recognize a rename event from other coincidental create+delete events. 
              It just allows you to say that you've done a rename if your actions have particular results. As such,
              it's really only half a definition. However, if you can show that the copy+delete is the result of your
              efforts, then you could show that you've performed a 'rename' by your definition. 
   BH: I don't think it's less correct to define "rename" in terms of its contract than it is to define
              rename in terms of its recognition.
   GH: The definition I provided allows for recognition and a contract. I think that makes it more
              correct. However, that's just expert opinion from a designer of language that supports distributed
              objects and services... what could I know?
   BH: But 'recognition' is getting back into all that psychology-based-definition stuff. So is your insistence
              on observers, actors, etc. Psychology is not cut-and-dry like math and logic; you shouldn't be using
              it here.
   GH: I study computation theory, which is pure math and logic. Data,
              knowledge, memories and such can all be studied from a computation theory
              perspective. That is what looks, to you, like psychology. But it isn't... at least
              not in the 'soft science' sense. Go study the ActorsModel, the PiCalculus, and especially their
              behaviorally typed counterparts. Maybe study a little InformationTheory, Cryptology, and
               ModalLogic. Then come back and tell me I'm talking psychology. The 'actors'
              and 'observers' I've been talking about are probably more abstract and more
              mathematical than you imagine. (I'll bet a cookie that you've been thinking
              actor == human user and observer == human watcher.) Humans can fulfill a
              role as actors and observers, but so may other things (e.g. programs, threads,
              intelligent agents, pigeons, hardware, etc.)

Far as I'm concerned, it looks fine. Quite entertaining, really. Funny thing is, I can see - and appreciate - both your points of view. Now, I humbly suggest that you both take a step back, endeavour to transcend merely understanding each others' points of view just enough to launch a counter-attack, and take a meta-view: Look for the motivations behind the viewpoints themselves. Explore the belief systems, experiences, and backgrounds that (inevitably?) lead to these points of view. You might even wish to ask each other... Questions.

If undertaken in earnest, I suspect you'll both find ways to transcend the quibbling and express your arguments in a compelling and convincing manner. One of the keys to this is to recognise and even embrace the views of your opponent, in order to address the inconsistencies in your opponent's arguments in his or her terms. -- DaveVoorhis


Who snipped out the government tag inspectors versus granny's cat-eyes identity analogy? And why?

Shark ate it, by reverting to the last legitimate edit prior to it, but I'm not sure why. I've put it back in. Time for some bug-hunting, methinks. DeleteWhenCooked -- DaveVoorhis

Thanks for putting it back. I added a second one, but this time I am keeping a copy in case editing foobars it again. -bh


Is it the name that bothers you?

If you didn't include it, "rename" would be a feature one would eventually ask for after frequent file copy-and-deletes become tiring. Call such an operation "Zibtroob" if you want, but I think most users would rather call it "rename". Just think of it as "Zibtroob" and the name won't bother you anymore, correct?

If a customer asks to group a bunch of commands together under a single name to save time, then I doubt you would complain, correct?

Suppose a user gets tired of copying and deleting over and over, and decides to make a command file (like a DOS .BAT) to do it for him/her. Suppose the user calls it "cmd2" for whatever reason. If you ask the user what it is, they may say, "Oh, I just got tired of copying and deleting over and over and decided to consolidate it under one command to save time and reduce typing errors." I assume you won't have any problem whatsoever with this. Correct? It is just automation 101: factor frequently-used sequences into a single command/object/idiom.

But if they call such a thing "rename", it *then* seems to bother you out of some sense of mathematical or logical purity (which I still don't agree with). Thus, it is the name that is the issue, not the existence of such a feature, correct?

If the name does not lead to confusion in the user(s), then there is no practical downside to using the name "rename" that I can see. Unless you are worried that it will corrupt the user's knowledge of identity issues and cause them conceptual problems down the road. But unless they run the Star Trek Time Transportation and Dilation System in another job, such is unlikely to be a problem. To be frank, the word "anal" comes to mind. It is similar to calling a tomato a "vegetable" in a salad recipe. Technically tomatoes are a "fruit" I hear, but mentally people like to group them with veggies due to the taste and typical usage. If it were, say, a PhD thesis on biology and evolution of fruits, then and only then could it be a potential problem.

-BlackHat

It is equivocation that bothers me. Rename has a certain meaning for objects in general. You must use exactly one definition for every purpose within any mathematical or logical argument or you lose the ability to perform proofs.

We've been discussing object-identity in a relational database, and made an analogy to object-identity within a filesystem. Object-identity in particular is interesting in that such things as cats also qualify as objects having some identity... thus your definition of "rename", whatever it is, must work for cats as well as file-objects, and even for tuple-objects. The latter is handled vacuously because tuples have no names and thus cannot be renamed, but the other two (file-objects and cats) must share one definition of "rename" within the discussion.

You are using "rename" as a contract. If so, then for this full discussion, "rename" must be a contract. Cats included. Otherwise LaynesLaw is going to interrupt any meaningful discussion. If you must use "rename", call mine "rename-object". It still stands that "rename-object" is logically impossible in that filesystem... even for file-objects.

Even so, your 'rename-file' or 'Zibtroob' is only half of a full definition. It's a contract, and contracts are not (generally) definitions. It fails to allow you to identify a renaming event. Any good definition will conceptually allow for both identification and creation, though one of those steps may require omnipotence or omniscience within the system. (See KnowLedge. Words, themselves, arise as abstractions to allow semantic compression in discussion of higher-level concepts.) E.g. by knowing the definition for parallel lines, you can create and identify parallel lines in many 2-or-higher dimensional surfaces. By knowing the definition for a triangle, you can create and identify triangles in a mathematical universe. By knowing a definition for 'cat', one can identify cats and produce cats... and, if omnipotent, you could create cats. It works for verbs, too; by knowing the definition of the phrase "bicycle-racing", you can both identify the bicycle-racing action and produce a bicycle-race event that follows the definition. However, if all you know is the results-contract for 'rename-file', you could not recognize a rename... even if you were omniscient.

Oh, and tomatoes are legally vegetables (due to lobbying) and scientifically fruit (since 'fruit' is defined in terms of the flesh and seeds). Earlier definitions influencing the ruling in the U.S. Supreme Court in 1893 were use-based; 'fruit' was something you ate for dessert or breakfast. I'm not particularly concerned which definition you use here because both are proper definitions of 'fruit'... but I'd blast you for saying that a tomato isn't 'fruit-scientific' because it isn't 'fruit-legal'. Similarly, I'll blast you for saying 'rename(-object)' is logically possible on a filesystem because people do 'rename(-file)' all the time. The two are different definitions and require different evaluation. In the paradigm where files are objects within a filesystem object-space, 'rename-object' will continue to be the only proper definition. Exceptions cannot be made because you're used to 'files' having a different 'rename'.
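
The claim running through the dialogue above, that an observer limited to names cannot tell a 'rename-object' from a delete plus create, can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory "filesystem", the hidden identity token, and every function name below are hypothetical and were not part of the original discussion.

```python
import itertools

# Hypothetical model: a filesystem is a dict mapping name -> (hidden_id, content).
# The hidden_id stands in for any identity other than the name (an inode, say).
_ids = itertools.count(1)

def create(fs, name, content):
    fs[name] = (next(_ids), content)

def rename_in_place(fs, old, new):
    # 'rename-object': the same object is assigned a new name.
    fs[new] = fs.pop(old)

def rename_by_copy_delete(fs, old, new):
    # 'rename-file' as a results contract: a new file with the same
    # content exists under the new name, and the old file is gone.
    _, content = fs[old]
    create(fs, new, content)
    del fs[old]

def name_only_view(fs):
    # All that an observer restricted to names can see.
    return {name: content for name, (_, content) in fs.items()}

def identity_view(fs):
    # What an observer with access to the hidden identity can see.
    return {ident: name for name, (ident, _) in fs.items()}

fs_a, fs_b = {}, {}
create(fs_a, "/my/foo", "hello")
create(fs_b, "/my/foo", "hello")
before_a, before_b = identity_view(fs_a), identity_view(fs_b)

rename_in_place(fs_a, "/my/foo", "/your/foo")
rename_by_copy_delete(fs_b, "/my/foo", "/your/foo")
after_a, after_b = identity_view(fs_a), identity_view(fs_b)

# The name-only observer sees exactly the same end state in both cases.
same_to_name_observer = name_only_view(fs_a) == name_only_view(fs_b)
```

In this sketch only the observer who can consult the second identity is able to say "this file has been renamed" in the first case and "one file vanished and another appeared" in the second. Whether a real filesystem exposes such a second identity to its users is an implementation matter.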


Much, if not all, of this page is moot. In the RelationalModel, the tuples (aka rows) in a relation represent beliefs or propositions (see DatabaseIsRepresenterOfFacts) which are characterised by their truth, not objects that are characterised by state. In particular, there is no notion of identity that needs to be preserved across arbitrary changes of state. Identity, as represented by a key, is needed only to uniquely identify a particular fact so that it can be replaced or removed. There is no notion of "change of state" at a tuple level. There are only facts that may be added (INSERT), replaced with a new fact (UPDATE), or removed (DELETE).

While the RelationalModel does not express "change of state" at all (as is typical of mathematical algebras and calculi), a Relational Database certainly does. For a Relational Database, identity of the relation variables (RelVar) must be consistent across arbitrary manipulations - update, delete, insert. Because identity of the RelVar is consistent across operations, one may also speak of 'row' identity = <RelVar, any CandidateKey>. One may speak of the 'state' of any tuple so named (consisting of both 'existence' status and contents of values in non-key columns). As you note, a Relational Database makes direct use of this notion for updates and deletes, so you cannot even argue that this notion does not exist in practice. QED. You are free to argue that such identity is certainly second class (similar to the identity of a particular attribute or method in an OO object) but to say the notion does not exist or is unused seems wrong.
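
A minimal sketch of the 'row' identity = <RelVar, CandidateKey> idea, using Python's standard sqlite3 module. The table name t, its columns, and the helper state() are assumptions made up for illustration; the sketch only shows that the pair (table, key value) can be asked for its 'state' before and after updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: k is the candidate key, v is a non-key column.
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (k, v) VALUES (?, ?)", [(1, 1), (2, 2)])

def state(conn, k):
    """State of the row named by <t, k>: its v value, or None if absent."""
    row = conn.execute("SELECT v FROM t WHERE k = ?", (k,)).fetchone()
    return None if row is None else row[0]

s0 = state(conn, 1)                               # row <t, 1> exists
conn.execute("UPDATE t SET v = 10 WHERE k = 1")   # non-key column changes
s1 = state(conn, 1)                               # same name, new contents
conn.execute("DELETE FROM t WHERE k = 1")
s2 = state(conn, 1)                               # same name, no longer exists
```

Here the name <t, 1> stays fixed while the thing it names changes contents and then ceases to exist. The sketch deliberately updates only the non-key column; what happens when the key itself is updated is a separate question.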

Given the following RelVar

 VAR r REAL RELATION {k INTEGER, v INTEGER} KEY {k};
with the following value
 r := RELATION {
   TUPLE {k 1, v 1},
   TUPLE {k 2, v 2}
 };
issue the following update:
 UPDATE r WHERE k = 1 (k := 3, v := 3);
Does it seem appropriate to speak of the value of "the" tuple where k = 1 changing state when there is no longer a tuple with k = 1?

Note that to track such updates under, say, a PublishAndSubscribe model, you would either be forced to create artificial "tuple IDs" in violation of the RelationalModel, or issue two notifications, one representing deletion of a tuple where k = 1 and one representing insertion of a tuple where k = 3. Thus, there is no tuple-level change of state. There is only RelVar-level change of state.
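
The notification argument can be sketched as a diff between the relation value before and after the update, matched on the key. This is an illustration only: the representation of tuples as dicts and the names index_by_key and notifications are hypothetical, and no particular PublishAndSubscribe implementation is implied.

```python
def index_by_key(relation, key):
    # Map each key value to its tuple (tuples are modeled as dicts).
    return {row[key]: row for row in relation}

def notifications(before, after, key="k"):
    """Events a subscriber could derive without any artificial tuple IDs."""
    old, new = index_by_key(before, key), index_by_key(after, key)
    events = []
    for k in sorted(old.keys() - new.keys()):
        events.append(("delete", old[k]))
    for k in sorted(new.keys() - old.keys()):
        events.append(("insert", new[k]))
    for k in sorted(old.keys() & new.keys()):
        if old[k] != new[k]:
            events.append(("replace", old[k], new[k]))
    return events

before = [{"k": 1, "v": 1}, {"k": 2, "v": 2}]

# UPDATE r WHERE k = 1 (k := 3, v := 3);
after_key_update = [{"k": 3, "v": 3}, {"k": 2, "v": 2}]
events_key = notifications(before, after_key_update)

# For contrast, an update that leaves the key alone.
after_value_update = [{"k": 1, "v": 9}, {"k": 2, "v": 2}]
events_val = notifications(before, after_value_update)
```

Under these assumptions the key update yields one deletion (k = 1) and one insertion (k = 3), while the update that leaves the key alone yields a single replacement.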

You could also rename a RelVar or table with a DDL command, or even change the URI (e.g. IP and Port) used to access the Database with a higher level configuration option. This is not a problem for 'identity', but is an issue of identity being open rather than opaque. (See ConceptOrientedProgramming for a variation of OO focusing on open identity.) You might ask how PublishSubscribeModel deals with such changes in open identity - whether it insists its target is deleted as opposed to attempting to forward the observer's attention to its new identity. Which semantics are available depends upon implementation.

{From an implementation standpoint, an RDB still faces the same issue as file names. For one, a copy-and-delete affects the performance characteristics. And there's the issue of existing concurrent users who may want a consistent snapshot of state as it was just before rename or "key" change. It may not be practical to always just make a copy for the snapshot, especially for big data items, but rather keep a pointer to the "original" if large but non-key info is being referenced. In practice, there seem to be some nice benefits in using the internal address name-space if it exists. It's a convenient "cheat" on "pure" identity, and has some roughly real-world counterparts, as the cat cage analogy above illustrates. - t}


(last edited September 19, 2013)