Relational Database Table Rows Have No Intrinsic Object Identity

(DeleteMe: this page is intended as a more clearly-named refactoring of the discussion whose raw thread is on RelationalHasNoObjectIdentity)

A row in a table in a relational database is not an object in the sense of ObjectOrientedProgramming, because it represents only state and not behavior. Therefore a row in a table in a relational database cannot have intrinsic object identity as implemented in numerous ObjectOrientedProgrammingLanguage's (typically by the "==" comparator). To an application programmer using an ObjectOrientedProgrammingLanguage, the true identity of an instance of a class in the address space of the process running the program is derived from the memory location where that instance's state is stored. See the JDK 1.3 JavaDoc for java.lang.System.identityHashCode(Object object) and java.lang.Object.hashCode(), for example. This identity is not a function of the values of any instance variables of the instance. A row in a relational table is identified by the values of its columns and has no other intrinsic identity to the application programmer.
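
A minimal Python sketch of the distinction drawn above, offered as an illustration and not as part of the original discussion. Python's `is` operator and `id()` play roughly the role that `==` on references and `System.identityHashCode` play in Java. The `Employee` class and its fields are hypothetical.

```python
class Employee:
    """Hypothetical class: equality is defined on state, identity is not."""

    def __init__(self, employee_id, salary):
        self.employee_id = employee_id
        self.salary = salary

    def __eq__(self, other):
        # Value comparison: looks only at the instance variables.
        return (isinstance(other, Employee)
                and (self.employee_id, self.salary)
                == (other.employee_id, other.salary))

    def __hash__(self):
        return hash((self.employee_id, self.salary))


a = Employee(1001, 100_000)
b = Employee(1001, 100_000)

same_state = (a == b)     # compared by the values of the instance variables
same_object = (a is b)    # compared by intrinsic identity

# Identity is not a function of state: mutating `a` leaves its identity alone.
original_id = id(a)
a.salary = 200_000
identity_survives_mutation = (id(a) == original_id)

# Rows modelled as plain tuples in a set: only the column values distinguish
# them, so an equal tuple "inserted" twice is still a single element.
rows = {(1001, 100_000), (1001, 100_000)}
row_count = len(rows)
```

Here `same_state` is true while `same_object` is false, and the set of rows holds one tuple: the objects are told apart by something other than their values, the rows are not.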

A row (tuple) is an immutable thing; a ValueObject.

If rows are immutable then why can I change the values in them? Sleight of hand. What you are really doing (if ChrisDate is to be believed) is creating a new relation - which is the same as the old relation, except with a modified copy of the row in question. In practice, this mutation can be done in place as all references to the relation in the RelationalModel are via the RelationalVariable; but it's useful to think of RelationalVariables as the only mutable things in the database. See Costin's better explanation below.

But what of the pair (RelationalVariable (table name), CandidateKey)? It does describe a unique entity within a database that a) can change over time (as the RelationalVariable points to new relations as a result of transactions), b) may have a limited lifespan as tuples are inserted into or deleted from relations, and c) is unique within a database.

In the OO world, consider the difference between a variable pointing to a ValueObject (such as an integer), and the ValueObject itself. The value object itself has no identity; but the variable certainly does. The complication occurs when variables are themselves contained within objects - ReferenceObjects; this is where ObjectIdentity comes into play.

Some feel that a database table row's lack of intrinsic object identity is one of the problems that make up the ObjectRelationalImpedanceMismatch, because they feel that intrinsic object identity is sufficient to leverage in constructing useful applications, without requiring the artifice of adding additional attributes to facilitate ObjectRelationalMapping. Others contend that this lack is inconsequential, because intrinsic object identity is not sufficient to identify a "real-world" object or entity to a software application or its users.

Contributors: MarkAddleman, CostinCozianu, RichardHenderson, RandyStafford

</refactoring>

Relational primary keys usually have a "presentable" form; something that can be printed on paper. This makes it easier to discuss and communicate, especially across platforms or programming languages. How is this a bad thing?

A ValueObject can only exist where there is referential transparency - having a reference (any reference) to that object is the same as possessing that value. Table rows are referenced by (RelationalVariable, CandidateKey). There is no referential transparency as to the remainder of the cells in that row. Therefore rows are not ValueObjects, and attempting to categorize them as such is a bad abstraction... at least without historical databases, where the rows are referenced by (RelationalVariable, CandidateKey, Time).

The STATE of a row IS a value. (Not a ValueObject. A Value.) However, the state of ANY cell is always a value. (A cell is the abstraction of a service to store one value, and is fundamental to assignment.) Attempting to categorize rows as ValueObjects would be akin to saying: "My C++ program is functional. Every time I perform an assignment, I'm logically taking the entire program and replacing it with one that is exactly the same as the previous program, except with the mutation in the value I assigned and in the 'next action' pointer."

The argument above states that "If rows are immutable then why can I change the values in them?" is "Sleight of hand." But the whole argument is sleight of mind. The mutation aspects of DataManipulation fall outside the RelationalModel, but do and must exist. What ChrisDate presents is a useful lie: saying that the RelationalVariable is the only thing that can be mutated is a valuable paradigm when it comes to designing table-grain security and concurrent transaction semantics. However, what is -logically- happening in the end is that the cells associated with a row-reference are, in fact, changing.

Database rows are *relationships* between objects, not objects. Therefore, they don't have object identity, because a relationship describes the *public* properties of a collection of interacting objects. In contrast, an object describes a single abstract *private* piece of data in a storage resource, where the resource itself must be identifiable despite the abstractness; thus object identity is needed. Rows do not need identity, since they rely on the identity of every object referred to by the key fields of the table. (Example: a 'person' would be a good class of objects. It would not be a good database table, since you cannot store people in a database(!) {Although I think Hitler tried it}. You could store information about 'friendship' between various people (say, customers) in a database, so "customer_friends" would be a good table name. The 'customer_id' key in the table would be a reference to one person. The table would also have a 'friend_id' key, which would refer to the customer's friend, who is another person. But neither 'customer_id' nor 'friend_id' describes the actual person. Friendship is not something you can describe with objects, because there is no natural object identity that you could associate with friendship. However, an object could easily represent a person. In this example, the closest relational analogy to a person object would be a table 'personal_data', which would describe those aspects of (one) person's data that are available for public use. But there is no way that table could include the information needed to support the person in his own private tasks (say, "password for his bank account"), because those are not intended to be available to the general public; the person himself must control their use.) There can be more than one key field in a relational table, thus each row refers to more than one object.
Objects cannot be usefully stored in relational databases, since object data is private [only accessible by the methods of the object], whereas all data in relational databases is public, accessible by the world at large (anybody using SQL tools). Relational databases only describe public aspects of the stored data, whereas objects describe the (internal) resource.

I disagree with the characterization that tables are more "global". See DatabaseNotMoreGlobalThanClasses and GateKeeper.

There is no globality at issue here. The distinction is between interfaces and internal implementation, not between global interface and local interface. The private state of an object is not about its interface (object contains conversion from private representation to public interface). The reason why object identity in OO is important is because it creates a distinction between 'private representation' and 'public interface'. The private representation is visible in the interface as just the object identity property, nothing else.

Q: If rows are immutable then why can I change the values in them?

A: You actually do not. In the RelationalModel (and in its approximation, the SQL implementation) the UPDATE operator changes the value of the table (more specifically, of the RelationalVariable) by removing the tuples corresponding to the old values and inserting the tuples corresponding to the new values produced by the specified transformation. From a logical point of view, a new table is created, and the RelationalVariable is updated to point to it. The old table, depending on the semantics of the RDBMS, may be discarded at some point (see below).

Let's look at a practical example:

  UPDATE employees SET salary= salary*2 WHERE employee_id=1001

Say that the database contained the tuple (1001, 100K$); then the execution is logically defined to mean:

 EMPLOYEES := EMPLOYEES  MINUS {(1001,100K$) } PLUS {(1001,200K$)}
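
The same assignment can be sketched in Python, under the assumption that a relation value is modelled as an immutable `frozenset` of `(employee_id, salary)` tuples. The helper name `double_salary` and the second employee row are invented for illustration. This shows only the logical reading, not how any RDBMS implements UPDATE.

```python
# A relation value: an immutable set of (employee_id, salary) tuples.
employees = frozenset({(1001, 100_000), (1002, 80_000)})
before = employees  # keep a handle on the old relation value


def double_salary(relation, employee_id):
    """Return a NEW relation: relation MINUS matched tuples PLUS transformed ones."""
    old = frozenset(t for t in relation if t[0] == employee_id)
    new = frozenset((eid, salary * 2) for (eid, salary) in old)
    return (relation - old) | new


# The only assignment is to the relation variable itself.
employees = double_salary(employees, 1001)
```

After the assignment, `employees` names a relation containing (1001, 200000), while `before` still denotes the untouched old relation value; no tuple was modified.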

Some databases obviously optimize this operation by acquiring a row lock and then updating the physical image of the old tuple "in place". This is an optimization/implementation detail not very relevant to the logical model; the relational model only needs to check that such optimizations are available and reasonable for implementors. Other databases (most notably PostgreSQL and Oracle) may create a new version of the physical row used to store the tuple value, while possibly allowing concurrent transactions to access the old physical row with the previous version.

All these implementation aspects have no relevance in the relational model.

Essentially a FunctionalProgramming model.

Under pure functional, when a new speck of dust lands on the Earth from space, the old earth is deleted and a new one that weighs 0.000000000000001 pounds more is created? Weird, but then again quantum probabilities probably still take the cake in that category. -- AnonymousDonor

I see the attraction of having a functional core to the relational model. I also think it seems silly. There is no difference between mutating a row, and creating a new table with a mutated row, and replacing the old table. It has the exact same implications. So why not accept that databases are about state, and state changes? This is like drawing a distinction between integer addition in C, and the 3 lines of assembly required to perform the addition. From the C abstraction level, there is no distinction. Can anyone identify a benefit to modeling state mutation in this roundabout way? If there was versioning or something, you would save the old state, but there is no reason those details should be in the basic logical model. I'd like to be enlightened :) Thanks. - Steve Wedig

The relational model does include state - relation variables (or, speaking loosely, "tables"), which have a relation value. The rows are part of the relation value; there are no separately distinguishable row variables in the database. You could introduce them, but it would complicate the model, and constrain implementations of it, unnecessarily. Row variables are not part of the relational model for some of the same reasons that the C language specification doesn't talk about register allocation in addition expressions. -- DanMuller

Oh, you have row variables, Dan. A simple tuple of (Relation Variable, Candidate Key) is a reference to a row that may vary over time (including in presence). They aren't independent variables (as in 'independent' of the relation variables), but they certainly exist.

I'm not clear what you mean here. In the definitive description of the RelationalModel, TheThirdManifesto, there are no persistent tuple-valued variables. There may be transient tuple-valued variables. The only persistent variables are relation-valued variables. A database is an identifiable collection of relation-valued variables. -- DaveVoorhis

Perhaps you are confused about what constitutes a cell (a service that stores a value) as opposed to what constitutes a variable (any time-varying quantity or value). The RelationalModel specifies variables, not the underlying implementation as 'cells', and it is quite possible to use one cell to store everything (e.g. with a record containing all the relations). The relation variables would, then, be referred to roughly as myBigCell.RelationName. Row variables would be (Relation Variable, Candidate Key). The variant properties of a row variable would be: (a) whether the row exists (i.e. whether the key is found), (b) the state of the row if it exists. The variant properties of the relation variable would include its size and the set of relationships it contains. Both of those are persistent variables (or at least as persistent as myBigCell). Neither sort of variable is a cell. And the RelationalModel is not compromised.

I am using the term "variable" as used in TheThirdManifesto -- where the term "cell" does not appear -- and in much of the literature. You seem to be referring to your own system and terminology, hence the confusion. Dan is implicitly referring to the commonly-known descriptions of the RelationalModel and related implementations. Therefore, you are incorrect in saying that he has row variables. It would be correct to say, however, that you have row "variables." -- DaveVoorhis

Variable is variable. The version I use is correct in TheThirdManifesto and most of the literature. The fact that variables very often directly reference cells (thus creating a tight coupling in normal use) doesn't mean that you're correct to confuse the two concepts or insist one is the same as the other. TheThirdManifesto doesn't mention "cell" and, indeed, should not mention cell because the storage service for the Database (including level of persistence, and whether there is one big cell or a million tiny cells - one for every component of every row) is an implementation detail. The notion that TheThirdManifesto is insisting you use one "variable" (whereby you apparently mean cell) for each relation is absurd. Variable is variable. Cell is cell. They are fundamentally different abstractions, related by the fact that the contents of a mutable cell is, in fact, variable. Any time you have an independent variable, you necessarily will have an arbitrary number of dependent variables that exist in reference to that variable and others. They are no less variable for being dependent. However, they aren't the sort of variables you can twiddle. (Of course, when dealing with constraints for consistency, you don't even have full twiddle-powers on variables in the relational model.)

I think we're circling around some sort of ViolentAgreement here. I was specifically -- though perhaps awkwardly -- trying to disambiguate the use of "variable" in the way Dan used it and as it appears in TheThirdManifesto (i.e., an explicitly-declared identifier-named slot or cell) vs "variable" as you used it in your reply to Dan, for which the term "mutable object" (or some equivalent) is often used in order to prevent confusion over the use of the term "variable".

... but values are never mutable. [Oops. Typo on my part. I meant to write "mutable object." I have corrected the above. -- DV] Cells are mutable. Properties of cells are variable. Most such properties can be explicitly declared and identifier-named. One such property is the value held by the cell, which is the most common one to reference. That IS how variable is used normally. Only problem is that young CS students who don't take the theory path don't really get into discussions of cell vs variable, so they confuse the two.

In effect, what was written in your exchange with Dan was something like this:

There are no citrus fruit in the relational fruit orchard.
Oh, you have citrus fruit in the relational fruit orchard, Dan.

Replace "citrus fruit" with "row variables", and "fruit orchard" with "model".

Given the context, it would appear that what you actually meant was the following:

There are no oranges in the relational fruit orchard.
Oh, you have lemons in the relational fruit orchard.

Replace "oranges" with persistent tuple variables, and replace "lemons" with persistent variant tuples. Dan was referring to "variables" as explicitly-declared, identifier-named slots or cells within the database. You were referring to "variables" as non-identifier-named, non-explicitly-declared variant entities within the database.

However... In the RelationalModel, there are in fact no persistent tuple "variables" using either definition of "variable." Persistent tuples exist only by virtue of being part of the structure of relations stored in persistent relation-valued variables. You can only change a tuple in a relation stored in a relation-valued variable by assigning a new relation to that variable. Implementations of the RelationalModel may optimize the process by internally replacing one tuple with another, or even by changing attributes within a given tuple. However, this is purely internal, and as invisible to the user as the in situ bit-twiddling that may go on to optimize integer arithmetic in, say, C. Conceptually, an integer is an immutable value. Likewise, in the RelationalModel, a relation is an immutable value. No relation has slots, variables, cells, or any other mutable or variant structure - any more than the integer value "3" has slots, variables, cells, or any other mutable or variant structure.

Aye. But a slot (cell) with a relation DOES have variables. As does a slot containing a record containing relations, or a slot that carries an integer. I agree that relations are values. That doesn't mean a relation variable is a value, or that any subreferenceable properties of that relation variable are values.

Of course, relational languages usually provide convenient short-hand notations for assigning a new relation to a relation-valued variable -- such as the INSERT, UPDATE, and DELETE commands in SqlLanguage and TutorialDee -- that give the appearance of mutating relations at a tuple or attribute-of-a-tuple level. As I've mentioned, they may even be implemented that way for optimization reasons. However, the remainder of the system must behave as if the cell/slot/variable containing an immutable relation was assigned a new immutable relation.

-- DaveVoorhis

The type for a row-variable isn't a persistent tuple, true. It's a persistent 'Maybe Tuple'. (I.e. Just Tuple | Nothing). After all, if the row isn't in the relation, then that case must be handled. Logically, setting that row-variable to 'Nothing' would remove it from the relation, while setting it to 'Just Tuple' would set it to said tuple (with the constraint that you can't change the candidate-key components, and must maintain any other database consistency requirements).
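
The 'Maybe Tuple' reading above can be sketched in Python with `Optional` standing in for Haskell's `Maybe`. The `RelVar` class and the `get_row`/`set_row` helpers are hypothetical names for illustration, and the only consistency rule enforced is the one mentioned above (the candidate-key component may not change).

```python
from typing import Optional, Tuple

Row = Tuple[int, int]  # (employee_id, salary); employee_id is the candidate key


class RelVar:
    """Hypothetical relation variable holding an immutable relation value."""

    def __init__(self, rows=()):
        self.value = frozenset(rows)


def get_row(relvar: RelVar, key: int) -> Optional[Row]:
    """Read the row-variable (relvar, key): a tuple, or None if absent."""
    for row in relvar.value:
        if row[0] == key:
            return row
    return None


def set_row(relvar: RelVar, key: int, row: Optional[Row]) -> None:
    """Assign to the row-variable by assigning a new relation to the relvar."""
    if row is not None and row[0] != key:
        raise ValueError("candidate-key components may not change")
    kept = frozenset(r for r in relvar.value if r[0] != key)
    relvar.value = kept if row is None else kept | {row}


employees = RelVar({(1001, 100_000)})
set_row(employees, 1001, (1001, 200_000))  # 'Just Tuple': update
set_row(employees, 1002, (1002, 80_000))   # 'Just Tuple': insert
set_row(employees, 1001, None)             # 'Nothing': delete
```

Note that `set_row` is written entirely in terms of assigning a new `frozenset` to the relation variable, so the sketch is compatible with either side of the argument: the row-variable is a dependent view of the relation variable.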

Ultimately, what looks like a 'cell' depends on how you manipulate the variables. Consider that it's possible to implement a virtualization of addressable shared memory (e.g. RAM) atop a Relation (very easily, as I'm sure you know). From that perspective, every row contains a mutable cell and can be referenced by (Relation Variable, Address). The operations to change that mutable cell just happen to utilize a DML. If that address isn't in the relation (e.g. hasn't been added, or has since been removed) then any attempted access would return that the appropriate tuple doesn't exist and perform some reasonable action like generating a memory-access exception. It doesn't really matter whether the entire relation is, itself, stored in 'one named slot' or otherwise. That doesn't change the logical access pattern, or the logical results of such access. Logically, the row-items are both logically mutable cells and variables.
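
A minimal Python sketch of the virtualized memory described above, assuming the relation is a `frozenset` of `(address, value)` tuples keyed by address. `RelationalMemory` and `MemoryAccessError` are invented names, and the `write`/`read`/`free` methods stand in for the DML operations.

```python
class MemoryAccessError(Exception):
    """Raised when an address has no tuple in the relation."""


class RelationalMemory:
    """Addressable memory virtualized atop one relation of (address, value)."""

    def __init__(self):
        self.relation = frozenset()  # the relation value; address is the key

    def write(self, address, value):
        # Logically: relation := relation MINUS old tuple PLUS new tuple.
        kept = frozenset(t for t in self.relation if t[0] != address)
        self.relation = kept | {(address, value)}

    def read(self, address):
        for addr, value in self.relation:
            if addr == address:
                return value
        raise MemoryAccessError(address)

    def free(self, address):
        self.relation = frozenset(t for t in self.relation if t[0] != address)


mem = RelationalMemory()
mem.write(0x10, 42)
mem.write(0x10, 43)        # looks like mutating the cell at address 0x10
value = mem.read(0x10)

mem.free(0x10)
try:
    mem.read(0x10)
    faulted = False
except MemoryAccessError:
    faulted = True         # the address is no longer in the relation
```

From the caller's side each address behaves like a mutable cell, even though every operation is implemented as an assignment of a whole new relation value.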

Every referenceable property of a cell constitutes a 'variable', including the value-state of the cell. Every independently mutable property of a cell can, itself, be used as a cell (because 'cell' is just an abstraction of a service to store and recover a value). Even if you insist that DanMuller's definition of variable is the correct one - a named slot that stores a value (the most common unity of 'variable' and 'cell'), you still have named slots that store values at the tuple-level of a consistent relation.

If all that was said is that the RelationalModel doesn't have tuple-variables independent of the relations, I'd wholly agree. You can't add rows to a relation by creating a variable 'r' and assigning to it a tuple-value... well, you can't do that and still have the RelationalModel.

Argh. Can I suggest, most politely and humbly, that you read TheThirdManifesto? -- DaveVoorhis

As pointed out, one can talk about the parts of a relation being "dependent variables". Adding the qualifier "independent" to my original statement is a valid suggestion. My point might better have been that, although they may be implied by the model, they are not an explicit part of it, and really add nothing to an understanding of it. Their implied existence is acknowledged by the common forms of the data modification operators. The danger in focusing attention on them lies only in that many programmers have a fuzzy concept of "value" and "variable", engendered by common (mis)usage in daily work with currently popular languages - leading to the sort of misconceptions that this page seems intended to address. It's rather a pity to waste this much verbiage on the concept of tuples-as-variables. Would anyone object if I reworded my statement slightly and removed this long digression? -- DanMuller

No objection from me. -- DV

I also do not object to refactoring the discussion. However, I disagree with the premise of this page; as such, I don't consider this so much a 'digression' as a useful conflict.

Viewing relations, tuples, even whole database-states as 'values' is... doable. One may, logically, view the whole database-state as being replaced every time one relation is replaced (whereby 'relation' is identified by some unique name in the database). One may logically view the whole relation-state as being replaced every time one 'tuple' is replaced/removed/added (whereby 'tuple' is identified by some candidate key... possibly the whole tuple). One may logically view a single value of a non-key cell of a tuple as being replaced every time one bit is twiddled in its representation. Etc. In the reverse direction, one can consider the entire world to be replaced with another the moment the database is updated. Etc.

These views, however, are not relevant. They are misleading. They do not change what is fundamentally happening, or how 'things' may be fundamentally referenced (giving them object-identity). They do not provide any value. If you can uniquely reference something, you have object identity. If that object is mutable (i.e. has properties that can change), you have a variable. Period. That's true for physical reality, and it's also true for variable relations and tuples within databases. Further, references to things, in any language, may be done by the properties they possess (e.g. "my computer"), including the possibility of uniquely attached names (e.g. SSN, cell phone number, etc.) The same is true for both physical reality and for relations and tuples in databases. Often, one object can be identified by many different things... as is the case with a tuple that may be identified by many candidate keys.

Don't get carried away with promoting TheThirdManifesto like some sort of gospel. I don't have the book, but I did read the paper a while back. The authors promote a more 'pure' Relational approach and a paradigm that is, like most paradigms, a UsefulLie. Paradigms are views, not reality. The paradigm that a cell associated with the RelationVariable is the only thing that is changing is not more valid than the paradigm that the whole universe is being updated, or the paradigm that individual tuples referenced by candidate keys are being altered. No paradigms change the fact that a tuple that can be referenced by its candidate key is now present/gone/updated/etc. Relational database table rows DO have object identity. And, like all objects, they have properties, such as being present or absent... and, if present, then potentially having other related values. Date's paradigm won't change that.

"These views, ... do not change what is fundamentally happening, ..." Say rather, "this way of describing a model does not change how the model may be implemented". There is, after all, nothing "happening", fundamentally or otherwise, in a model. :) (A similar comment applies to "Paradigms are views, not reality". We're talking about a model, which is arguably different from a paradigm and is absolutely distinct from reality or implementation.) A functional description is not a gratuitous lie, but a way of describing the model that simplifies many types of reasoning, omits irrelevancies, and can, with only a simple conceptual addition, be used as a basis to describe stateful behavior when needed. (I agree that it's a UsefulLie prior to adding stateful behavior. It's not nearly so useful afterwards. As far as 'happening', I was referring specifically to the state update of the database in response to a DataManipulation request.)

Individual tuples do not have object identity in the usual sense of that phrase. "Object identity" in most object models is a concept independent of the type of the object, and closely tied to the ability to reference an object independently of any other knowledge of the object. To be more specific, (one sort of) mutable object is a cell, its identity is its canonical name and can itself usually be treated as a value. A tuple's "identity" might be said to be synonymous with a particular value of a (not "the"!) candidate key. But a key value hardly seems synonymous with (or even a representation of) an object identity. A key value can have a very different domain in different tuples (being composed of any number of values of any domain/type), and two tuples in entirely different relations could have identical key values, while clearly not being "the same object". So although you might say that a tuple's key values resemble an object reference in some ways, they are not the same thing.
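
A small Python sketch of the point about key values, using made-up relations and data. Each relation is modelled as a dict from candidate-key value to the remaining attributes, and `dereference` is a hypothetical helper that resolves a (relation-variable name, key value) pair.

```python
# Hypothetical relations: {candidate_key_value: remaining_attributes}.
employees = {(1001,): ("Alice", 100_000)}     # key domain: (employee_id,)
invoices = {(1001,): ("ACME Ltd", 250)}       # key domain: (invoice_no,)
enrolments = {("CS101", 1001): ("grade A",)}  # composite key, another domain

database = {
    "employees": employees,
    "invoices": invoices,
    "enrolments": enrolments,
}


def dereference(ref):
    """Resolve (relation_variable_name, key_value); None if no such tuple."""
    relvar_name, key = ref
    return database[relvar_name].get(key)


ref_a = ("employees", (1001,))
ref_b = ("invoices", (1001,))
dangling = ("employees", (9999,))

keys_equal = ref_a[1] == ref_b[1]                       # identical key values
same_tuple = dereference(ref_a) == dereference(ref_b)   # yet different tuples
dangling_target = dereference(dangling)                 # well-formed, refers to nothing
```

The key value alone does not say which tuple is meant; it only identifies one once paired with a relation variable, and even then a well-formed pair may resolve to no tuple at all.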

Any (relation-variable, candidate-key) pair will uniquely identify a tuple-object in a relation, so every such pair will suffice for object identity. And, despite intuition making it 'seem' otherwise, such a pair is mathematically equivalent to the common understanding of object identity in both object databases and physical reality. Object identity is a means to uniquely identify an object. Objects are things that can be uniquely identified by one or more sets of observable properties (e.g. "my computer", "the bookshelf in my room", etc.) or by attached names (e.g. "me"). The same is true for tuple-objects - identified by relation-variable and candidate-key. The same is true for relation-variables, identified by name and a database-identifier.
It'd be more correct to say that tuples in a relation don't have object identity by a normal person's intuition of object. But the very concept of object is, itself, a very UsefulLie in many senses; it isn't possible to prove deductively that one object is separate or independent of another and thus should have unique identity. It isn't possible to deductively prove that I'm separate from you, or that you're separate from the Internet. The very idea of object is just a useful way of looking at the world - another paradigm. The abstractions that lead to the concepts of object and object identity apply very directly to tuples in the Database in every sense. The 'object identity' does refer to a tuple-object in mutable data space rather than a physical object in physical space, but it's just as valid. Perhaps it takes a solipsist to appreciate it.

I suppose you could try to define a "tuple object reference" consisting of both a candidate key value and the name of a relation variable that contains the tuple. But that opens another rat's nest. One could easily come up with object references that look perfectly valid, but that do not reference a tuple in actual existence - which seems to indicate that this is really the name of a dependent cell that, if it exists, might contain a ... what? Tuple value? Or a tuple object reference, which we still haven't defined? Or you'd have to expand the usual usage of "object reference" to include references to non-existent objects.

One may always come up with object-references that look perfectly valid but that refer to absolutely nothing. E.g. "my DeLorean". Anyhow, I did address this, above. The value of a tuple-variable referenced by a tuple-identifier is a "Maybe Tuple" (in the Haskell notation). That is "Just <tuple> | Nothing". "my DeLorean" is a 'nothing', as it doesn't exist in physical space. A tuple-object may similarly be absent from data space. The variable, however, is persistent for the lifetime of the relation-variable... just as the variable "my first DeLorean" is persistent for my lifetime (but I do need to add 'first' to handle the possibility that I get more than one).

Attempts to shoehorn relational theory into an object-identity point of view seem ill-advised to me, when doing so easily leads to such complexities and misunderstandings. A functional approach to the relational model is simple to understand and, so far as I've seen to date, sufficient.

... and dreadfully inaccurate. Functional is fine for calculations over values (including determining the next value that shall be held by the relation variable), but is inappropriate for understanding state manipulations and other behaviors, or what exactly 'state' is. For that, it's better to use an ActorsModel or ProcessCalculi, and model perception and manipulation from the perspective of an Actor. If, in the RelationalDatabase, you could never, ever change anything, then functional would be appropriate.

OTOH, it is useful to talk about the similarities between candidate keys and object references when discussing systems that use objects to represent relational data, or that use relational data to persist objects. Just don't forget that there are differences. -- DanMuller

"Don't get carried away with promoting TheThirdManifesto like some sort of gospel."

I'm not, and there are certainly portions that I question or disagree with. However, as it is the de facto reference for the RelationalModel since DrCodd's original "A Relational Model for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html), it is worth reading as a foundation for these discussions even if you disagree with some or all of its precepts. We're covering some well-trodden ground here, so at the very least using it as a common reference might save us some effort in typing up explanations. -- DaveVoorhis

The RelationalModel was pretty darn easy to understand... named sets of records with a DML is pretty simple. I've studied it in at least four different texts, including DrCodd's original, FoundationsOfDatabases (very nice), and FundamentalsOfDatabaseSystems (mediocre), plus a refresher from the Wikipedia page and this C2. They're all quite consistent. I've also put a great deal of effort and mathematical proof into understanding such things as data itself (WhatIsData), data manipulation (DataManipulation), perception, behavior, actors, communication, cells, emergent behaviors, models vs objects (ObjectVsModel), knowledge (KnowLedge), understanding, etc. Forgive me if I'm unwilling to buy another book to learn a very simple concept that I'm quite confident I already understand. (continued)

Indeed, on the surface and for most purposes, the RelationalModel is very simple, which is one of its strengths. However, the devil is in the details -- this debate is about one of the details -- and TheThirdManifesto in particular deals with these at some length. AnIntroductionToDatabaseSystems does as well, though with somewhat less emphasis on the relationship between types, values, and the RelationalModel; but more on the fundamentals which are assumed to be understood in TheThirdManifesto. I agree that FundamentalsOfDatabaseSystems is mediocre (though I vaguely recall it has a nice section on query optimization, but maybe I'm thinking of something else), and I haven't read FoundationsOfDatabases. But that isn't the point. The point is that some of what we're discussing here is either dealt with explicitly, or well implied, in TheThirdManifesto. I'm not sure it's covered elsewhere in as much detail, or with as much justification. For that reason, I recommend it. -- DV

I think this is a debate regarding a component far more fundamental than that of the RelationalModel. The very idea of storing a 'value' into a 'cell' for later retrieval by an 'actor', the concept of 'variable' from the perspective of those that can measure them and change them, etc. - these are what matter. Along with what 'object' and 'object identity' are. And while I haven't read what D&D have to say on the subject, I'm quite confident that these things aren't covered rigorously. But, since it does come with your recommendation, I'll place an order on Amazon. All those arguments of Date that I have read lack much rigor (e.g. the whole 'objects are domains' argument), but I've hesitated to combat these directly because I've received the arguments through proxy. At the very least, having the book will give me a real target.

The issue isn't that I don't understand the RelationalModel; it's that the particular paradigm promoted for understanding a state update to a relational variable is not useful for understanding either the event or how it relates to the world in which the database resides. It's a UselessLie. If you believe it to be a UsefulLie, then, please, inform me of its utility. Tell me what you gain from it as opposed to viewing the entire Database as being replaced to update one relation, or viewing the entire world as being replaced to update one database? If you cannot, then why do you propose there is utility in that paradigm over that of manipulating tuple-objects in data space? Tell me from the perspective of an observer of the world. Tell me from the perspective of the database manager - the actor sending the low-level signals to physically update the database. Tell me from the perspective of the world and physical environment. Where does one gain value from this paradigm? I think nowhere. I currently believe that this paradigm ought to be rejected, not embraced. It doesn't help people understand, and shouldn't be part of explanations except in explaining the paradigm itself.

Good question. This is one that I feel is answered well by Chapter 1 of TheThirdManifesto, plus a bit of reading between the lines, but I'll attempt an answer. First:

From the perspective of the user, the database manager, and the casual reader, there is no harm in viewing update operations as mutating relations, i.e., manipulating tuple-objects in data space, or even altering individual attributes within tuples within relations.
From the perspective of the implementor of the RelationalModel, he or she will often mutate relations in situ. This is either done as an optimization - to avoid the expense of copying relations as a result of an update - or as a naive (or deliberate!) approach that regards relations as mutable containers instead of constant values.

In other words, from a purely pragmatic point of view we can accept the perception that tuples are mutable, and even implement it that way, as long as the implementation doesn't violate the underlying theoretical model.

Since the RelationalModel is a theoretical model, we can insist on greater rigour than would normally be considered on a pragmatic basis, and thereby predictably and formally prove the behaviour of the model under all reasonable circumstances. That theoretical rigour has an important impact on our pragmatic users, data managers, and casual readers whether they know it or not - it means they can trust correct implementations of the model to be consistent and behave predictably even under novel conditions. However, in order to make provability manageable, we have to constrain the model. Hopefully, the theoretical constraints will not unduly limit pragmatic implementations of that model.

A constraint that the RelationalModel imposes is that attribute values are immutable. We could conceivably extend the model to handle mutable objects, but this would unduly complicate the model. Even though the RelationalModel is orthogonal to persistence, it is typically used in contexts where persistence is significant, i.e., to implement database systems where the values are "frozen", so to speak, when stored in the database. Therefore, there is no reason - at least in the usual interpretations of the model - to consider mutable objects in a database context.

However, the RelationalModel specifically does not constrain what types of values are stored in the database. Any value can be an attribute of a tuple. (There may be other constraints on the value, e.g., that it can be tested for equality against another value of the same type, or that the values belonging to a type must be ordinal, but these are not germane to this discussion.)

Because a goal of the RelationalModel is that it not unreasonably constrain the types of value that an attribute may contain, it makes sense that a tuple or even a relation can be an attribute value. This allows us to have nested tuples and relations. Although it is rare to require these, there are circumstances where they are appropriate, and the alternatives would be awkward or inelegant.

Since we've already established that attribute values must be immutable, it implies that tuples and relations must be immutable. Otherwise, tuples and relations could not be attribute values. If that were true:

The lack of tuple or relation attribute values would be an undesirable constraint on pragmatic implementations; or,
We would have to consider tuples and relations as a special case (well, more than we already do), which would be undesirably complex and inelegant; or,
We would have to alter the model to allow for mutable attribute "values", which would be undesirably complex; or,
We would have to allow both mutable and non-mutable tuples and relations, which would be undesirably complex and inelegant.

Therefore, tuples and relations in the RelationalModel are immutable. The only mutable persistent object in a theoretical true relational database is a relvar, aka a relation-valued variable. The only thing we can do to it is assign it a new value, i.e., replace its current value -- an immutable relation -- with another immutable relation. However, as we've seen, this does not constrain pragmatic implementations or casual interpretations of the model, but it does sufficiently constrain the theoretical model itself so that it is not unduly complex or inelegant, and therefore its behaviour is easily provable. That means pragmatic implementations which correctly implement the model will be provably consistent, and demonstrate behaviour that can be predicted by the model. Therefore, our users and data managers can reasonably trust the system.

-- DaveVoorhis
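To make the "assignment, not mutation" reading concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Customer` heading, the `Relvar` class, and the `update_phone` helper are hypothetical names invented for this example, not part of TutorialDee or any DBMS. Tuples and relations are immutable values; the relvar is the only thing that ever changes, and it changes only by being assigned a new relation.

```python
from typing import FrozenSet, NamedTuple


class Customer(NamedTuple):
    """An immutable tuple value (hypothetical heading, for illustration only)."""
    cust_name: str
    phone: str


class Relvar:
    """A relation-valued variable: the only mutable thing in this sketch."""

    def __init__(self, value: FrozenSet[Customer]) -> None:
        self.value = value

    def assign(self, new_value: FrozenSet[Customer]) -> None:
        # The sole mutation: replace one immutable relation with another.
        self.value = new_value


def update_phone(rel: FrozenSet[Customer], name: str, phone: str) -> FrozenSet[Customer]:
    """Build a NEW relation with one tuple replaced; `rel` itself is untouched."""
    return frozenset(
        t._replace(phone=phone) if t.cust_name == name else t for t in rel
    )


customers = Relvar(frozenset({Customer("Ann", "555-0100"), Customer("Bob", "555-0101")}))
before = customers.value  # keep a handle on the old relation value
customers.assign(update_phone(customers.value, "Ann", "555-0199"))
```

After the assignment, `before` still holds Ann's old phone number, while `customers.value` holds the new one. Whether an implementation actually copies or mutates in place is invisible at this level, which is the point being made above.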

The state of the whole database at a given instant is also a value. You can store a whole database-state in a single attribute value. I understand this well enough. So why did you not extend your argument just a little bit further? Why did you not say: "Well, I could store the whole state of the database as a single attribute value, so 'relation variables' aren't really mutable, either... the whole database state is an immutable value. Instead, we should use only one variable that carries the whole database-value." I think you had a bit of bias to stop where you did.

It wasn't a matter of bias to stop where I did. I was simply explaining the RelationalModel as it is, rather than how it could be. My impression is that the keepers of the RelationalModel decided what is mutable or not based on the argument I've given above, rather than (say) some notion of functional purity. That said, you're not the first to note that a database like the following (using TutorialDee syntax)...

The problem is that the argument you presented is an invalid argument. The premises (values are immutable) do not lead to the conclusion (properties of variables are immutable). In this particular case, 'properties of variables' includes such things as tuples in the relation, and values in those tuples. Invalid arguments ought be rejected... not accepted, embraced, and taught like they have some greater value than they actually possess. The 'keepers of the RelationalModel' have no inherent rights to 'decide what is mutable or not', and have far less rights to use invalid arguments to promote said position. Further, if they're going to decide something is immutable, they ought to be able to ensure that it actually is immutable rather than calling inherently mutable structures 'immutable' simply because they decided it. It's important to realize that state, at any given instant, is always an immutable value. That, by itself, provides all the rigor and constraint that the relational model needs.
It's a model, therefore the designers of the model have every right to define it as they see fit. In particular, treating the model as consisting solely of values and variables (in the usual programming sense) has dual benefits of clarity and simplification. (There are no Model Designers. There was one designer, and a few usurpers. Further, even if you give D&D every right to define things as they see fit, they can still be wrong where wrong is defined by the creation of a logically inconsistent model.)
- Simplification -- for example, in that checking constraints need only be considered at the point of variable assignment, rather than under a variety of mutations to mutable objects... (Constraints checking and proofs can be handled either at the level of the manipulation language, or after each manipulation. It exists outside and independent of the RelationalModel. )
- Clarity -- for example, in that the notion of variables (in the usual programming sense) and values is clear and unambiguous, whereas the notion of a mutable object (quick: define "object" and get agreement on your definition) is complex and subject to ambiguity... (It's easier to focus on the 'mutable' aspect, which is well defined (if not well understood). Anyhow, I maintain that no real clarity was gained... nothing that shows up in proofs.)
As already noted, this has no impact on the users of implementations of the model, or even on broader models that might incorporate the RelationalModel -- these may harmlessly regard the apparent mutables as truly mutable. Within the RelationalModel, however, it affords simplicity and clarity, so it is a UsefulLie. -- DaveVoorhis

 VAR Customers REAL RELATION {CustName CHAR, Address CHAR, Phone CHAR} KEY {CustName};
 VAR Orders REAL RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE} KEY {OrderNumber};

...could be represented as follows:

 VAR MyDatabase REAL RELATION {Customers RELATION {CustName CHAR, Address CHAR, Phone CHAR},
                               Orders RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE}};

Of course, why stop there? Databases (as defined above) can be attributes, too:

 VAR MyUniverse REAL RELATION {
      MyDatabase RELATION {
           Customers RELATION {CustName CHAR, Address CHAR, Phone CHAR},
           Orders RELATION {OrderNumber INTEGER, CustName CHAR, Date DATE}},
      YourDatabase RELATION {
           Fish RELATION {Species CHAR, FavouriteRecipe CHAR, BestBait CHAR},
           Bait RELATION {Name CHAR, Refrigerate BOOLEAN, UseHook BOOLEAN}}};

And so on. However, no matter how much we continue this hierarchy, the RelationalModel defines that the "root", as it were, is going to be relation-valued variables -- even if there is only one and its value is only set once. For all intents and purposes, since the above structure is fully encompassed by the RelationalModel, we might as well consider that collection of (one!) relvars to be "the database" and so we're effectively back where we started. -- DaveVoorhis

Err... that wouldn't be quite correct. As you've defined it, the 'MyDatabase' will be a relation that is a set of tuples containing two relations {(Customers:<Relation>, Orders:<Relation>),...} However, only ONE such tuple is actually a database. If you have more than one, you won't be able to identify the actual 'MyDatabase'. What you actually needed at the top level was not a relation variable, but rather a cell carrying a record-value (a single 'tuple' in the relational terminology). Further, 'MyUniverse' is intended to be a universe of Databases. Instead, what you gave was a relation containing a large set of (MyDatabase, YourDatabase) pairs, as though every one of MyDatabases must, inherently, be paired with one of YourDatabases. Better would be to support a universe with: MyUniverse is Relation of {Id: Database-Identifier (KEY), meta: Database-Metadata, value: (Record of name->Relation)}. And, ultimately, 'MyUniverse' uses a relation only because what it actually needs is a set, and relations are sets. Are you still confident that the top-level here ought to be a relation-valued variable? Even for MyDatabase?

No, it's correct. MyDatabase is presumed to be a relation that contains a single tuple. We could supply a constraint to enforce it, if we wished. TutorialDee provides a TUPLE FROM operator to obtain the tuple value from a relation of cardinality one, because these occur frequently as a rough analogue to the notion of a singleton. The FROM operator is used to obtain an attribute of a tuple. Thus, we can obtain the Customers relation via the following expression:

 Customers FROM (TUPLE FROM MyDatabase)

Or...

 Customers FROM (TUPLE FROM (MyDatabase FROM (TUPLE FROM MyUniverse)))

Obviously, this is awkward syntax. Should such a schema be desirable, the associated RelationalLanguage would presumably provide a clean short-hand for the above -- something like:

 MyUniverse.MyDatabase.Customers

Of course, some other schema design might warrant multiple tuples for MyUniverse or MyDatabase -- say, to implement multiple versions -- for which we would have to provide an appropriate attribute for selection purposes.

-- DaveVoorhis
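For readers who don't speak TutorialDee, the nested lookup above can be sketched in Python. This is a rough analogue only: `make_tuple`, `tuple_from`, and `attr_from` are hypothetical helpers written for this example, with a relation modelled as a frozenset of tuples and a tuple as a frozenset of (attribute, value) pairs so that relations can themselves be attribute values. The sample data is invented.

```python
def make_tuple(**attrs):
    """Build an immutable, hashable tuple value from attribute/value pairs."""
    return frozenset(attrs.items())


def tuple_from(relation):
    """Analogue of TUPLE FROM: the sole tuple of a cardinality-one relation."""
    if len(relation) != 1:
        raise ValueError("TUPLE FROM requires a relation of cardinality one")
    (only,) = relation
    return only


def attr_from(name, tup):
    """Analogue of `name FROM tuple`: extract one attribute value."""
    return dict(tup)[name]


customers = frozenset({make_tuple(CustName="Ann", Address="1 Main St", Phone="555-0100")})
orders = frozenset({make_tuple(OrderNumber=1, CustName="Ann", Date="2007-01-01")})

# One-tuple relations whose attribute values are themselves relations.
my_database = frozenset({make_tuple(Customers=customers, Orders=orders)})
my_universe = frozenset({make_tuple(MyDatabase=my_database)})

# Customers FROM (TUPLE FROM (MyDatabase FROM (TUPLE FROM MyUniverse)))
found = attr_from(
    "Customers", tuple_from(attr_from("MyDatabase", tuple_from(my_universe)))
)
```

If `my_database` ever held two tuples, `tuple_from` would raise rather than pick one, which is the cardinality-one assumption stated above.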

Time would be a good component for historical databases. However, the MyUniverse really should allow for the introduction and elimination of databases over time.

Absolutely. That's the subject of temporal databases, which are another topic. HughDarwen is arguably the expert on these. -- DaveVoorhis

What is the difference between a relation constrained to have exactly one tuple and a cell that has a value that is a tuple? Really? And if there isn't one, then are you insisting that a 'relvar' must be at the top level only because you can... eh... make it fit?

I am insisting that a relvar must be at the top level because it is required by Relational Model Prescription 16, that a database shall be a named container for relvars, which is a direct result of DrCodd's "Information Principle", that all information [in a database] is represented by data values in relations. Feel free to violate this, but the result would, by definition, not be the RelationalModel. -- DaveVoorhis

A 'named container for relvars' can be implemented in quite a few different ways. One of them is as a cell that carries a record value, for which each of the labels identifies a relation. In that case, the database is literally a container (a cell) carrying named relations that are referentially variable (i.e. to actors viewing the cell). Where am I violating anything? Your version was a database carrying a whole set of your databases in a relation, but scaled back to exactly one database, which is unnecessarily indirect when the goal was just one database (in particular: MyDatabase).
Unless your cell & record value is a relation, it is a violation of DrCodd's "Information Principle." This has a practical basis: Only one set of operators is required to access and manipulate everything in the database and/or visible to the database management system, including meta-data. Indeed, it implies that accessing meta-data, data, and anything else (e.g., system status, environmental values, etc.) is done via the same fundamental set of operators, and via the same fundamental data structure, the relation. The overall simplicity and physical independence afforded by the Information Principle outweighs the occasional need for awkward indirection, such as extracting the singleton tuple from a relation of cardinality one. -- DV
No, it would not be such a violation. MyDatabase is not supposed to be information IN a database. MyDatabase IS the database. If you wish to put a database inside a database, then simply use the MyUniverse approach, for which a relation is perfectly appropriate.

The fact is that, despite the argument you provided that this is for rigor and constraint on the model, you gained no rigor, and you gained no constraints. You haven't made it any easier to prove any qualities about the behavior of actors utilizing the relational database - those mutating it, those reading it, etc. You haven't made it any easier to prove qualities of the database under such manipulations - e.g. consistency, accuracy. Where you could make use of the argument is in treating relation-variables as the basic conceptual 'cells' manipulated for purposes of concurrency control involving multiple writers... but concurrency control is not part of the RelationalModel. For that, you need some model of concurrency and actors, and an approach or theory for concurrency-control... and once you have a full theory for concurrency and transactions, you won't actually need to distinguish anything at the arbitrarily chosen relational-variable level.

The behaviour of actors utilising the relational database is outside the domain of the RelationalModel, and therefore irrelevant to the RelationalModel. I can, in theory, prove the RelationalModel to be reasonably self-consistent, which is as it should be. Feel free to define a meta-model that incorporates both the RelationalDatabase and its clients. As for concurrency control and the like, that is usually defined at the level of a storage engine, below (and effectively outside) the RelationalModel. There are models that endeavour to model the system from the highest abstract level down to the bits on disk platters -- ExtendedSetTheory, for example -- but I'm not aware of any of these that define concurrency, either. That generally seems to be considered implementation-specific. -- DaveVoorhis

Not entirely. Actors are the only things with behaviors. (Well, technically, anything with behaviors is an actor.) If the RelationalModel supports the idea of mutation to, say, relvars, it certainly must have some conception that something like actors exist to both perform and observe said mutation. Further, your arguments for things like 'relvars are the only mutable items in a RelationalDatabase' are direct appeals to the notion of a system in which actors exist and are mutating and observing databases. Even further, my statement wasn't with regards to the RelationalModel itself - it was with regards to the notion that the argument you provided (relvars are the only mutables) provides any rigor or constraint regarding (mutations to the) model. Clearly, to do so it must be providing a constraint on the actors or aiding in proofs of consistency while under (potentially concurrent) manipulations and observations by actors.
Anyhow, what is 'mutable' and 'immutable' must always be determined relative to the actors using the RelationalModel/RelationalDatabase - those mutating it AND those observing it. That's rather fundamental. If it can be proven that a particular observation will never change when repeated by an actor, that's an invariant property. If it can change, it's variable. If it can be changed by actors, it's mutable. If it can be changed independently of other properties/observations, then it may serve as an independent cell. What I'm saying is that the argument stating that 'relvars are the only mutable' things is a UselessLie. It is simply not true, and (further) is of no benefit to proving any constraints or providing any rigor to the RelationalModel.

Can you give me a statement of what you've actually gained, for real? I.e. a postulate or behavior you can deductively prove under your position that cannot be proven without it? I doubt it (because I know that, from an actor's viewpoint, this is just a paradigm, and paradigms cannot affect proofs), but you (and by extension Date) certainly deserve the opportunity. If there is no such postulate or behavior, then you've literally gained nothing - no rigor, no constraint, no utility.

I'm not clear what your point is here. Is this a specific criticism of the RelationalModel itself, or is it a criticism of the RelationalModel for not being an all-encompassing model of computation in general, or is it a prelude to a ParadigmPissingMatch? -- DaveVoorhis

I like the RelationalModel itself; it's not sufficient for all my own purposes (e.g. knowledge databases) but that isn't relevant to my position here.

What I've presented is more a criticism of the RelationalModel as it is being presented with regards to immutability and object-identity. The arguments you've provided are invalid and useless, and shouldn't have any support. You're presenting them by proxy from Date and Darwen, but if you've accurately presented your understanding of their arguments, then you shouldn't be supporting them. The entire 'it's not mutable because we said it is not' is very much an 'emperor has no clothes' situation; smart people are ignoring the very obvious - that non-key attributes of referenceable tuples are perfectly capable of operating as mutable cells from the perspective of ALL actors, including the DBMS. They are, instead, saying these tuples are 'logically immutable'... which isn't true... because Date and Darwen said so. Actually, the argument is more akin to: "Relation values ARE immutable, so Relations are immutable, so Tuples must be immutable." That's a correct argument. The problem is that "Cells ('variables' in common vernacular) are NOT immutable, so relations as referenced through cells by actors are not immutable, so tuples as referenced through those relations as referenced through those cells by actors are not immutable." This is also a correct argument. The emperor has no clothes. The argument that attempts to conceptually separate tuples from the relation-variable has no validity. Please, can we have some people speaking up about it rather than accepting Date and Darwen at their word and presenting/teaching their arguments as though they are counters to some rather basic truths?

Attempting to be a bit clearer: if something can be used as a mutable cell in every way, and 'cell' is an abstraction of a service to store and retrieve a value, then that something is (by definition) a cell. A cell is mutable. The value in a cell is not mutable (because values are not mutable), but properties of the cell (including the property of 'the value in the cell') are mutable. If you have a relation value, it is not mutable. If you have a cell containing a relation, it IS mutable. Further, other properties of a cell containing a relation are independently mutable... e.g. by placing a new relation in the cell with an extra tuple, one has changed the properties associated with the cell. Because you can reference (i.e. 'name') properties of a cell with a relation-variable by use of a candidate-key, those also constitute variables. Because you can manipulate the relation-variable cell in order to set those properties independently of each other, you also have (by definition) cells within the relation-variable. Thus, deductively, you have mutable tuples. These mutable tuples have properties of presence (Just <value> | Nothing) and properties associated with the '<value>' when present. These truths are independent of the RelationalModel; they are true because you have a mutable cell carrying a set/relation. These truths are also independent of implementation; if the act of setting a cell associated with a tuple requires sending a DataManipulationLanguage message to the whole 'MyUniverse', then so be it - that's an implementation issue. These truths are also independent of the nature of the 'cell' associated with the relation variable, which itself may be just an independently mutable property of a single, larger 'cell' associated with the database, or even with a 'universe'.
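The cell argument can be sketched in Python as well. This is an illustration under stated assumptions, not anyone's actual implementation: `Relvar` and `TupleCell` are hypothetical classes, the relation is a frozenset of (key, value) pairs with the first component acting as the candidate key, and `None` stands in for the 'Nothing' case. Each `TupleCell` offers store and retrieve for one key, yet every store is carried out by assigning a whole new relation value to the relvar.

```python
class Relvar:
    """A mutable cell holding an immutable relation (a frozenset of (key, value) pairs)."""

    def __init__(self, value):
        self.value = value


class TupleCell:
    """A 'cell' addressed by (relvar, candidate-key value)."""

    def __init__(self, relvar, key):
        self.relvar = relvar
        self.key = key

    def get(self):
        """Return the non-key value, or None when no tuple has this key ('Nothing')."""
        for k, v in self.relvar.value:
            if k == self.key:
                return v
        return None

    def set(self, new):
        """Store `new` (None deletes) by assigning a new relation to the relvar."""
        rest = frozenset(t for t in self.relvar.value if t[0] != self.key)
        self.relvar.value = rest if new is None else rest | {(self.key, new)}


phones = Relvar(frozenset({("Ann", "555-0100"), ("Bob", "555-0101")}))
ann, bob = TupleCell(phones, "Ann"), TupleCell(phones, "Bob")

snapshot = phones.value   # the relation value before the update
ann.set("555-0199")       # changes what `ann` observes, not what `bob` observes
```

Both readings hold in this sketch at once: `snapshot` is an unchanged relation value, while `ann` behaves as an independently settable cell. Which description is the more useful one is exactly what is being disputed here.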

Date and Darwen and others cannot escape these truths by shutting their eyes and saying differently - that relvars are the only (<-- emphasis!) mutable variables. Further, they gain no rigor, no constraint, and no utility from even attempting to do so. Pointing out that relations are values isn't valid as an argument for their position - it's true, yes; it just isn't relevant. But, even if they've tricked you into believing them, utility and gain from their position must still be measured in real terms... e.g. a single theorem that can be deductively proven from their position that cannot be proven without it. You certainly have not presented such a theorem... only alluded that one must exist. Until you are presented with such a theorem, you should know that an extra helping of skepticism is deserved.

As I see it, attempting to apply this restriction to a RelationalDatabase is a logical impossibility without making the entire database immutable (and thus providing true referential transparency). Saying that this is part of the RelationalModel would make the RelationalModel internally inconsistent. If introducing such inconsistency is what the keepers of the RelationalModel have been doing with their spare time, we need to fire them and hire some new ones. If it's just a common misunderstanding of what the keepers have actually been saying, then they need to work on clarification.

The RelationalModel is a model, not an implementation. As such, the notion of immutability has no impact on the users of implementations of the model, or even on broader models that might incorporate the RelationalModel -- these may harmlessly regard the model's immutables as mutable. The restriction, therefore, is only on the RelationalModel, not on a RelationalDatabase, as long as its behaviour is consistent with the predictions of the model. As I've stated above, within the RelationalModel, this UsefulLie affords simplicity and clarity. This is not even an issue of hypothetical theorems, merely one of simplicity and convenience, but these certainly facilitate provability. If you wish to define a RelationalModel that incorporates mutable objects, feel free to do so. I'd certainly be interested to see it, but I can almost guarantee that it will be more complex than the existing model without any increase in utility. -- DaveVoorhis

You say: "As I've stated above, within the RelationalModel, this UsefulLie affords simplicity and clarity. This is not even an issue of hypothetical theorems, merely one of simplicity and convenience." The problem is that it does not afford simplicity or clarity. It is NOT useful. If you insist it is, then prove it rather than just say it! And proving that it is useful damn well is an issue of theorems... real proofs, not hypothetical proofs - models such as the RelationalModel are designed to allow more rigorous reasoning, so restrictions on mutability and such must provide simplicity and clarity to this rigorous reasoning if they are to be of any utility at all. But the restriction as you've stated it is not of any utility. It's a UselessLie. Worse, if it were an official part of the model, then the model would be logically inconsistent because you'd have immutable mutable things. The fact remains that the RelationalModel automatically incorporates mutable tuple objects the moment it has mutable RelationalVariables. You cannot escape this; it's a fact inherent to the variables. And because this is already there, there is no increase in complexity - it's there to live with whether you wish it or not. Or, more accurately, there is exactly one means to escape it: make the relvars immutable. That will do the job. If relvars are referentially mutable then so are table rows (tuples) associated with those relvars. Period. I've presented deductive proofs for this more than once. Saying it ain't so won't change it. If you have an issue with the proof, then tackle the proof rather than insisting I must not be referring to the real RelationalModel. Either that, or insist that the RelationalModel must not have mutable relvars (and, thus, that the model is not relevant to any discussion of mutability in a RelationalDatabase). Is logical consistency too much to ask?

If you're defining a variable as a tuple object, that's fine. It should be clear, however, that the immutable tuples are those within relations, not those "special" tuples that appear -- through dissection of the terminology -- external to relations as variables.

However, your challenge to tackle this issue at the level of proofs is an interesting one. I shall work on this. Please stay tuned...

-- DaveVoorhis

I'm not defining a variable as a tuple object. Objects are things that can be uniquely referenced, and that have properties. Mutable objects can change from one observation to another. I'm not defining a variable as a tuple object, but the existence of a variable containing a set of tuples referenceable with a candidate key necessitates the existence of tuple objects in that data space. The existence of tuple objects follows directly from the existence of a relation variable. There is no circular logic involved, and nothing "special" going on.

I'm interested to see what you come up with on the proofs. However, that challenge goes out to anyone else who is interested, too: ChrisDate, DanMuller, etc.

How do you address the fact that a relation can have multiple candidate keys, thus each tuple can have multiple 'identities', each of which can change independently of the other? Isn't this stretching the analogy between tuple identity and typical notions of object identity quite a bit? -- DanMuller (continued)
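A small Python sketch may help picture the situation this question describes. It assumes a hypothetical `Employee` relation in which both `emp_id` and `email` are candidate keys; the names, the data, and the `lookup` helper are invented for illustration only.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Hypothetical heading with two candidate keys: emp_id and email."""
    emp_id: int
    email: str
    name: str


def lookup(rel, attr, value):
    """Find the tuple whose candidate-key attribute `attr` equals `value`, if any."""
    matches = [t for t in rel if getattr(t, attr) == value]
    return matches[0] if matches else None


staff = frozenset({
    Employee(1, "ann@example.com", "Ann"),
    Employee(2, "bob@example.com", "Bob"),
})

# Two different 'identities' reach the same tuple.
by_id = lookup(staff, "emp_id", 1)
by_email = lookup(staff, "email", "ann@example.com")

# Change one candidate key while the other stays fixed.
staff_after = frozenset(
    t._replace(email="a.smith@example.com") if t.emp_id == 1 else t for t in staff
)
```

In `staff_after`, the key `emp_id = 1` still locates Ann, while the old email no longer locates anything. Whether that counts as one object with two names or two identities with different lifetimes is the open question.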

There is no "typical notion of object identity" that possesses any rigor... at least not that I've seen on this forum. What I can say for sure is that object identity provides a means to uniquely identify an 'object' from other possible 'objects' in some space in which objects may exist, possibly over some course of time. It is, perhaps, a bit of a stretch to say that any value (e.g. a set of properties) that can uniquely identify an object over the whole of time must constitute an 'object identity'. It would allow, for example, "The Prime Minister of Britain" to constitute an object identity; such a phrase certainly identifies an object (a person), even if that person changes over time, so long as Britain has one Prime Minister. From an actor's POV (such as you or me), "The Prime Minister of Britain" could be considered an object whose properties change every second, every hour, every day, every year, and sometimes in very massive steps (e.g. change of name) when one human body steps out of the position and another into the position. * However, this probably contradicts your intuitive (and non-rigorous) understanding of "object", which likely only considers individual people as 'objects' (supposing you're willing to objectify people). We can try to use names as 'object identity', but people share names. So you add address, or history, or knowledge, or possession of a secret number (such as SSN). Ultimately, you can find an 'object identity' that uniquely identifies me as an individual for all time in this particular dimension on this particular Earth, etc. etc. etc. But in searching for this particular sort of object identity and rejecting various others, I must point out that your actions would mostly reveal preconceptions about what constitutes an 'object' that aren't particularly rigorous. Why use my name as an 'object' identity when the various atoms constituting my body are in constant flux? 
Why disallow "Prime Minister" as an object identity simply because the body constituting it is in irregular flux? I believe there is a page somewhere that discusses object identity a bit... and a hyperbolic 'axe' that was years old and had gone through seven handles and five heads and yet was considered a single, aged, well-used axe. That discussion seems relevant here... but I cannot recall which page. [It was probably FredsAxe.]
- Re: However, this probably contradicts your intuitive (and non-rigorous) understanding of "object", which likely only considers individual people as 'objects' - There is no rigorous definition of objects. This is one of its many problems. (NobodyAgreesOnWhatOoIs) -- top
- With that, I partially agree. I, myself, am a solipsist; I accept that I cannot prove (deductively) that there is something more than myself in this universe - any such proof will necessarily be circular. This is a direct result of the nature of perception. It isn't possible to create a rigorous definition of objects if one relies on intuition built upon perception. However, it is possible to provide a rigorous definition of mathematical objects in mathematical spaces (e.g. tuples in a tuple space or relation variable). The issue of object identity exists even in these spaces, and is most interesting when the set and properties of 'objects' in the 'space' vary over time relative to the perceptions of actors. This interesting aspect requires that the space be variable... and optionally mutable.
- Despite disagreement as to what constitutes an 'object', it is possible to create a universal definition of 'object' that will work for any sort of mathematical object in any sort of mathematical space. Such a definition would necessarily be sufficient to any mental model of physical objects in physical space, but might be overkill - theoretical mathematical limits completely overlap computational (decidability) limits, and our brains are a sort of computation system. Of course, it would be impossible to prove whether these definitions were suitable to actual physical objects... but that isn't relevant until we prove (deductively) that physical reality actually exists. With such a definition of 'object' it 'should' be easy to create a common definition for 'object oriented' (e.g. "a programming paradigm that views and implements complex systems as collections of less complex 'objects' interacting in a common space")... but that phrase ("object oriented") has long been usurped by people marketing somewhat less relevant implementation details (e.g. encapsulation, virtualization, classes, polymorphism).
Anyhow, to reject the notion that any candidate key works requires that you embrace the notion that there is some "true" key... some "true" property or set of properties that uniquely identify objects, and that other properties are merely variables on those objects instead of constituting identity. Despite intuition, this view isn't natural to many object spaces. It can only be true in a space that allows only one candidate key... such that other potential keys are incidental rather than intrinsic. That is, alternative 'unique identifiers' would need to be the result of the data-set, not the result of inherent constraints on the data set.
We should really move this to a page on WhatIsObjectIdentity or WhatIsObject. Do you recall which page had the discussion on the hyperbolic 'axe' that was years old and had gone through seven handles and five heads?

More importantly, what do these concepts add to an understanding of the relational model? I think the length and complexity of the discussion on this page is its own argument against such an addition. -- DanMuller

This question is much better reversed. Why should you (or anyone) add some postulate or requirement about how one should think about mutability in the relational model when it literally adds nothing to the model or any analysis thereof? To be of utility to a rigorous analysis or understanding of the model and data within it, the postulate must allow you to prove a theorem that cannot be proven without it. In rejecting concurrency issues involving multiple actors as part of the relational model, DaveVoorhis swiftly eliminated all possible such theorems. With those things rejected, the postulate is literally only a paradigm: that relational variables are the only mutable persistent objects (cells) in the relational model. It's not even a rational paradigm because comprehending something as 'mutable' or 'variable' requires an actor's perception, and because such perception certainly isn't limited to keeping one's eyes on the relation variables rather than focusing that perception on some candidate-key identified tuple or broadening one's view to the universe of all databases. The proposed postulate might help irrational human actors think about or design a data manipulation language for the model, but it certainly doesn't help when it comes to proofs. It, fundamentally, cannot help with proofs because it is only a paradigm. By Occam's razor, it should be removed from the model. By reasoning, it should never have been added in the first place. Whoever added it made a mistake. It would much better be replaced with requirements on concurrent manipulations of the model (e.g. requiring that data manipulation be, logically, the atomic replacement of one relation value for another on a relation variable) but this requires accepting possible concurrency as inherent to any system utilizing the model. That I need a great deal of discussion to make this point tells me only that most people neither design nor rigorously study models... which isn't unexpected.

Regarding object identity: We are concerned here with object identity in the context of a data model, not a more general philosophical discussion of object identity in the physical world. A relational database attempts to model some portion of a domain, portions of which might be physical, much of which is often conceptual (as are, for instance, many accounting concepts). For the physical portions of a specific modelled domain, the digression into philosophical notions of object-ness might be relevant, but applications of the relational model in general are not limited to such things.

Regarding the question of mutability of tuples: I'm afraid I don't see why the question should be reversed. In fact, to me your reasoning seems in all respects backwards; by Occam's razor, if you will, it seems that keeping the model simple is preferred unless adding complexity addresses specific problems or limitations of the model. (Remember that we are talking about an artificially constructed model, the purpose of which is to simplify the organizing and manipulation of data. It's not a general theory on the essential nature of data.) I guess you see a proscription against considering tuples as objects, where I see an omission of the concept of tuple identity. You see more complexity, where I (and DaveVoorhis) see less. So far I've seen only vague assertions about the value of the additional complexity discussed here.

The relational model addresses concurrency only indirectly, by specifying that a database state, at all times that it can be observed, is consistent with all of its constraints. Transitions between one consistent state and another must thus appear to be atomic to users of the database. At this level, that seems to be all that needs to be said regarding mutability. This simplification indirectly affects many aspects of the model.
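As a minimal sketch of the point above, a relation variable can be pictured as a single cell holding an immutable relation value, where every "update" replaces the whole value after checking constraints. The `RelVar` class, the `ids_unique` constraint, and the account rows are hypothetical names invented for illustration; this is single-threaded code and says nothing about how a real DBMS achieves atomicity.

```python
class RelVar:
    """Hypothetical relation variable: one mutable cell holding an immutable relation value."""

    def __init__(self, value, constraint):
        if not constraint(value):
            raise ValueError("initial value violates constraint")
        self._value = frozenset(value)
        self._constraint = constraint

    @property
    def value(self):
        return self._value

    def assign(self, new_value):
        """Replace the whole relation value, or leave the old one untouched."""
        new_value = frozenset(new_value)
        if not self._constraint(new_value):
            raise ValueError("constraint violated; state unchanged")
        self._value = new_value  # a single rebinding; no tuple is modified in place


def ids_unique(rel):
    """Constraint (assumed for the example): the first attribute is a candidate key."""
    ids = [row[0] for row in rel]
    return len(ids) == len(set(ids))


accounts = RelVar({(1, 100), (2, 50)}, ids_unique)
old = accounts.value

# "Updating" account 1 means building a new relation value and assigning it.
accounts.assign({(1, 70) if row[0] == 1 else row for row in old})

# An assignment that would break the constraint is rejected as a whole.
try:
    accounts.assign({(1, 70), (1, 80)})
    rejected = False
except ValueError:
    rejected = True
```

Observers of `accounts.value` only ever see a value that satisfies the constraint, which is all the sketch is meant to show.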

The implementor of a relational database must obviously delve deeper than this, and will probably deal with some notion of row identity - a notion which will typically differ from object identity as discussed above, I might add. The author of an OO data layer implemented on top of a relational database will likely also struggle with issues of object identity for objects that represent domain entities - thus only indirectly with row or tuple identity. Personally, I think the objects in such OO layers are most usefully thought of as caches of database data, because there's already a large body of techniques to help reason about and implement data caches. Correspondence between these caches and tuples in the database can be a difficult problem that does revolve around notions of identity.

I think there is room for more discussion on concurrency issues at some level just above that of the relational model. I am, for instance, uncomfortable with Date & Darwen's treatment of atomic, multi-part database updates in Tutorial D, which attempts to avoid the notion of transactions by defining a multiple-assignment operator. I can't help but think that this would be very awkward in practice, and doesn't seem significantly different from an explicit transaction insofar as user-defined functions would be involved in such expressions, thus fairly arbitrary code can still run within a partial-update context. But a) these are considerations of programming language syntax and semantics, at least one step removed from the underlying model, and b) I don't yet see how discussions of tuple or row identity help to address these problems, which are essentially the same whether you're talking about changing relvars or "tuple vars".

-- DanM

We are concerned here with object identity in the context of a data model, not a more general philosophical discussion of object identity in the physical world. -- We are concerned with whether relational database table rows have object identity... which does require some study as to what constitutes "object identity", especially in the context of mathematical objects within a mathematical space (e.g. tuples in a tuple-space, logical rows in a logical table, etc.) Do not concern yourself with whether a particular database is modelling the physical world. Do not concern yourself with what data the relational database is carrying or modelling. That's a domain issue; as such, it is completely irrelevant to the more basic question of whether the rows or tuples, themselves, have object identity.

to me your reasoning seems in all respects backwards; by Occam's razor, if you will, it seems that keeping the model simple is preferred unless adding complexity addresses specific problems or limitations of the model -- Occam's razor traditionally applies to hypotheses in science and logic; if the "Spaghetti Monster" hypothesis isn't required for a particular theory (or isn't relevant), then the "Spaghetti Monster" hypothesis should be removed. Fewer hypotheses result in a simpler theory. In the context of models, 'hypotheses' aren't strictly relevant, but 'requirements' and 'postulates' certainly are. Requirements constrain the model and postulates describe it, and the model is simpler if it requires fewer constraints and less description. Occam's razor would apply here in the sense that you should remove requirements and postulates (call them logical 'sentences') that aren't relevant to the model, including those that can be proven from other postulates and requirements, and you'll have a simpler model. Occam's razor, taken to its fullest, would encourage you to find a minimal set of sentences that fully describe the model, though there are often many possible minimal sets (where no sentence may be removed without changing the model). That is, every single 'sentence' describing the model must be justified by pointing out a theorem that cannot be proven without that sentence. Which minimal set of sentences to choose depends on which is easier (simpler) to explain or use. The best shall be as simple as possible, but no simpler... i.e. complex enough to be correct, but no more complex than necessary. When applied to an "artificial model", like Relational, "correctness" is defined in terms of the set of theorems you could prove with some "authentic" specification... but that specification isn't always as simple as possible.
Anyhow, Dan Muller, my reasoning is not 'backwards'; I know what I'm talking about when I say that it is the removal of an unnecessary requirement that leads to a simpler model.

I guess you see a proscription against considering tuples as objects, where I see an omission of the concept of tuple identity. -- If it is only an omission, then it is quite incorrect to state that they, therefore, "have no intrinsic object identity". After all, the object identity exists whether you omit its description or not. There is no reduction in inherent complexity by omission of a sentence that follows from other sentences already in the model. Where you are adding complexity is in attempting to take a perfectly good model and add a sentence that states relational variable tuples or database table rows "have no intrinsic object identity". "Relational Database Table Rows Have No Intrinsic Object Identity" is generally the sort of statement you must prove. So far, you've attempted to justify it by saying it is an inherent part of the model. You effectively proscribe this fact to the model by saying "it's part of the model, therefore it's true, and if you don't like it then complain to the designers". If the model only omits mention of object identity, arguing that rows have no intrinsic object identity requires that you prove it using some reasonable and rigorous definition of object identity. Takes your pick and makes your choice, Dan; I'm tired of seeing people falling back to one argument when the other fails -- the model can only allow one line of argument to be relevant. If it's omission, then I can assure you that relational database table rows DO have intrinsic object identity as mathematical objects in a mathematical space. If it's proscription, then I can prove that the model is logically inconsistent with, at least, many of the more rigorous definitions of 'object identity'.

Another point of contention has been as to whether 'relational variables' are the only 'mutable' units in the model, as apparently 'proscribed' by the model... when doing so is inconsistent with any reasonable definition of 'mutable', and doesn't actually gain you anything outside of certain concurrency guarantees.

The relational model addresses concurrency only indirectly, by specifying that a database state, at all times that it can be observed, is consistent with all of its constraints. -- actually, the constraints limitation only requires that all observations of database state be consistent with all constraints. (This is a fine distinction, but it allows that the database state be inconsistent whenever it isn't observed... even if it can be observed.) However, a requirement that actors perform manipulations to the database through the relational variable would also have an effect in concurrent situations, as it would require that actors on the database read and write at the 'relation' level in a logically atomic manner. (This is orthogonal to the constraints requirement, and would be a major blow to the possibility of distributed relational databases.) If concurrency isn't directly addressed in the relational model (and I know it is not), then this requirement should be eliminated as irrelevant to the model, allowing implementors to seek better theories for concurrent operation. As this would be the only real effect of enforcing 'relation variables' as the only directly mutable 'cells' in a relational database, such a ruling is of no significant utility.

Transitions between one consistent state and another must thus appear to be atomic to users of the database. At this level, that seems to be all that needs to be said regarding mutability. This simplification indirectly affects many aspects of the model. -- With this, I agree.

Personally, I think the objects in such OO layers are most usefully thought of as caches of database data, because there's already a large body of techniques to help reason about and implement data caches. Correspondence between these caches and tuples in the database can be a difficult problem that does revolve around notions of identity. -- You discuss OO layers atop relational. In this case, 'identity' for domain objects is orthogonal to 'identity' for tuple objects. I can't agree with your proposition regarding caches and object data; a domain object represented in any data set will either be projective (carrying the definitive properties of the object) or reflective (carrying some known facts about an object that exists external to the database). The former requires no cache (except for optimization) and the latter is no cache (an actor cannot 'cache' external reality... only observations and inductions on that reality.)

avoid the notion of transactions by defining a multiple-assignment operator -- heh. I need to open that book (just received TTM from Amazon). That sounds like a rather poor attempt to avoid transactions.

I'm struggling a bit to figure out how to say this concisely. Speaking only for myself, I don't see a constraint or proscription, but rather an omission. I think I've been consistent about this, even starting with my first addition to this page. If tuple or row identity can be usefully identified as an emergent phenomenon of the model, have at it. But as you've pointed out: "There is no "typical notion of object identity" that possesses any rigor ...", so to reiterate, I don't see the general value, although I can imagine specific contexts in which the pursuit might be worthwhile. That, however, is different from saying that I agree there is an "intrinsic" object identity that pertains to tuples. (I'll assume that "relational database rows" can be rendered as "tuple", if "relational" here is meant literally and not just approximately.) How can anyone assert that they do have an "intrinsic" ("belonging to a thing by its very nature") object identity when there isn't even an "intrinsic" understanding of what "object identity" is? This mystifies me. And this page doesn't seem to have shed much light on what definition of object identity would falsify the title. -- DanM

This page doesn't seem to have shed much light on what definition of object identity would falsify the title. -- hell, it's worse than that, Dan. This page doesn't shed any light on what definition of object identity would potentially support the title. If you make a claim like the title above, I want proof! And I'm sure you want the same. A simple definition of 'object identity' that casually falsifies the title is: "object identity is any value that can be guaranteed to uniquely identify an object if that object exists." There. I'm done. A 'tuple' in a relation space can be identified by a candidate key and a relation variable. Tuples are objects, the database is an object space, relation variable is an address to a particular segment of object space, and object identity is a candidate key. This was stated at the top of the page! Now, Dan, why do I see so much support for the notion purported by the title when I haven't yet seen even ONE sound argument that supports the title? All I see, over and over again, is: "it must be true 'cuz Date or DrCodd said so." But if you believe that they didn't say so, why must I bother falsifying the title? You, like me, should demand proof of the titular statement in the first place.
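The simple definition offered above (a value guaranteed to uniquely identify an object if that object exists) can be sketched as follows. This is an illustration only: the `Employee` tuple type, the `emp_id` attribute, and the `lookup` helper are hypothetical, and the relation value is modelled as a Python `frozenset` held in an ordinary variable standing in for the relation variable.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_id: int  # assumed candidate key for this example
    name: str


# A relation value: an immutable set of immutable tuples.
employees = frozenset({Employee(1, "Ada"), Employee(2, "Grace")})


def lookup(relation, key_attr, key_value):
    """Return the unique tuple whose key_attr equals key_value, or None if absent."""
    matches = [t for t in relation if getattr(t, key_attr) == key_value]
    assert len(matches) <= 1, "key_attr is not a candidate key for this relation"
    return matches[0] if matches else None


found = lookup(employees, "emp_id", 2)     # identifies exactly one tuple
missing = lookup(employees, "emp_id", 99)  # no such object exists
```

Under this reading, the pair (`employees`, `emp_id` value) plays the role of the identity: it picks out at most one tuple whenever it is used.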

Anyhow, there are many notions of 'object identity' that are both very rigorous and correct in their object space, and there are many more that are both rigorous and correct in ANY object space. They just aren't typical. The typical notion of object identity is based on human intuition and built upon human perception... which is dreadfully inadequate - you can't even prove deductively that there exist objects for you to perceive! To find a rigorous definition, you need to consider mathematical objects in mathematical spaces, for which objects provably (deductively) exist and therefore a rigorous definition of object identity may also exist. I elaborated on this a bit to Top, above. I don't advocate trying to use an inadequate typical notion of 'object identity' based on naive typical human understanding of both 'object' and 'identity'. I advocate using an atypical but rigorous definition. Comprehend?

Yes, I understand just fine. But I don't think your simple definition is useful. I return always to the case of a relation variable with two or more candidate keys. If I change one, then by your definition, a tuple "object" has both changed a characteristic (but not its identity), and has been destroyed (and another one created), depending on which candidate key(s) you focus attention on. This hardly sounds useful, and I still wait to see what concepts, theorems, or deductions it clarifies or enables. -- DanMuller
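The objection above can be made concrete with a small sketch, assuming a hypothetical `Person` relation with two candidate keys, `ssn` and `badge`. The same transition between relation values is classified differently depending on which key an observer treats as identity; `diff_by_key` is an invented helper, not part of any relational system.

```python
from typing import NamedTuple


class Person(NamedTuple):
    ssn: str    # candidate key 1 (hypothetical)
    badge: int  # candidate key 2 (hypothetical)
    name: str


before = frozenset({Person("111", 7, "Ann")})
# The "update": a new relation value in which Ann's badge number changed.
after = frozenset({Person("111", 8, "Ann")})


def diff_by_key(old, new, key):
    """Classify a transition as seen by an observer who uses `key` as identity."""
    old_ids = {getattr(t, key) for t in old}
    new_ids = {getattr(t, key) for t in new}
    return {
        "destroyed": old_ids - new_ids,
        "created": new_ids - old_ids,
        "persisted": old_ids & new_ids,
    }


by_ssn = diff_by_key(before, after, "ssn")      # one object persisted, attribute changed
by_badge = diff_by_key(before, after, "badge")  # one object destroyed, another created
```

Whether this dual reading is a defect of the definition or simply a property of observers is exactly what the two sides dispute; the sketch only shows that both readings arise from the same data.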

Sigh. Again, with the typical understanding of object identity. I'm rather interested in hearing what you consider 'object identity', but I don't think you'd be able to put words to it.

Consider the fresh baseball. It is an object. But, if you identify it by "the fresh baseball, never been hit once", then is it untrue to state that this object is destroyed when someone hits it? I think not.

Object identity only matters to actors making observations on a variable object-space. Ultimately, the very concept of 'object' is simply a means of modelling that which an actor perceives... an 'object' is something one can identify, observe, and watch for changes. Objects can be people, the sky, the ground, a glass of water, etc. In science, all of them are merely conglomerations of atoms interacting with light... in various philosophies, it is pointed out that there is no deductive reason to consider one physical object as 'separate' from any other. If the claim was that intrinsic object identity doesn't exist for anything... that all object identity is extrinsic... then I believe a very well-reasoned argument could be made. However, if object identity is considered intrinsic for physical objects perceived by humans in physical space, then it's just as intrinsic for mathematical objects in mathematical spaces as observed by abstract agents. And it's only reasonable to consider 'object identity' to be intrinsic when you have a value that can be guaranteed to uniquely identify an object at any time it (the value) is used.

I follow your reasoning, but I'm still not seeing the point behind such a general definition of object identity in the context of programming. I think that two different 'object identity values' that sometimes refer to the same object, and sometimes to different objects, is fine when discussing human understanding of the physical world, but I doubt that many programmers would expect such a definition in the context of programming.

You're right in that I can't put a definition to 'object identity' for you, because I'm interested here in definitions that relate to programming only, and the definitions vary somewhat among programming languages. C++, for instance, is a little unusual and comes closer to your more general definition, in that you can have (in some specific circumstances) different pointer values that reference the same object - but it's still one object being referenced, and if it is deleted, then all of the reference values are invalid. In most OO programming languages, one object will correspond to one reference value.

I'd be mildly surprised if you can even find any academic computer science papers that use a working definition of object identity that is as general as yours. I still wait to see what concepts, theorems, or deductions it clarifies or enables.

-- DanMuller

In most OO programming languages, one object will correspond to one reference value. -- I agree. Most OO programming language implementations use some mechanism of reference as object identity; all other aspects of the object are considered subject to change, and any other qualities that can potentially identify objects are considered incidental to the object set rather than inherent to the object system. C++ runtimes internally use RAM addresses for object identity. That means if, say, someone moves the object to another address in memory, it's considered a different object by all actors who happen to use address as object identity - the original object was destroyed, and a new one exists. Since C++ allows objects to be destroyed while the value referencing them still exists, and since it is possible to forge the value referencing an object (as it's merely an integer), it is also possible to refer to objects that do not exist, or to refer to an object that exists, is destroyed, and is later replaced with another object at the same location that has an entirely different type.

I still wait to see what concepts, theorems, or deductions it clarifies or enables. -- Clarifies or enables compared to what? I think you mistake definition for model. Definitions merely are. They don't do anything. You need to have at least one rigorous definition for a word before you can use it in any proofs... but any rigorous definition will do the job. The proofs you can make will depend on which definition you choose, true, but I don't have any other definitions sitting in front of me. Even the authors of this page haven't provided one. The definition I provided is simply one that is both rigorous and, for most object spaces (including physical space), very reasonable. I'm quite open to alternative rigorous definitions of 'object identity', which is why I stated a simple definition of 'object identity' instead of the definition. I do have some objection to simplistic definitions that cannot generally handle mathematical objects in mathematical spaces... but such definitions may be good as operational definitions for specific sorts of objects in specific sorts of object spaces.

One theorem I can prove with the definition I provided is: Relational Database Table Rows do, indeed, Have Intrinsic Object Identity. I'm confident that this can be proven with any rigorous definition of 'object identity' that is broad enough to generally handle mathematical and physical objects... supposing the definition doesn't reject intrinsic object identity as existing at all.

Anyhow, objects in programming aren't so different from physical objects that they don't make use of multiple identities even in your work today. Consider a filesystem from which web pages may be displayed... each file (an object in the filesystem) can be identified by its canonical file name, by its URL, by its ComputerID:filename, by its hashcode, etc. These can change independently of one another (e.g. a URL can move from one file to another, filenames can be switched, files can be renamed, files can be changed which will alter the hashcode, etc.) However, the file is the object, existing courtesy of the filesystem. Is there any better identity for these files than, say, the name... or the URL... or the content? Not really. Actors need to make do with whatever identity they choose to use and live with the fact that under that perspective, manipulations by one actor (e.g. renaming the content) will appear to another as something entirely different (deleting an object with one name, and creating an object that incidentally has the same content with another name).

More complex object-oriented languages will, indeed, migrate towards a more flexible approach to object identity than exists in current runtimes... especially for distributed, resilient (fault-tolerant and attack-tolerant) programming. The current system of using addresses as the only reference mechanism is far too fragile and far too easy to tamper with in distributed systems... even those that do not expect regular cracking attempts or the destruction of valuable hardware at inconvenient moments.

I disagree with the file system analogy. File names are very much like a primary relational key (within a folder). However, that is not something that most implementations of OOP inherently share. Unique file names would have to be an added constraint in an OOP model. The only comparable thing most common implementations of OOPL's share out-of-the-box is to see if two object references are referencing the same RAM address. This is more analogous to a system-generated key instead of a domain-attribute-based key. Most OOPL's don't compare equivalence on attributes, let alone give you a choice of which attributes "count". This also reflects the real-world, more or less. The only way to know for sure if two people are not in fact the same is to put them in the same room. In other words, see whether or not they occupy the same "space". In computers, this space is RAM. -- BlackHat
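For reference, the out-of-the-box behaviour described above looks like this in Python, used here only as one example of a common OO language. The `Point` class is hypothetical; `is` is Python's reference-identity test, and a class that does not define `__eq__` falls back to identity for `==`.

```python
class Point:
    """Plain class with no __eq__, so equality defaults to reference identity."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)  # same attribute values, separate instance
c = a            # second reference to the same instance

same_instance = a is c       # True: both names refer to one object
distinct_instance = a is b   # False: equal state, different objects
default_equality = (a == b)  # False: attributes are not compared by default

a.x = 99                     # changing state does not change identity
still_same = a is c
```

Attribute-based comparison has to be opted into (in Python, by defining `__eq__`), which matches the observation that the language does not choose which attributes "count".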

Seeing if two people are the same by observing whether they occupy the same 'space' only works due to a constraint that is understood to be part of the physical domain: no two objects can occupy the same space at the same time. Thus using time and space coordinates is certainly one means of identifying physical objects. It works well enough for people... with only slight hiccups when dealing with pregnancy, chimerism, transplants, implants, artificial limbs, and siamese twins. However, not all mathematical spaces possess a 'distance' dimension. Tables à la SQL, for example, do have an inherent 'distance' dimension: row number. It makes for an address-based object identity. Relations, being sets, do not have a 'distance' dimension; an object is either entirely within a set (distance=zero) or entirely outside of it (distance=infinity). Since all objects in a set are zero distance from all others, you cannot use addressing.
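Taking the comment's premise at face value (that an SQL-style table behaves like an ordered bag of rows while a relation is a set), the contrast can be sketched with a Python list and a `frozenset`. The row data is invented, and the list is only a stand-in for a table with row numbers, not a claim about any particular SQL product.

```python
# Stand-in for a table with row numbers: ordered, duplicates allowed.
table = [("Ann", 30), ("Ann", 30), ("Bob", 41)]

# Stand-in for a relation: an unordered set, duplicates collapse.
relation = frozenset(table)

second_row = table[1]       # position works as an address in the list
row_count = len(table)      # 3 rows, two of them indistinguishable by value
tuple_count = len(relation) # 2 tuples, distinguishable only by value

# A set offers no positional addressing at all.
try:
    relation[1]
    addressable = True
except TypeError:
    addressable = False
```

In the list, position distinguishes the two identical rows; in the set there is nothing but the value to tell tuples apart.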

Of course, you can't count on space identifiers for all things even within the physical realm. When dealing with bosons, the Pauli exclusion principle does not apply. Bosons can occupy the same 'space' at the same time. Attempting to use address-based 'object identity' for such objects is doomed to failure.

I meant for practical, common stuff. Bringing the realm of quantum physics into the mix is not necessary at this point and complicates an already tricky topic in my opinion. -- BlackHat

Yeah, yeah... still, it's a valid point. If time travel were possible, that'd also lead to some interesting questions of location-based identity.

Canonical (aka absolute) filenames do include their complete path. A canonical filename is a complete address, and is necessarily unique within a filesystem. (...)

That is still a "domain string key". It does not change my original argument that I can see. "Domain" is the issue here. Objects generally don't use domain attributes as native uniqueness identifiers. -- BH
Physical location is an attribute of the physical object space; RAM addresses are an attribute of a memory (or virtual memory) object representation space. Filenames are an attribute of filesystems (a file-object space). All of these things are sorts of addresses in object-space; that is, they are mechanisms for identifying objects by their 'location' in an object-space, and every object in those object-spaces must have a 'location'. Your original argument, as I understood it, was that RAM addresses are somehow theoretically different from using Filename addresses. They are not different. In fact, the two are isomorphic. Where you may say canonical file names are very much like a primary relational key (within a filesystem), I can return that RAM addresses are equally as much like a primary relational key (within a memory space). Both are addresses, and neither form of address is more or less "domain" than the other.

(...) Strings from a finite character-set are isomorphic to integers and filenames (in any modern system) can be represented as such strings... so they are not significantly distinct from RAM addresses. The important components of my analogy above were not the filename, but the presence of two other sorts of agents viewing the filesystem differently... one through a translation (URL to file) and one through an inversion (associating identity with the content block, so a 'name change' is meaningful... whereas 'renaming' under the view that filename is identity would be considered a destruction followed instantly by a creation). These additional views are also common to filesystems in programming.

Please clarify. The details of renaming are generally hidden in most file systems. -- BH
The implementation details are not relevant. If you view filename as that which identifies an object, then the concept of "renaming" doesn't make sense. If you view filename as file-identity, then as an actor you can watch the 'content' of a file change over time, and you can watch as file-objects are added to and removed from the filesystem, but you can't watch a renaming... the closest you'd ever come is seeing a file with one name disappear while another file simultaneously appears with a different name but with the exact same content as the file that disappeared. And, being isomorphic, this has a direct translation to RAM addresses. It is easy to move a C++ object from one RAM address to another (ignoring concerns of breaking references): simply take the block of memory that constitutes the object and copy it to another location, then free the old location. Or, alternatively, just tweak the virtual-memory table. However, to those agents using RAM address as object identity, this appears to be the creation of a new object and the destruction of an old one. Does that stick? "Renaming" only makes sense if you associate the 'object' with the content, not the address.
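The renaming argument can be sketched with a toy filesystem, modelled here as a dict from filename to content. The `observe` helper and the file names are hypothetical; the sketch assumes each observer reduces a file to whatever it treats as identity and compares snapshots before and after the rename.

```python
def observe(before, after, identity):
    """Return (destroyed, created) identities as seen by one observer.

    `identity` maps a (name, content) pair to whatever that observer
    treats as the file's identity.
    """
    old = {identity(name, content) for name, content in before.items()}
    new = {identity(name, content) for name, content in after.items()}
    return old - new, new - old


# A rename, seen only as two snapshots of the toy filesystem.
fs_before = {"/a.txt": "hello"}
fs_after = {"/b.txt": "hello"}

# Observer 1: filename is identity, so one file vanished and another appeared.
by_name = observe(fs_before, fs_after, lambda name, content: name)

# Observer 2: content is identity, so nothing was created or destroyed.
by_content = observe(fs_before, fs_after, lambda name, content: content)
```

Only the second observer has room for a notion of "renaming"; the first can report nothing but a deletion and a creation.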

It's a mistake to consider RAM as the most natural space for objects in programming simply because it's the one to which you are most accustomed due to modern OOPLs.

I don't believe I made that implication. I am not giving a value judgement to "key types" at this point. I am only pointing it out as an observed property of OOP as commonly implemented. In practice, OOP does not seem to rely on domain identifiers for uniqueness determination. RAM-ness is not really the focus, but rather lack of domain ties for identity. -- bh
Yes, most implementations utilize address-based identity. There are many viable alternatives - origin-based identity, title-based identity, and property-based identity to name a few. Only the latter relies on what I believe you're calling "domain" components - identifier values that arise from within the object more than the object-space. (Object-space is inherent to objects as every object that exists must do so within an object-space; an object not in a space does not exist in that space.)

Identity as a RAM address allows for rapid access to representations of objects in a local system because it is very close-to-the-metal. However, RAM addressing is very poor for object-oriented code in distributed systems, for objects shared between different sorts of non-persistent processes, and for code allowing for transparently persistent objects in general. Address-based identifiers of any sort are awful for code that must resist forgery of object identifiers, for code that acquires security through the capability model, for code allowing the migration and mobility of objects between and during runtimes, and for code that incorporates distributed caching and synchronization models for speed and resilience (allowing one object to take over if another fails). Identifiers utilizing a RAM address get the worst of both worlds.

You should view a RAM address as being a very simplistic form of object identity... one that trades for speed on a local machine at a rather significant cost to flexibility, security, abstraction, and resilience.

It would be interesting to look into how distributed OO systems deal with object identity. It may tell us something about how the implementors view object identity, being that I am looking at the usage frequency to shape my working description of ObjectIdentity. -- BlackHat

Ask away, then. I qualify as at least a designer of distributed systems though I'm still working on the 'implementor' part. The issue of identity of objects and actors in distributed systems can cause a few headaches even before adding the following constraints: runtime updates of objects and services, spoofing attempts by crackers and hackers, object and actor migration, caching with failover, and potential for catastrophic object or actor destruction (e.g. a robot is destroyed by an RPG, commander is killed by bullet, etc.) where communications must also failover immediately. For this latter case it's much much much better to identify objects by title than by address or origin. However, not all objects need be identified by title... and not all even require mobility or may use mobility relative to some other object, which allows addressing. In practice, I use an abstraction for object-identity (using different sorts of values based on what is most useful) and I allow objects more than one identity. I use address-based for immobile objects because it's fast, and I use RAM-based for objects that are immobile, local, and volatile. For mobile objects I use origin, and for keystone objects and actors I tend to use titles (which can be shifted to another object or actor in event of upgrades, destruction, etc. allowing for a more resilient SOA). In a C++ implementation, identity would be considered an abstract ValueObject that provides a handle for message services (allowing one to send messages).

I moved this down here because there was an editing conflict that tangled what I was replying to.

The term "primary key" in RDBMS-speak does not tell us whether it is a generated (auto-number) key or a domain-tied key (like SSN). Both are common in RDBMS designs. -- bh
True. But this is not relevant to the discussion. Files in a filesystem don't inherently carry their own filename any more than rows in a table inherently carry their row number. Addresses are not part of the object... not in the same way that relational 'keys' are part of tuples in relations.
I am not sure if "carry" carries (pun) any clear meaning in cyberspace. Same with "part of". Thus, I cannot evaluate your statement in a definitive way to agree or disagree. -- bh
Consider a remarkably simple object... say, a cell that carries a single 32-bit integer. Now create an array of N such cells. At this point each object (an individual int32 cell) may be addressed by its integral position in the array. However, this position in the array is not part of the 32-bit integer or cell. If that same sort of object existed in another object space, it might not be addressable by a position, or it might take something other than an integer to address it. It is quite clear that the 32-bit integer cell object does not carry its address. This is true for addresses in general.
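
A minimal sketch of the cell example in Python, assuming a hypothetical `Cell` class (the 32-bit width is not enforced here; only the addressing matters):

```python
class Cell:
    """A cell holding one integer; it has no field for its own address."""
    def __init__(self, value):
        self.value = value

# Object space 1: an array, where cells are addressed by position.
array_space = [Cell(10), Cell(20), Cell(30)]

# Object space 2: a map, where the same sort of cell is addressed by a string.
named_space = {"alpha": Cell(10), "beta": Cell(20)}

by_position = array_space[1].value    # addressed by the integer 1
by_name = named_space["beta"].value   # addressed by the string "beta"

# Nothing inside the cell records where it lives in either space.
cell_fields = sorted(vars(array_space[1]).keys())

print(by_position, by_name, cell_fields)  # 20 20 ['value']
```

The address belongs to the containing space (the list or the dict), not to the cell.
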
Well, I generally consider that an implementation detail. Perhaps auto-keys or even primary keys in DB's are not actually kept with the record data itself since the index already "knows" that info. One could implement positional (integer) arrays by storing the index in the cell, and vice versa with just about any structure. It may perhaps change the performance characteristics, but it won't change the interface or behavior from the user's perspective. The interface does not tell us that info, and it could perhaps change without the user ever knowing. Nobody can prove that a typical integer index array actually uses offset multiplication in its implementation without dissecting the machine code, RAM, or the compiler (the internal "guts"). One cannot tell from the behavior (results) alone.
Addresses are not "implementation details" and you're wrong to consider it so; addresses are logical details, inherent to the object space. How you choose to implement the objects and object spaces is an implementation detail, and you're correct to note that you can implement an array in quite a few different manners. However, regardless of the implementation, the logical object does not carry its address. You cannot "store the index in the cell" and be discussing the same 32-bit integer cell I mentioned earlier.
Let's use emulation to compare characteristics so that we don't have to talk about actual RAM, etc. OO-space generally matches the characteristics of a generated primary key rather than a domain primary key. If I implemented/emulated an object system in a RDBMS, I would likely use generated keys to track objects. Two object references would be considered the "same object" if the generated object key were the same. The generated number is a lot like virtual RAM (except it stays when the power goes out). If I implemented a file system in OOP, I would use some kind of programmed constraint or validation to guarantee unique file names, not the file name or path as the object (file) identifier. In an RDBMS, one could use the built-in key system to guarantee things like unique file names because one *can* tie the primary key to domain attributes. However, a "typical" OO engine does not provide something equivalent. It would have to be explicitly programmed. Out-of-the-box, most OO languages do not allow one to use domain attributes as the primary object identifier. In that sense, it lacks a native feature of RDBMS. (An RDBMS emulation of a file system would probably have to use a compound key consisting of the directory ID and the file-name. It may be possible to use a tree-based path to avoid generated keys, but this could get messy.)
You're free to implement one logical object space within another. Just don't mix the two.
I am not sure what POSIX dictates about whether "rename" moves or copies, or whether it does not care as long as the result comes out a certain way. -- bh
A "rename" is not a copy. It might be implemented as copy-then-delete. I care not about such details; the discussion here needs only the logical consequences of the event. By "rename" and "copy" I mean the obvious events with which you are assuredly familiar. Oh, and "Moving" a file is logically identical to renaming its canonical file name.
Again, I think English is getting in the way. If "move", "copy", and "rename" can all result in the same result, then it may not tell us anything specific enough for our purposes. Maybe some file system implementations use one and others use another and the user cannot tell the difference. That is why I looked at a RDBMS emulation of them above so that we can at least use clearer terms taken from the world of RDBMS to compare. -- bh
Sigh. What makes you think "copy" is the same as "rename"? After a "copy", you have two files where you once had one. The contents of the two files are the same, but the names are not. After a "move" or "rename" you have no increase or reduction in the number of files in the system. Instead, you have a file with the same content as before the move/rename, but with a different canonical file name. And then there's "delete", which removes a file. A rename is not a copy. A rename may be implemented by a copy plus a delete, but that is not relevant to its nature. Your focus on implementations is misplaced; just assume the implementation does the logical job correctly, and focus on the logic.
But "rename" can be implemented via a copy. (...) -- bh
Wrong. Once again: a "rename" cannot be implemented via (just) a copy. No matter how you copy a file, you haven't done a rename. A logical rename can be correctly implemented with a copy followed by a delete. However, with just copy, you haven't really done a rename... the actor (user) would be able to tell easily because the original file would still be there! And get it through your head: the implementation (assuming it is correct) does not matter to the logic.
(...) Suppose you are implementing an embedded POSIX-compliant file system for a Palmpilot or the like. One may simplify or reduce the size of the code by implementing a rename via a copy and delete internally instead of doing a name-only change. The user may not be even able to tell. (Suppose the file system is single threaded or locks DIR during copy such that one cannot see the 2 files existing during the transaction.) And, what do you mean by "canonical"? (Canonical file name == absolute file name. It includes the path. I've mentioned this before.) If you are talking about the *external behavior* of "copy" being different than "rename" (I'm talking about the damn logical behavior. Yes, external behavior, too.), I perfectly agree. But what does this have to do with identity? (I feel like I'm talking to a wall... Did I not say that "rename" doesn't make sense if you consider the "identity" to be the filename? Were you not damn listening?) From the user's perspective, the file name (or file name plus path) is the "identity". That is more akin to domain-key identity than hidden address identity, the type that OOP natively tends to use. (No. It is not. It is precisely the same as the type that OOP implementations tend to use. That strings are used instead of integers is for human convenience. Further, your use of the word natively is also incorrect.) In this sense, files in file systems do not behave like objects. (Regarding which you are wrong. Simple as that.) Custom behavior has to be added to objects to give them that kind of domain identity tie. (No. It must be added to the object-space, not the objects.) Two objects can have the same file-name and/or path attribute and the object engine will not complain one bit. One has to program an attribute check to catch or prevent such overlap.

Let's simplify things and assume a file system without sub-folders (like early PC's):

   class File {
     attribute fileName: string private;
     attribute content: bytes private;
     method new(nm, con) {   // initiator
        fileName = nm;
        content = con;
     }
   }
   ...
   file1 = File.new("foo","asdfasdfasd");
   file2 = File.new("foo","xx234987");

The object engine would not crash. We have two "files" (2 file objects) with the same name ("foo"), a no-no in filedom, but native OO does not understand this. To prevent this, we would have to implement some kind of search of existing objects to make sure one with the same name is not already there. Contrast with a RDBMS where by saying the "fileName" column is the primary key, we prevent duplicates. The RDBMS natively "understands" domain attribute based identity. By "understand", I mean it has built-in abstractions that support the concept.
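
The contrast can be sketched in Python with the standard `sqlite3` module. This is an illustration under assumptions: the table name `files` and the Python rendering of the `File` class are made up for the example, and SQLite stands in for "an RDBMS".

```python
import sqlite3

class File:
    """Python rendering of the pseudocode class above."""
    def __init__(self, file_name, content):
        self.file_name = file_name
        self.content = content

# Plain objects: two "files" with the same name coexist without complaint.
file1 = File("foo", b"asdfasdfasd")
file2 = File("foo", b"xx234987")

# RDBMS: declaring fileName as the primary key makes the engine reject
# the second row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fileName TEXT PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("foo", b"asdfasdfasd"))

try:
    conn.execute("INSERT INTO files VALUES (?, ?)", ("foo", b"xx234987"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```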

What you don't have is a filesystem -- the object-space. Files cannot exist without a filesystem. Your attempt to place the files into the C++ object space is where you err. A correct implementation would be: deftype Filesystem = [Filename => File], deftype Filename = [Char], deftype File = [Byte]. The RDBMS logical equivalent would be to have a set of (Name,Content) pairs, with a constraint that no two names in the relation are identical. These are rather different approaches.
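
A rough Python rendering of the deftype description, assuming a dict for the [Filename => File] map and bytes for a File. The class and method names are hypothetical, and having `create` refuse rather than overwrite is a policy chosen for this sketch, not something the deftype line dictates.

```python
class Filesystem:
    """Filesystem = [Filename => File]; a File here is just bytes."""
    def __init__(self):
        self._files = {}  # the object-space: filename -> content

    def create(self, name, content):
        # The map cannot hold two entries under one name; this sketch
        # chooses to refuse a second create instead of overwriting.
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = content

    def read(self, name):
        return self._files[name]

    def names(self):
        return sorted(self._files)

fs = Filesystem()
fs.create("foo", b"asdfasdfasd")

try:
    fs.create("foo", b"xx234987")
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

print(second_create_failed, fs.names(), fs.read("foo"))
```

Here the file content carries no name at all; the name exists only as a key in the filesystem's map.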

Are you saying the C++ approach is not "true OOP"? Perhaps this is an issue of the definition of OOP?

Not at all. I'm saying that your C++ approach is not "true files and filesystem". You created objects, but you did not create file objects... at least not in the sense of files in a filesystem.

I think you misunderstood the purpose of the example. It is to emulate a file system, not use a file system.
I did not misunderstand. To emulate a filesystem, you must emulate both the file space and the files. You failed to do the former, and your description of the file class fails as an abstraction of a file object found within a filesystem... though you might be able to strongarm it into working, it'd be a major hack-job.

I am not familiar with the syntax of your proposed alternative. It does seem to have maps built into its syntax, but that is not a requirement of OOP by most definitions.

Objects need to exist in an object-space. It is true of OOP under any definition. For most implementations of OOPLs, the object-space is an addressable memory space. For files, the object space is a filesystem. That's pretty much true by definition. Oh, and the syntax [T] is a logical array of type T (indexed by position), while [K=>T] is a logical map. It's syntax from a MyFavoriteLanguage.

That is an addition to OO. Again, I am basing my characterization of OO on common usage, and C++ style OO includes that. I am not giving the common definition a value judgment at this point. Note a primary key makes it unnecessary to have to apply explicit constraints in RDBMS. Primary keys are required by the relational model (because sets by definition have no duplicates). Constraints are not. -- bh

Primary keys are not necessary. Candidate keys are. But if there is more than one candidate key, there is no requirement to name one of them "primary". Oh, and having a primary key smaller than the whole tuple is a constraint.

How about we put it this way: it needs at least one unique key (compound or singular). But it does not change my original point.

I'll agree that relations necessarily possess at least one unique potentially composite key, that being the entire tuple. Your original point is incorrect for other reasons.

Re: "Did I not say that "rename" doesn't make sense if you consider the "identity" to be the filename?"

Please explain. From an interface/user perspective, it could mean "create a new copy with identical attributes, but with the new given name and discard the original". Whether that captures the "essence" of "rename" or not, that is kind of a fuzzy psychological issue.

This would be the equivalent of a person getting punched in the face and a deity (or quantum threads) suddenly grabbing the pre-punch person and replacing him with a new (post-punch) person who is identical to the old one in every way except for a bashed-up face. From the observer's (user's) perspective, they cannot tell the difference and normally don't care. The file name is equivalent to the person's face: it changes but nothing else does (from the observer's perspective at least).

If you view filename as that which identifies a file object, then the concept of "renaming" simply doesn't make sense. If you view filename as file-identity, then you can watch the 'content' of a file change over time, and you can watch as file-objects are added to and removed from the filesystem, but you can't watch a renaming... the closest you'd ever come is seeing a file with one name disappear while another file simultaneously appears with a different name but with the exact same content as the file that disappeared. You might have some psychological issues with it, but you're a human and thus subject to psychology; there isn't anything fuzzy happening here in the logical sense. For any actor, human or otherwise, observing a logical renaming requires that one consider the file to be identified by something other than its name.

My point is that one cannot tell the other apart. If there is no way to tell them apart, then for practical purposes, they are equivalent (or if the differences don't contradict the definition). If the process is thread-locked during the copying, there is no way a user could tell the difference by seeing two files at the same time. I suppose you could argue that it uses more temporary space than an attribute-only rename, but the use of work space is not forbidden by most system utilities.

If you define, for practical purposes, "renaming" as a copy followed by a delete, then it would be so. However, that is not how humans think about "renaming" objects. Don't kid yourself about it. You can't "rename" your cat by cloning it precisely, naming the new one, then killing the old one. It doesn't even work if the process is done within a Schroedinger box into which no observations may be made until the process is complete. To a human, "renaming" means that the object is the same... except that the name associated with it is altered. To consider "the object is the same" absolutely requires that you identify the object by something other than its name. For a cat, it might be by its body or history or the memories associated with it. For documents, humans tend to identify a file based on some combination of its content and origin. E.g. 'the document I wrote about how renamed cats have only eight lives left'. A file containing such can be given a name, then I can change its content, then I can change its name again... and it remains the same 'object'. Renaming the document file changed some property related to the document (the filename associated with it); it emphatically did not destroy the object and create a new one simultaneously. Changing the content changed another property. Neither change affected how I identify the object.

How you or any actor views the world or any other object system always depends on what you consider to be 'object identity'. If you consider the filename to be the identifier, then you can't see a rename; you see a copy followed by destruction... a pairing of events you might identify as a common pattern if you see it often; you could call such a pattern "rename" but you wouldn't mean what humans usually mean when they use the word. If you view 'origin' to be identity, then a rename is fairly natural (as rename doesn't change origin). 'Copies' also make sense, and would be considered different files because they have different origins. However, you'd see a 'copy followed by a delete' as exactly that: creating a copy and destroying the 'original'; you would scoff at calling this a "rename" even if rename is implemented by exactly that. Are you the same person you were yesterday, or is the old you gone forever, replaced by a new you with slightly different properties? Is FredsAxe the same one he started with? Etc. This sort of paradigm issue exists in any object system where multiple identities may reasonably coexist for an object. I originally brought up filesystems precisely because there are many valid paradigms utilized by actors interacting normally with a filesystem... because of multiple 'identities' for files, there is no one identity that can be declared "most correct".
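
The two paradigms can be sketched as two observers comparing snapshots of a toy filesystem. Everything here is an assumption made for illustration: the filesystem is a dict of name -> (origin, content), 'origin' is modelled as a counter assigned at creation, and all function names are hypothetical.

```python
import itertools

_origin_counter = itertools.count(1)

def create(fs, name, content):
    fs[name] = (next(_origin_counter), content)  # fresh origin per creation

def rename(fs, old, new):
    fs[new] = fs.pop(old)                        # same origin, new name

def copy_then_delete(fs, old, new):
    _, content = fs[old]
    create(fs, new, content)                     # the copy gets its own origin
    del fs[old]

def view_by_name(before, after):
    """Observer whose object identity is the filename."""
    return {"destroyed": sorted(before.keys() - after.keys()),
            "created": sorted(after.keys() - before.keys())}

def view_by_origin(before, after):
    """Observer whose object identity is the origin."""
    old = {origin: name for name, (origin, _) in before.items()}
    new = {origin: name for name, (origin, _) in after.items()}
    return {"destroyed": sorted(old[o] for o in old.keys() - new.keys()),
            "created": sorted(new[o] for o in new.keys() - old.keys()),
            "renamed": sorted((old[o], new[o])
                              for o in old.keys() & new.keys()
                              if old[o] != new[o])}

def observe(operation):
    fs = {}
    create(fs, "shoe", b"laces")
    before = dict(fs)                            # snapshot before the event
    operation(fs, "shoe", "boot")
    return view_by_name(before, fs), view_by_origin(before, fs)

name_view_rename, origin_view_rename = observe(rename)
name_view_copy, origin_view_copy = observe(copy_then_delete)

print(name_view_rename == name_view_copy)  # True: indistinguishable by name
print(origin_view_rename["renamed"])       # [('shoe', 'boot')]
print(origin_view_copy["renamed"])         # []
```

The name-based observer reports the same destruction and creation for both operations; only the origin-based observer can report a rename, and it reports copy-then-delete as something else.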

Well, I think it is relative. We think about things different ways depending on the moment and need. It is not cut-and-dry. At this point I think we just have to AgreeToDisagree. -- bh

You can certainly change what you consider to be object identity depending on the need at the moment. That was my point: there are plenty of systems for which it's perfectly reasonable to view the same objects in different ways based on need, and nobody can tell you that any view is wrong so long as that view is logically consistent. However, certain actions are impossible to perform or perceive under certain paradigms... saying they're possible (without butchering the essential meaning) would introduce a logical inconsistency. Among these is the concept "rename" when object identity is name. Of course you're free to butcher the essential meaning (the assignment of a new name to an object that once had another name) and replace it with something that fits your object identity (copy + delete); I won't accept such a definition as "correct", but I'd agree that it's "practical".

I believe issues of logic and math like this one are cut and dry, and to AgreeToDisagree would require that neither of us respect the answers that come from cold, hard logic and deductive proof. I will agree, however, that you're entitled to your own beliefs, be they inconsistent or not.

If you can produce a "solid" model or math that fits common or consensus definitions that proves it "wrong", I may change my mind. At this point your views seem to have a partial foot in psychology rather than pure math or hard science. It is technically possible to "rename" a primary key like a file name in a usable sense. That is not disputable that I can see even if it does create some philosophical puzzles not too different from the kind found in quantum physics. -- bh

Digging in your heels and calling your argument "not disputable" is somewhat dishonest when dispute clearly exists.

You cannot technically 'rename' or otherwise 'change' the object identity you are using to identify objects. To say otherwise is to introduce a logical inconsistency - a contradiction. The very notion of observing change with regard to an object requires that it be possible to identify the changing object both before and after the change, allowing one to observe differences. However, one identifies an object by an object identity. Thus, if it were possible for 'object identity' of an object to change or be changed, one would by definition be unable to identify the object both before and after the change with the same object identity and therefore observe the change. Thus, said change would be impossible to observe.

Who says we need to observe it? We only need to observe the results for it to be useful to us. It is also possible to assign temporary and/or hidden IDs during the change process. I don't see how you can claim it is technically not possible. Even RDBMS allow one to change the value of primary keys. Whether it creates a subtle philosophical conundrum or not I stopped addressing here. It is technically doable and does not create immediate, obvious, or significant problems in practice. -- bh
Two things here, BlackHat. First, I went into the reason for observation below. If you didn't bother following that, then read it first and ask any questions after you're confident you comprehended. Second, remember that the actors and observers aren't necessarily attached to one another by any shared memories. Assigning "hidden IDs during the change process" certainly works for whomever can track these hidden IDs that exist externally to the object system (entirely extrinsic). The easiest ID to use for this, conceptually, is "origin" (i.e. this tuple came from modifying that one which came from modifying this one which was created by Bob a year ago). However, unless these histories are part of the object space, this sort of identifier isn't very real; it can't be used to tell another observer which object you're talking about. Further, it still means that you won't ever be able to observe someone else doing a "rename"; all you'd see is one object disappearing and another object appearing between observations of the object system, and you'd have no way of knowing that it was different than a create and delete. With regards to the RDBMS: an RDBMS allows you to directly change a relation. It is possible to use this capability to change tuples within a relation identifiable by candidate keys. However, not all changes to relations are changes to tuple objects... at least not in the sense of change being distinct from created and destroyed in the object life-cycle. For a tuple object to exist as an entity in its object space, it must be identifiable by at least a conceptual, omniscient observer that has access to everything in the object space. This can be done by use of a candidate key. For an omniscient observer to recognize change of a tuple object requires that it be possible to identify the same object as it exists both before and after the change. This is done by using the same identifier for both before and after the change.
However, if you change the relation in such a way that the candidate key no longer identifies a tuple object, then you've destroyed that tuple object - it no longer exists. This isn't a logical or philosophical "conundrum"; it's solved. Of course, as with files, you could use hidden identifiers for tuples for the change... but, as with files, you still won't be able to see the "change" when anyone else does it, and you cannot communicate it to other actors, and thus it won't be a real identifier. And with regard to the files, your approach to making "rename" technically doable is exactly what I've said you'd need: another object identity than just filename. This can't be avoided for reasons I've provided.
How exactly does a copy-and-delete "rename" require an outside/extra identifier? And even if an outside/extra identifier is by chance needed for the rename process, it is not necessarily needed *permanently*. I suggest you use associative array examples. -- bh
- I do not require an outside identifier... any extra identifier will do the job. A better choice is another intrinsic object identity.
I tire of this. No matter how I try to explain it, you find a way to not understand. It's probably because you're failing to grasp that 'objects' really are just a way of viewing a world. This will be my last effort. If you've defined "rename" as "copy-and-delete", then it does not require an object identity other than name to perform. However, "copy-and-delete" is not a natural definition for "rename" - it's something that you made just so you could use the word "rename" for something practical. The only natural definition of "rename" is the process of changing the name associated with an object; that is, the object had name X before the rename event, and has name Y after the rename event. This can be observed by looking at the object both before and after the rename, and checking out the name. However the "looking at the object" part is where you run into a quandary; as an observer of a mathematical space, you're identifying the object through some properties that exist on that space... such as a name associated with each object... e.g. a file called "shoe". Problem is, if you use name for identity, then any attempt to 'change' the name of "shoe" results in the object identified by "shoe" simply disappearing... which violates the essential definition of "rename".
For an associative array example, consider a cell containing an associative array of [Int => Int]. The integer on the right is named or addressed by the integer on the left, and the one on the left acts as an index or primary key (i.e. it's unique). You can change the array out, one for another; as a result, you can change the array out one for another with some small difference (e.g. to replace (1,3) with (1,4)). Thus the right-hand-side of each tuple provides a logical cell... one that fulfills all the required behavioral properties of cells (minimal is get value, set value). Ultimately, each cell may be viewed as a mutable object, addressable within the 'space' offered by the associative array, addressed by the integer on the left-hand-side. Now that you're considering the objects in that manner, how would you go about "renaming" a cell... that is "changing" its address. You know that the end result you wish is for the cell (1,4) to go to, say, (3,4)... so an obvious "solution" is to replace the entire associative array with the (1,4) tuple replaced by the (3,4) tuple. But, to anyone watching it looks exactly like the cell associated with '1' was deleted while the cell associated with '3' was created, which certainly isn't a "rename" under any reasonable definition of that word. You can play around with virtual object identity, but that doesn't really fix the logic - as I said above, the extrinsic object identities are severely crippled and mostly unusable for communication of identity. What you need is another form of object identity than just name/address.
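
The [Int => Int] example can be run as a small Python sketch, assuming a dict for the associative array and a hypothetical `diff_by_key` function standing in for an observer who identifies cells by their left-hand integer:

```python
def diff_by_key(before, after):
    """Report what an observer who identifies cells by key can see."""
    return {
        "deleted": sorted(before.keys() - after.keys()),
        "created": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
    }

array = {1: 3, 2: 7}

# Setting a cell: swap the array for one that has (1,4) in place of (1,3).
updated = {**array, 1: 4}
set_view = diff_by_key(array, updated)

# Attempted "rename": swap the array for one that has (3,4) in place of (1,4).
renamed = {k: v for k, v in updated.items() if k != 1}
renamed[3] = updated[1]
rename_view = diff_by_key(updated, renamed)

print(set_view)     # {'deleted': [], 'created': [], 'changed': [1]}
print(rename_view)  # {'deleted': [1], 'created': [3], 'changed': []}
```

The set shows up as a change to cell 1; the attempted rename shows up only as a deletion and a creation, with nothing in the observation linking the two.
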
Sorry, but I don't find your argument convincing. People in practice do use "rename" for such things. Your argument is still tied to the psychology of definitions, not cut-and-dry math/logic. A file system (assume pre-folder like micros in 1980) could be implemented as an associative array like above, and people would still request to "rename a file" and it would work for practical purposes. If it is "unnatural" (your words), why do people say it (and not knowing the implementation)? We are at a LaynesLaw logjam. Copy-and-delete is an implementation detail, not a results contract. "Rename" is a results contract. It can be implemented internally via system/RAM pointers or a map on file-name with copy+delete (and maybe some thread-locks). Both satisfy the results contract. My deity-face-punch analogy (above) still applies. (I wish you had stuck to files instead of changing the example to integers, by the way. That confuses readers in my opinion.) Note that I am not saying that extra keys are necessarily a bad implementation. I am only saying it is just an implementation issue. -- bh
You're still too focused on implementation details... but I'm probably too abstract for your preference. I don't bother with such details as "assume pre-folder like micros in 1980" because I'll just note that "/root/folder/file" is a perfectly valid canonical filename, and I'll move on. Similarly, I'm not particularly concerned with [Int => Int] vs. [String => [Byte]] vs. [Filename => File] vs. \TypeA TypeB -> [TypeA => TypeB]... but that you mention this bothers you tells me you're used to concrete information. I don't really think on that level; I haven't for years, and can barely remember how. That's why I'm struggling with this explanation.
Anyhow, I've prepared a 'dialogue' style essay, below, that you can look at.

Now, if the notion of 'object' were truly independent of observers, this wouldn't be a problem... but this is not the case (see next paragraphs). Relative to all observers of that 'object identity', the object simply no longer exists; the most an observer can do is note this non-existence... a 'destruction' of the object. Of course, with a little search, one might learn that another object exists with similar or even equal properties... but without identifying objects by something other than that first identity, you'll never be able to deduce that the destroyed object is somehow the "same" as the created one. The notion of observing a change in identity leads to a contradiction... and because objects are, by nature, a paradigm on observation, so is the idea of change in objects. Change of object identity isn't observable, therefore it isn't logically possible... unless you use one object identity to observe a change of another, but doing this requires accepting that objects have more than one identity - a notion that you have explicitly rejected for files... and therefore for all address-identified objects by virtue of isomorphism. All this is pretty darn fundamental. You literally, logically, and technically cannot rename a file unless you allow object identity for a file to be something other than its name... or you seriously butcher the essential meaning of "rename".

Which "observers"? Do you mean end-users, or programmers, system designers? I consider "rename" to be a contract for the view of the user and *only* the user. What the techies/implementers see means jack-bleep. -- bh
Abstract observers of the final system. That includes end-users and programs. The problem with the statement you just made is that you used the word "the" prior to "user". I can't consider it such. "Rename", to me, means that people with memories of an object can look at it, look at its new name, and say, "Oh, it's been renamed!" That means more than just you. That means all end-users. Now, if you only have one user by definition, it isn't a problem. If you have two users, it is a problem.
I am growing confused here. Are you raising the issue of internal "guts" programmers, or of concurrency? If you are talking about concurrency, then assume a single threaded file system, or at least one that locks the thread during "rename" such that nobody can see two files during the copy+delete process (they wait in queue until RENAME is done). I know that can be a performance problem, but performance is not the topic here. (There may be performance trade-offs between single key and multi-key systems.) -- bh
No, no, and no... and again, the implementation-talk nee

There is a very pure mathematical foundation underlying my views on objects, and there is no direct grounding in any sort of psychology excepting that which originates "object" as a meaningful word. The concept of object is founded in humans observing and creating a mental model of physical reality. The concept of object is necessarily tied to perspective (paradigm) because there is nothing about our observations on physical reality that can even deductively prove reality 'exists', much less objects within it: there is no inherent reason to consider me a "human" as opposed to a conglomeration of water and lipids, there is no inherent reason to consider a particular density of water vapor a "cloud", and so on. But our brains are pattern-matching machines, and so they identify patterns. "Object" identifies certain sorts of observed patterns... those distinct from space, time, masses, materials, and properties. "Objects" are things we can observe (watch, feel, taste, smell), manipulate, and predict. "Objects" have properties that may (in some domains) change over time. "Objects" are unique in that one identified object is necessarily distinct from another (or you couldn't call it "one" object). "Objects" are in some sense 'real' - they inflict their reality on us observers, not vice versa, because changing how we view the object does not change the object.

A mathematical object must also have all those properties... but the concept of a mathematical object need not have us humans and our five senses. Nor does it need exactly three dimensions of space or one dimension in time - they can even be timeless and spaceless, when abstracted, like value objects. All issues of psychology can (and must) be removed - a more abstract 'observer' can be utilized instead of a human observer. A more abstract 'actor' can be used in place of physical forces. To observe an object necessarily requires that one have somewhere to look, which necessitates the existence of some sort of 'object space'. (By definition, our senses observe the 'physical plane' space.) 'Objects' in an object space can be observed... potentially over "time" if the mathematical abstraction includes a time dimension (or more than one). Mathematical objects in this mathematical object-space are as "real" to the abstract, mathematical "observer" of this space as physical objects are to us. Like the objects we experience, mathematical objects are not properties... nor are they patterns of objects with emergent properties (which would associate more closely with "materials" or "masses"). Indeed, individual mathematical objects necessarily must be uniquely identifiable by the observer... or they would not be "individual" objects. A concept used by an observer for identification of an object can reasonably be called an object identity.

Object oriented programming reverses the abstraction; rather than merely observing objects and manipulating them, one builds complex systems by creating objects and designing how they interact. In this abstraction reversal, one will necessarily implement at least one object space, at least one object identity for every unique object, and a property representation. These things are minimal and necessary to the concept of object. One would generally add "methods" if an action on one object should propagate to another... because enacting complex object interactions isn't the actor's job. However, methods are a domain thing... if objects in a domain don't interact, then methods aren't necessary. These things will be true in any object-oriented system regardless of other features commonly associated with OOP (like support for classification of objects, polymorphism, prototyped objects, abstract objects, virtualization, encapsulation, dynamic dispatch, etc.). These other features are important to making object oriented design easier in complex domains, but are hardly essential.

I am not clear on why auto-number primary keys for database rows do not satisfy your requirements for "object identity". They are a lot like RAM addresses in utility: a "dumb" unique number not tied to domain attributes. They just happen to be more visible and more permanent than RAM addresses. Would you be happier if they were not visible? The visibility is for practical reference, so that one can, say, pick up the phone and say which record has a problem. But this nice feature can in theory be tossed to satisfy your definition if that is the stumbling block. -- bh

Either I've misstated or you've misunderstood. Candidate keys of any sort do, indeed, satisfy the requirements of object identity for the tuples in which they are found, and can also be used to further address individual cells within those relational tuples by name. Primary keys and auto-number keys are types of candidate keys, so they also qualify. However, it's quite possible that the object you're considering does not include this identity; that is, it is associated from the outside. This is the case with object addresses, filenames, etc. The location of the object is not necessarily "part of" the object; it's "part of" the world. You just happen to use it to identify the object. The same is true with filenames and files.
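As a rough illustration of the point about keys, here is a small Python sketch using the standard sqlite3 module. The table name `cat`, its columns, and the sample values are invented for the example and do not come from the discussion. It only shows that an auto-number key can act as an identity that survives a change to a domain attribute, so an observer who remembers the key can recognize the change as a rename, while an observer who remembers only the old name finds nothing.

```python
import sqlite3

# Hypothetical schema: "id" is an auto-number key, "name" is a domain attribute
# that also happens to be unique (a second candidate key).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cat (id INTEGER PRIMARY KEY, name TEXT UNIQUE, colour TEXT)"
)

cur = conn.execute("INSERT INTO cat (name, colour) VALUES (?, ?)", ("Tom", "grey"))
tom_id = cur.lastrowid  # the auto-assigned key the observer remembers

# Change the name while addressing the row by its auto-number key.
conn.execute("UPDATE cat SET name = ? WHERE id = ?", ("Felix", tom_id))

# An observer holding the key still finds the row, now under a new name.
row_by_id = conn.execute(
    "SELECT name, colour FROM cat WHERE id = ?", (tom_id,)
).fetchone()

# An observer holding only the old name finds nothing at all.
row_by_old_name = conn.execute(
    "SELECT id FROM cat WHERE name = ?", ("Tom",)
).fetchone()

print(row_by_id, row_by_old_name)
```

Whether the key counts as "part of" the row or as something associated from the outside is exactly the question raised above; the sketch takes no side on that.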

I never thought the term "rename" could get so controversial. I guess CommonSenseIsAnIllusion. Can we go back to DefinitionOfLife? Can life be renamed? :-) -- bh

Yeah. I'd think a definition you can get from a dictionary, and that people use every day, would be the obvious one. Then comes equivocation. "Rename" can only mean one thing for an argument.

I don't see any conflict with implementation choices. To me "rename" is a description of what you want, not how to get it. This topic seems to be growing a bit testy. I suggest we take a break for a few months and ponder it a while. -- bh

I've put together a little primer "false" conversation between the two of us. Please read it and comment; I'm hoping a different approach might better inform you of how I think, and why we're having troubles communicating. Most of your issues are addressed by your double, but there might still be some confusion. Let me know, and I'll insert it to this little 'dialogue'.

BH: People use "rename" on files all the time.
GH: Yes, they certainly do. So, what "thing" are they renaming?
BH: The file. ...
GH: Indeed, they rename the file. So, how do people identify files?
BH: By name, of course.
GH: Oh, but do they? Certainly I can tell you to fetch the file "/my/foo". Tell me, what does "rename" mean?
BH: It's a results contract. If I rename "/my/foo" to "/your/foo", supposing "/your/foo" doesn't exist and all permissions are in order, then "/my/foo" will no longer exist and "/your/foo" will exist, and "/your/foo" will have the same content that "/my/foo" once had.
GH: So when people talk about 'rename', they're merely talking about the end result? What if they were renaming a cat? If I copied the cat, gave it a different name, then disintegrated the original, would that be a renaming?
BH: That's different. I'm talking about renaming files. For files, you could implement by copy+delete... maybe with some thread locks so nobody notices the interim state. Or you could re-link the file if the filesystem supports linking. I'd hope, at least, that renaming the cat would be more like the latter...
GH: I asked what "rename" means in general, not for files, and I didn't ask for implementation details... though I don't disagree with those you offered. Please humor me: what would you call copying and disintegrating a cat to rename it?
BH: ... cruelty to animals.
GH: Heh... fair enough. But are you sure you wouldn't call it renaming? After all, it meets all the requirements of your results contract. If you insist, I can automate it and do it in a Schroedinger's box so nobody notices the interim state where two cats exist.
BH: I feel like this is a trick question. I've already said that "rename" is different for files and cats.
GH: And why do you say that? Because you don't want me renaming cats with a disintegrator gun? Can you explain or elaborate as to why "rename" is different?
BH: ... No. It just is.
GH: I look in the dictionary (WordNet), and here's what it says: rename: verb 1. assign a new name to; "Many streets in the former East Germany were renamed in 1990" 2. name again or anew; "He was renamed Minister of the Interior" ... I see nothing about a results contract. To me, rename looks a lot like you have one object that has moved from one name to another. E.g. if you rename a street, it is the "same" street... except for the name. This works for cats, too, and it's a lot more intuitive than "a results contract".
BH: Yeah, it works for cats and streets... but it won't work for files. Files are different. They can't use that definition.
GH: Why are files different?
BH: You're asking me that? You answered the question yourself, what, a half-dozen times already. Besides, you're playing both parts in this "conversation", so these are your words anyway. Basically, files are different because they're identified by name. You can identify streets by location and heading... so if you rename a street, everyone with a memory can look at it and say, "Hey! This street's been renamed!" They can't "remember" the new name, but they can remember the original location and heading, which allows them to make this conclusion. Similarly, cats can be identified by personality, appearance, behavior, general location, or even DNA. So, if someone who has memory of the cat's old name then looks at the cat's new collar, they can conclude, "Hey! This cat has been renamed!"
GH: Memory, eh? What were all your earlier complaints about psychology for?
BH: Hey, smartass, these are your words, not mine. Get it straight, and stop trying to confuse the readers.
GH: Hey, no need to bite. Let's tone down the insults...
BH: *glare*
GH: So, if I disintegrated a street, then rebuilt it at the same location and heading, would it still be the same street? If I didn't rename it, would it have the same name? What if I disintegrated a cat then rebuilt it atom by atom with nanolithography, so it had the same DNA, same behavior, strutted the same streets, etc?
BH: I don't like this philosophy stuff, and I'd rather not think about how ObjectIdentity would handle under StarTrek technology. Take it to FredsAxe.
GH: *pout* Okay. So, you were about to explain why files aren't like cats and streets?
BH: No. You were about to explain. I'm just your mouthpiece for this bit. I'd rather you not put words in my mouth, but I'll do it so long as it helps the dialogue. Anyhow, files are identified only by name. That only part is really important. They're like the primary key in RDBMS or the RAM address in C++ objects.
GH: Oh? Why? Why not identify files by content, by authorship, by update history, by permissions, etc? How is content of a file different, fundamentally, from the appearance of a cat? ... well, other than the obvious. The state of a cat's appearance and the state of a file at a given time are both values. Files are digital, and composed of bits... but cats are furry, so I could have each hair in some position to represent data. I could read with a digital camera and write with a comb. Now, I'd need to take care to preserve this appearance lest it fall into disarray, so I'd need lots of hair gel, but (... ramble snipped ...)
BH: Files can be copied, which means content of two files could be exactly equal. Also, files are logical objects. They're above the physical layer.
GH: ... identical twin cats? or perfect copies? (heh. copycat...) Suppose I invent a machine to copy cats. Oh, and technically the appearance of a cat is also a logical object, as is the current state that appearance represents, and the value that current state represents. I just figured I'd need to discuss the physical layer or you wouldn't believe me.
BH: A cat's appearance does not normally represent anything. A file's state usually does.
GH: And your point is? I know my point: a file's state and a cat's appearance can both change without changing other identifiers, like name. If you consider it reasonable for agents, like humans, to identify cats by their appearance, then you must consider it equally reasonable to identify files by their content.
BH: Argh! You're rather frustrating to talk to. You know that?
GH: I know.
BH: Okay, I guess you can use content to identify files... that's what Google does. And those other things would be nice, too. But I am talking about the most basic of filesystems, like the 1980s micros, maybe with folders. It can only go from Name to File. Anything above that you need to add on yourself. If you like abstraction so much, call it an abstract filesystem.
GH: ... Then I'd need to add abstract observers and actors, who start collecting information, reading, writing, indexing content, etc. You know, like humans and Google.
BH: They aren't really part of the filesystem.
GH: I dunno. There isn't much point to a filesystem without any users. In at least one sense, they're definitely part of the system.
BH: ... okay. I'll grant that; I'm sure there's at least one sense in which it works. But if you keep distracting me, you'll never get your answer as to why files are different.
GH: I don't actually believe that files are importantly different, but you may continue.
BH: (:grumble: ... I really hope some deity smacks you around a bit ... :grumble:) Fine. They're your words. You say them. I'm done.
GH: Certainly. Files are usually identified by name. They can often be identified by other things, such as content or creation time, but one cannot count on such things to always uniquely identify a file. One can always count on filename to identify a file within a filesystem. For the sake of BH's sanity, consider a system in which files may only be uniquely identified by filename... where any identifier not including filename will either identify two or more files or zero files. In this case, rename is impossible. Why? Because it is impossible for any observer to have memories associated with a particular file by anything but its old name. So, if you supposedly "rename" an object, no observer can ever look at that object and say, "Hey! This file has been renamed!" Instead, as far as the observer can tell, the object associated with the old name is gone, vanished, disappeared; it simply doesn't exist anymore. If the filesystem is small enough, and the observer is looking for it, the observer might recognize that a new file is in the filesystem and that this file has the same content as the one that disappeared. It might even be able to tell that the two events happened at the same time if the observer has a clock. But there'd never be a reason for this observer to remember this event as a "rename". It'd just record that one file is gone and another exists... a creation + deletion.
BH: I keep telling you: renaming is possible. It's just different. Just because you can't see the "rename" doesn't mean it didn't happen.
GH: That's quite some logic, there. It's true, but it's also of the exact same sort as: "Just because you don't see the Giant Spaghetti Monster controlling you like a puppet doesn't mean he isn't out there, doing so," or "Just because evidence of dinosaurs exists doesn't mean God didn't just plant them there for humans to dig up." So, if you can prove that you'll never, ever observe a rename, then can a rename really "happen"?
BH: You can observe a rename. Look at the filesystem. A new file has the same content as the old file, at about the same time... or even simultaneously.
GH: Oh, so you're identifying files by content and time now? No, no, no... that's against rules I added for your own sanity, BH. You may only identify files by name. If you try to identify them by content and creation/deletion time, you'll get at least two files back. You might get a million. How would you tell which file was the copy, then?

Interjection #1. I never said that one cannot identify stuff by content. It is just that the file system's contract does not, at least not in a way that can be observed by the user. If you want to create your own "side" identity system outside of the file system based on content, such as brain memories of content, that is fine. It is like the gov't defining cats by an issued collar tag. The gov't might not care if tags get switched between cats. Their *orders* are to use the tag to identify cats. If grandma uses eye color, that is fine. But the gov't doesn't give a flip. The file system has comparable marching orders, grandma or not. Grandma's personal identity system may indeed conflict with the government's. The file system's identity abstraction may not be what grandma wanted; but file systems are dumb automatons that follow rules. She has to work with the file system on its own terms (or hack into the inner workings). Its own terms are to map identity using file name. Grandma can argue with the bureaucats (pun) all she wants about cat eye color, but they'll probably just show her the door because they are uncaring rule-followers, just like the file system. If you want to view it in terms of psychology, then we can look at the psychology of a bureaucracy. -- bh

BH: ... Now I'm really starting to feel insane. Okay, so you're essentially telling me that attempting to identify a file by content and creation time is to attempt a different ''object identity''?
GH: Yep. That's exactly what you just tried. We could go on a lot longer, but every single time you'll bump into exactly that sort of wall. It is logically impossible to observe a rename if you can only identify objects by name. If you did, it'd be a paradox... one of those "Hey! I just proved one equals two!" situations.
BH: I might challenge you on that later, but for now I'll believe you. I'll focus on another point: Just because I cannot observe a rename doesn't mean I cannot perform a rename.
GH: Oh? Please elaborate.
BH: Well, rename is essentially a results contract. When I decide I want a file renamed, what it means is that I want one file gone and for another to exist with the same content and a different name of my choice. That might be implemented by a "copy-and-delete", but, to me at least, the file was "renamed".
GH: Okay, this seems reasonable. However, I'm sure you're aware that you're running the risk of equivocating and ending up in a LaynesLaw debate. The definition I use for rename is very different from the one you're promoting.... and it's better.
BH: No it isn't.
GH: Yes it is.
BH: No, it isn't.
GH: Yes, it is... for the following reasons: (a) the definition I provided works for cats, streets, files, and mathematical objects. (b) the definition I provided for "rename" corresponds directly with the dictionary meaning and what people mean when they say "rename". (c) the definition I provided for "rename" allows both recognition that a rename event has been performed (an object that, to your memory, had one name, now has another) and provides a description of the required results for completing a rename (a new name has been assigned to an object). Of course, we must still define 'object', 'name', 'assign', and 'time' for this 'rename' event to be understood in depth. These get rather interesting.
BH: cats, streets, files... We just discussed this. It doesn't work for files. You said it was impossible.
GH: Sure it does. It just doesn't work for files when you constrain files to only being identified by name. That hardly seems a natural constraint to place on observers of the filesystem in the more general sense.
BH: ... (burble, burble) ...
GH: Sorry! I forgot about that sanity issue. It's alright, though; being insane isn't all that bad. Beware, though, that you might develop a disconcerting habit of talking to yourself. Well, it isn't all that disconcerting for you, and the conversations can get pretty good, and... I'm sure you know what I mean...
BH: I'll get better with a few beers, I'm sure. I have a question, and I want to avoid another debate on definition. Using what I consider to be rename... in that filesystem where objects can only be identified by name... what are your thoughts?
GH: Basically that you'd be delusional, maybe a little kookoo on the choochoo, not quite all there in the head if you know what I mean.
BH: ... okay, who was telling whom to cut back on insults? And I demand you explain yourself.
GH: Sorry. Basically the issue would be thus: You "rename" a file. You tell someone else you "renamed" that file. They go look at the object system and say... "so, where is this file you renamed? I can't find it." Then you say, "That's because you're looking for the file under the old name, you nimrod. Look for it under the new name." Then they say, "Okay, I found the file. It seems new; I don't remember it being here before. What changed?" Then you say, "the name changed." Then they say, "How do you know? To me, it looks like one file is gone and this new one exists." Then you say, "I just know." Then they say, "Oh? How do you know? Does god speak to you?". Then you say, "No. I renamed the file myself." Then they say, "Oh. That makes some sense. Can you show me proof that you did this?" Then you either try renaming another file, which will look to the other observer exactly like a delete + create in close proximity, or you can say "No". Either way, you're certifiable.
BH: Did you just have a conversation with yourself within a conversation with yourself? Really, though, I thought we were using my definition.
GH: I'll admit to a bit of facetiousness, there. ;o) However, even your definition doesn't allow people to observe or otherwise recognize a rename event from other coincidental create+delete events. It just allows you to say that you've done a rename if your actions have particular results. As such, it's really only half a definition. However, if you can show that the copy+delete is the result of your efforts, then you could show that you've performed a 'rename' by your definition.
BH: I don't think it's less correct to define "rename" in terms of its contract than it is to define rename in terms of its recognition.
GH: The definition I provided allows for recognition and a contract. I think that makes it more correct. However, that's just expert opinion from a designer of a language that supports distributed objects and services... what could I know?
BH: But 'recognition' is getting back into all that psychology-based-definition stuff. So is your insistence on observers, actors, etc. Psychology is not cut-and-dry like math and logic; you shouldn't be using it here.
GH: I study computation theory, which is pure math and logic. Data, knowledge, memories and such can all be studied from a computation theory perspective. That is what looks, to you, like psychology. But it isn't... at least not in the 'soft science' sense. Go study the ActorsModel, the PiCalculus, and especially their behaviorally typed counterparts. Maybe study a little InformationTheory, Cryptology, and ModalLogic. Then come back and tell me I'm talking psychology. The 'actors' and 'observers' I've been talking about are probably more abstract and more mathematical than you imagine. (I'll bet a cookie that you've been thinking actor == human user and observer == human watcher.) Humans can fulfill a role as actors and observers, but so may other things (e.g. programs, threads, intelligent agents, pigeons, hardware, etc.)

Interjection #2 - You seem to be saying that studying details of implementation gives you an insight that says that "rename" should not happen on map keys where the key is a domain attribute also. Yet existing file system "contracts" do that very thing. For most users it makes perfect sense. The world does not end, black holes do not form, the time-line is not altered (or at least we cannot tell if it does) and massive file system corruption does not take place. Perhaps practice trumps theory or math purity. It is true that an internal key may be used in some cases to implement the file-system contract, but the contract does not require it and the user probably can never tell. See also "name bother" section below. -- BH
- You risk equivocation. Unless you admit that deleting a cat and creating another qualifies as an object rename, you're in error to state that the behavior associated with the "rename" operation in a filesystem qualifies as a general object rename. Instead, you have rename-file and rename-object as entirely different things. In that sense, you're failing to treat the filesystem as an object-system because you're allowing the fact that the objects just happen to be files to corrupt your definitions and logic. As an object-system, file objects have properties and are addressed (located) by name. If you only allow that they be identified by name, then there is no such thing as rename-object... it's a logically impossible task.
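The disagreement can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the name-only filesystem is modeled as a plain dict, and `IdFs` is a hypothetical filesystem that keeps a second, inode-like identity. It is not a model of any real filesystem. It shows that an observer limited to names gets the same report for a "rename" and for an unrelated delete followed by a create, while an observer with a second identity can tell the two apart.

```python
def diff(before, after):
    """What a name-only observer can report between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    return removed, added


# Name-only filesystem: a mapping from filename to content, nothing else.
before = {"/my/foo": b"hello", "/my/bar": b"hello"}

renamed = dict(before)
renamed["/your/foo"] = renamed.pop("/my/foo")  # BH's results-contract "rename"

recreated = dict(before)
del recreated["/my/foo"]            # an unrelated delete...
recreated["/your/foo"] = b"hello"   # ...and an unrelated create

view_rename = diff(before, renamed)
view_delete_create = diff(before, recreated)


class IdFs:
    """Hypothetical filesystem with a second identity (an inode-like id)."""

    def __init__(self):
        self.names = {}     # filename -> file id
        self.contents = {}  # file id -> content
        self.next_id = 1

    def create(self, name, content):
        fid = self.next_id
        self.next_id += 1
        self.names[name] = fid
        self.contents[fid] = content

    def delete(self, name):
        del self.contents[self.names.pop(name)]

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)


def observed_renames(names_before, names_after):
    """(old, new) pairs where the same id is now reachable under a new name."""
    old_name = {fid: name for name, fid in names_before.items()}
    return sorted(
        (old_name[fid], name)
        for name, fid in names_after.items()
        if fid in old_name and old_name[fid] != name
    )


a = IdFs()
a.create("/my/foo", b"hello")
snap_a = dict(a.names)
a.rename("/my/foo", "/your/foo")
seen_rename = observed_renames(snap_a, a.names)

b = IdFs()
b.create("/my/foo", b"hello")
snap_b = dict(b.names)
b.delete("/my/foo")
b.create("/your/foo", b"hello")
seen_delete_create = observed_renames(snap_b, b.names)

print(view_rename == view_delete_create, seen_rename, seen_delete_create)
```

Whether the first case still deserves the word "rename" is the definitional question under dispute; the sketch only shows which observations each kind of observer has available.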

Far as I'm concerned, it looks fine. Quite entertaining, really. Funny thing is, I can see - and appreciate - both your points of view. Now, I humbly suggest that you both take a step back, endeavour to transcend merely understanding each others' points of view just enough to launch a counter-attack, and take a meta-view: Look for the motivations behind the viewpoints themselves. Explore the belief systems, experiences, and backgrounds that (inevitably?) lead to these points of view. You might even wish to ask each other... Questions.

If undertaken in earnest, I suspect you'll both find ways to transcend the quibbling and express your arguments in a compelling and convincing manner. One of the keys to this is to recognise and even embrace the views of your opponent, in order to address the inconsistencies in your opponent's arguments in his or her terms. -- DaveVoorhis

Who snipped out the government tag inspectors versus granny's cat-eyes identity analogy? And why?

Shark ate it, by reverting to the last legitimate edit prior to it, but I'm not sure why. I've put it back in. Time for some bug-hunting, methinks. DeleteWhenCooked -- DaveVoorhis

Thanks for putting it back. I added a second one, but this time I am keeping a copy in case editing foobars it again. -bh

Is it the name that bothers you?

If you didn't include it, "rename" would be a feature one would eventually ask for after frequent file copy-and-deletes become tiring. Call such operation "Zibtroob" if you want, but I think most users would rather call it "rename". Just think of it as "Zibtroob" and the name won't bother you anymore, correct?

If a customer asks to group a bunch of commands together under a single name to save time, then I doubt you would complain, correct?

Suppose a user gets tired of copying and deleting over and over, and decides to make a command file (like a DOS .BAT) to do it for him/her. Suppose the user calls it "cmd2" for whatever reason. If you ask the user what it is, they may say, "Oh, I just got tired of copying and deleting over and over and decided to consolidate it under one command to save time and reduce typing errors." I assume you won't have any problem whatsoever with this. Correct? It is just automation 101: factor frequently-used sequences into a single command/object/idiom.
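For concreteness, the user's consolidated command might look roughly like the following Python sketch. The function name `cmd2` comes from the paragraph above; the file names and the refusal to overwrite an existing target are assumptions added for the example. It uses only the standard library and checks the results contract afterwards: the old name is gone and the new name holds the same content.

```python
import os
import shutil
import tempfile


def cmd2(old_path, new_path):
    """Copy then delete, consolidated under one command."""
    if os.path.exists(new_path):
        # Assumed precondition: the target name must not already exist.
        raise FileExistsError(new_path)
    shutil.copyfile(old_path, new_path)
    os.remove(old_path)


with tempfile.TemporaryDirectory() as folder:
    old = os.path.join(folder, "foo.txt")  # hypothetical file names
    new = os.path.join(folder, "bar.txt")
    with open(old, "w") as f:
        f.write("same content")

    cmd2(old, new)

    # The results contract: old name gone, new name has the same content.
    old_exists = os.path.exists(old)
    with open(new) as f:
        new_content = f.read()

print(old_exists, new_content)
```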

But if they call such a thing "rename", it *then* seems to bother you out of some sense of mathematical or logical purity (which I still don't agree with). Thus, it is the name that is the issue, not the existence of such a feature, correct?

If the name does not lead to confusion in the user(s), then there is no practical downside to using the name "rename" that I can see. Unless, you are worried that it will corrupt the user's knowledge of identity issues and cause them conceptual problems down the road. But unless they run the Star Trek Time Transportation and Dilation System in another job, such is unlikely to be a problem. To be frank, the word "anal" comes to mind. It is similar to calling a tomato a "vegetable" in a salad recipe. Technically tomatoes are a "fruit" I hear, but mentally people like to group them with veggies due to the taste and typical usage. If it were, say, a PhD thesis on biology and evolution of fruits, then and only then could it be a potential problem.

-BlackHat

It is equivocation that bothers me. Rename has a certain meaning for objects in general. You must use exactly one definition for every purpose within any mathematical or logical argument or you lose the ability to perform proofs.

We've been discussing object-identity in a relational database, and made analogy to object-identity within a filesystem. Object-identity in particular is interesting in that such things as cats also qualify as objects having some identity... thus your definition of "rename", whatever it is, must work for cats as well as file-objects, and even for tuple-objects. The latter is handled vacuously because tuples have no names and thus cannot be renamed, but the other two (file-objects and cats) must share one definition of "rename" within the discussion.

You are using "rename" as a contract. If so, then for this full discussion, "rename" must be a contract. Cats included. Otherwise LaynesLaw is going to interrupt any meaningful discussion. If you must use "rename", call mine "rename-object". It still stands that "rename-object" is logically impossible in that filesystem... even for file-objects.

Even so, your 'rename-file' or 'Zibtroob' is only half of a full definition. It's a contract, and contracts are not (generally) definitions. It fails to allow you to identify a renaming event. Any good definition will conceptually allow for both identification and creation, though one of those steps may require omnipotence or omniscience within the system. (See KnowLedge. Words, themselves, arise as abstractions to allow semantic compression in discussion of higher-level concepts.) E.g. by knowing the definition for parallel lines, you can create and identify parallel lines in many 2-or-higher dimensional surfaces. By knowing the definition for a triangle, you can create and identify triangles in a mathematical universe. By knowing a definition for 'cat', one can identify cats and produce cats... and, if omnipotent, you could create cats. It works for verbs, too; by knowing the definition of the phrase "bicycle-racing", you can both identify the bicycle-racing action and produce a bicycle-race event that follows the definition. However, if all you know is the results-contract for 'rename-file', you could not recognize a rename... even if you were omniscient.

Oh, and tomatoes are legally vegetables (due to lobbying) and scientifically fruit (since 'fruit' is defined in terms of the flesh and seeds). Earlier definitions influencing the ruling in U.S. Supreme Court in 1893 were use-based; 'fruit' was something you ate for dessert or breakfast. I'm not particularly concerned which definition you use here because both are proper definitions of 'fruit'... but I'd blast you for saying that a tomato isn't 'fruit-scientific' because it isn't 'fruit-legal'. Similarly, I'll blast you for saying 'rename(-object)' is logically possible on a filesystem because people do 'rename(-file)' all the time. The two are different definitions and require different evaluation. In the paradigm where files are objects within a filesystem object-space, 'rename-object' will continue to be the only proper definition. Exceptions cannot be made because you're used to 'files' having a different 'rename'.

(Rename has certain meaning...) By whose command? Are you using "object" in the OOP sense or "real world things" sense?
- (By whose command?) -- By the dictionary. By common use of the word. By its English definition, which is the one that counts since we're speaking English. If your definition is not wholly compatible with the English definition, then I feel the need to reject it. Call it Zibtroob.
- BH: Dictionaries are colloquial tools, not mathematical tools. If you wish to dissect a definition and provide a more precise restatement of said definition that contradicts my assessment, be my guest. (Keep in mind that if there are alternate valid interpretations which contradict your interpretation, I just may point them out.)
- I have done so -- provided a precise definition for rename that is mathematical in nature. It happens to correspond very directly with the English definition. Dictionaries are language tools, and language is quite necessary to both math and logic.
- BH: Wiktionary.org: Rename: "To give a new name to". This does not conflict with file system usage. Your definition assumes a specific implementation. The wiktionary definition is not specific enough to solve technical disputes. "Give" is a human concept, not a computer concept. I don't have a Give-o-meter handy to see if giving took place unless I create a definition within the context of a specific model. Then we are back to whether that model is the "real" model of God.
- I discuss definition issues below. You've failed to create a technical definition that corresponds very directly to the English definition. There are many places of conflict with your file-system usage (with identity only by name).
- BH: I see no blatant contradiction. It passes one test in that it does not result in a problematic alternative that could confuse. I am not claiming it the perfect name, only the best fit. If you weigh the trade-offs, then rename is the best name, flaws or not. You do not have to be perfectly fast to win a race, only faster than the alternatives. You seem to want perfection or nothing. "rename-file" is not perfection either because it scores low on length and context redundancy.
- (Are you using "object" in the OOP sense or "real world things" sense?) -- Both, plus mathematical objects. I've never once seen a sound deductive proof that the two are different... e.g. a proof that real world objects are not logical objects, and vice versa. Until you can prove they are different, asking which I'm talking about simply is not relevant because you won't be able to declare that something that holds true for one does not hold true for the other. Unfortunately for you, I've seen some very sound deductive proofs that it is impossible to deductively prove they are different, so I don't expect to see any progress. I've even seen very cogent (inductive) arguments that they're the same -- a cloud, a word in a paragraph, a cat, a human, a particular character in a string, a node in a list, a whole list... all are 'objects' by any reasonable definition of the term. It is, thus, only important to track the properties of objects and object systems. If you've proven a truth based upon a set of properties that hold true in one system, that is sufficient to prove the same truth holds in any system with those properties. Physical objects happen to have physical properties... even if we are imperfect in identifying them all. For OOP, we can definitely prove that we are outside of those systems looking in, and we can know every property. For the real world our perception apparently exists within the system, and we can only guess at the properties. But while that's a significant difference on the observer, it does not constitute a provable difference between the objects. Instead, just focus on properties; if the properties I'm utilizing for the filesystem or whatnot hold in the "real world" sense, then the proof applies to both.
- BH: This is more or less a statement of TuringEquivalency.
- Eh? I don't believe it is. How does a discussion of behavioral properties of computational models apply directly to properties of objects in general? Or to a question of 'real world sense' vs. 'OOP sense'. I don't see the connection you're making.
- BH: We can only compare models of reality to OOP, not actual reality. Any TuringComplete language can probably model a model of reality. But physicists don't have a certain answer for what matter really is, so any model of reality is a guess at this point. AllAbstractionsLie. You can show that OOP can emulate model-of-matter-N for all values of N (at least T.C. N's), but so can assembler language. You are stuck. See below about reality and identity.
- I understand you better now, and you're accurate to a point. It is true that human brains are not beyond the computational capability associated with a Turing-complete language. Neither is any language that may be digitally expressed. And physicists certainly can't answer what matter is; heck, they can't even answer with 100% certainty whether matter 'is'. It is deductively provable that it is impossible to disprove solipsism with a deductive argument. However, there is a flaw in your argument. I've not posited that physics and our model of the known universe has anything to do with the universe. If there is any true 'reality', that's all that is needed to make the argument that they are the same... whether we know this truth or not. (Interestingly, even if there is not any true reality, then the same argument can still be made with regard to OOPL objects because they are instantiated in the 'physical' world and thus are neither more nor less 'real' than 'reality'.)
- BH: I don't dispute that OO can model any physical model or characteristic we can state about real-world things. What I do dispute is that it is the *only* way to model them. Thus, I am not forced to model it as OOP if you agree that OOP is not the only way to model (an interpretation of) reality. Equivalency is not sufficient. It requires a monopoly to force me to define files in terms of objects. TuringEquivalency denies you a monopoly.
The goal is not to produce proofs on file systems, but make them behave how users expect them to behave. Generally if something violates important theory, it produces real-world side-effects and problems. You have not identified such an external problem.
- I think you lost track of your original goal. This discussion has nothing to do with making filesystems behave and everything to do with proofs on file systems as relevant to object identity. You initially decided to challenge the filesystem as being fundamentally different from objects within a typical OOPL... a challenge that requires proof, not making anything. The behavior of filesystems can be discussed, but making them behave (i.e. implementing or repairing them) is entirely irrelevant.
- (Generally if something violates important theory, it produces real-world side-effects and problems.) Where did you get this crazy idea? No. If it is possible to violate a theory, then that theory is wrong, wrong, wrong - fundamentally so. However, you have not proven any violation; you've proved that under a different definition of "rename", users of a system can say that they've "renamed" things. I.e. you've equivocated and called it a violation.
- BH: I am not providing a definition of "rename". It is a characteristic (operation) of a file system interface. It does what it does. However, its usage does not contradict an English dictionary (above).
- One definition contradicts another at those points where one definition rejects and another accepts. In this case, disintegrating and copying of cats would be such a place.
- By using a different definition of rename, which you certainly have provided ('operation of file system', 'results contract', etc.) you enter a different discussion. For the moment, let's both use rename-file and rename-object.
- BH: You are pretending dictionaries are for technical usage. They are not.
- You are pretending to have the ability to ignore language for purposes of discussion. You lack it. If the definition found in a dictionary can be directly converted to a mathematical (technical) definition, then this ought to be done. The further removed yours is from the English definition, the worse it is. Defining "rename" as "a sphere" is far off and horrible; defining it as a "results contract where ..." is not awful but still rather distant from the English meaning; defining it as "the transition of an object from having one name to having another" is much closer. To complete it you'd need definitions for 'transition', 'object', and 'having a name', but those can also be defined technically in ways extremely close to their usage.
(... your definition ...) It is not "my" definition. It is what file systems do out of tradition. As far as working for cats, see interjection #1.
- It is also not the English definition of "rename". And the above interjection doesn't answer the question. Changing tags on cats identified by body is not the best analogy to your definition of "rename-file", which allows cats to be copied and disintegrated as a valid process of "renaming" them.
- BH: I don't see how that difference is relevant. The nature of the Interjection #1 analogy would still be the same if perfect instant cat cloning was invented.
- ... Not much for logic, eh? I did not say the nature of your interjection had changed. I said it isn't relevant.
- BH: You have not explained why it is irrelevant.
- I grow tired of explaining things to you. You rarely return the favor, and rarely understand the explanations. You are free to ask for an explanation. I'll treat this as a question. Humans that talk about renaming (rename-object) a cat use various other object-identities to know that it is the same cat from beginning to end of the transition. This other identity could be based on location, atomic structure, associated memories, etc. However, this identity will fail to hold in the event that the cat is copied and the original disintegrated. No human I know, at least, would call it the same cat. The new cat doesn't have the same origins, the same history, the same location, the same set of atoms, etc. Humans would call it a copy. In the tag-based renaming example, humans would also use those other identifiers to identify that it is the same cat from beginning to end of transition. Thus, that example does not apply as a sufficient analogy to the case of copy+disintegrate.
- BH: A gov't agency can define "cat" any way it pleases. In reality if actual identity is very important, one often looks at multiple attributes to ascertain identity and there is no 100% that it will be right. It is an art, not a science.
- When dealing with inductive approaches to object-identity, you never have 100%. That doesn't make it an art, and that doesn't mean all approaches to object-identity must be inductive. (Indeed, all of science is inductive. Everything empirical is inductive.)
- BH: Example: looking at scratch marks on vintage museum artifacts. X-raying paintings to verify their authenticity. You are pretending the world is well-known. I used to work at a place where automobile identification was a big issue. Because VIN and license numbers can be switched or mistyped, there were often problems identifying cars. The accuracy was a function of time and effort. In the real world, there is no true identity, only probabilistic statements. Because your cells change every second, you are not the same person you were yesterday. In computers we use discrete ID's because it simplifies our model, not because it better reflects reality. RealityHasNoCertainIdentity.
- There are plenty of deductive identities for the real world. Using them just requires a different approach to viewing the world than is typical among humans. E.g. if I talk about "the bookshelf on the west wall of my bedroom", and I'm constrained to one bookshelf per wall, then that's a true identity that will work for all time. However, if I move the bookshelf currently associated with that expression to the east wall, then "the bookshelf on the west wall of my bedroom" no longer exists. Similar expressions can work for showing that I am, indeed, "the same person" I was yesterday; after all, I don't identify self by cellular structure.
- BH: And similarly, the government can take the "approach of viewing" cat identity in terms of collar tags. Your bookshelf test is not perfect, I would note. Anybody who has watched Trek has a visual of what transporter collision accidents can do to bookshelves. And don't forget the invention of the Reese's Peanut Butter Cup. See, I can be anal also.
One is not necessarily looking for a consistent definition of "rename" for all purposes. Back to the tomato fruit/veggie argument. "Rename" is to make the user happy, not mathematicians. Again, we could call it "copyAndDelete" or "Zibtroob", but users wouldn't want that.
- I disagree. Equivocation is a major fallacy. You absolutely need one consistent definition of "rename" for all purposes within this discussion. This discussion happens to be on objects in general, which happens to include cats and files. Thus, you necessarily require a consistent definition of "rename" for objects in general... including cats and files. That isn't for all purposes... just for objects.
- BH: Objects is an implementation detail. And I am not looking for name overloading consistency, as described nearby.
- And regardless, "Rename" (Filesystem Operation) does not Rename (English word) in a system where objects are only identified by name. I agree that users want Rename. They're renaming work, labor, etc. I agree that "rename" is a useful filesystem operation. However, the moment you start trying to argue that rename has occurred because "rename" has occurred, you're being ridiculous... in the eyes of any mathematician.
- BH: Fine, it is really Zibtroob. We just like to call it "rename" for the hell of it because nobody found a better name that we liked.
- That's the childish approach to dealing with equivocation. Call it "rename-file" and call mine "rename-object" if you wish to have a more sane approach to dealing with a definition issue.
- BH: Perhaps. Sometimes "childish" is more functional in society than "anal". "Rename file" is a bit redundant in the context of a file system, thus it is shortened to "rename".
- And "rename-object" is quite redundant in the context of an object-system, and thus is shortened to "rename". Thus, when discussing filesystems as a form of object-system, we reach a bit of a quandary. I'm surprised you didn't see both sides of that little argument. Stick with "rename-file".
- BH: "As" an object-system? I only said it was a file system. I didn't classify it beyond that. The users don't want nor need "rename-file". It is unnecessary for their purposes.
(Rename-object is impossible...) How so? Can you prove it logically impossible?
- Yes. I have done so several times already. To understand, you need to substitute the definition I provided for rename in place of the one you keep trying to use. I'll give the quick version again: Conclusion: 'Rename-object' is impossible in a system where objects can only be identified by name. Reason: The definition of rename-object is when an object transitions from having one name to having another. If you were to identify a rename-object that occurred, it would require that you identify an object that transitioned from having one name to having another. However, you can identify objects only by name, not by history. Thus, it is provably impossible to identify an object that has supposedly been renamed. Because such an identification cannot occur, it is impossible for anyone to ever reasonably apply the word rename-object to any object in the system, ever, under any conditions. Thus, 'rename-object' is impossible (under the given condition); Q.E.D.
- BH: As stated below, we can use other attributes (content) for testing and verification.
- Sure. You can even use them for identification. You just need to be in a system where files can be identified by more than name. The filesystem mentioned here doesn't qualify. Since you cannot identify files by more than name, you also cannot use identification by content and time and such for testing and verification; it's literally impossible to do. E.g. if you looked at all the files with the same content that were deleted and created at the same time, you'd find that at least two were created and at least two were deleted with the same content at the same time. It's, thus, impossible to verify exactly what effect your supposed 'rename' actually had, or what occurred.
- BH: Please clarify. You seem to be assuming an internal techie implementation perspective.
- In a system where content can be used for identification and verification, I agree that the above proof does not qualify; rename-object is possible in such a system.
- {In practice, file systems and most OOP systems have an internal physical address that can serve as temporary identity during the rename process. Without this, I agree that under a strict interpretation of "rename", "rename" would not be possible, and would merely be a UsefulLie to cover up copy-and-delete. The real world has more or less the same issue. The physical location serves as the temporary identity while we change the cat's name-tag. Otherwise, we couldn't tell the difference between that and disintegrating a cat to be replaced by an exact clone with a different name-tag. If there are lots of other cats around, then we put it in a cage during the change to avoid mistakes. The cage serves as a temporary ID. - t}
(Contracts are not definitions. ... only half a definition.) I don't think there is a clear-cut difference. They can be interchangeable.
- You're wrong.
- BH: Do you want a definition for file-system rename or rename in general? I am not considering the general definition, per above. The "rename" operation for file systems "feels right" and users like it and black holes do not form when we do it (at least not observable ones). It is a label we like. It makes us feel happier. I have no math to show why it makes us feel happier. We did it to make the customer happy (and I am one of them). It is a social choice, not a mathematical one. Unless you can show a big real-world problem it causes, nobody is going to want to change.
- I'm asking you to not equivocate. If you can't give up being "happy" to avoid logical fallacy, then you should not involve yourself in a logical argument. Use "rename" all you want for file-systems, but do not pretend that it's interchangeable with "rename" for objects in general unless the definitions accept and reject the exact same set of events.
- BH: File system makers didn't give a flip about object theory. They made a file system and put in practical operations. If it gives a mathematician somewhere a headache, few care.
- And why do you care what file system makers think? I recall this discussion has to do with intrinsic object identity. It is a discussion as-a-mathematician, involving proofs and deductive argument. You're wearing the wrong hat to the party... and I'm not talking about the 'black' hat; being skeptical is okay. I'm talking about the "I don't care about logic or truth because I'm an engineer" hat. Put it away.
- BH: Logic and truth do not clearly eliminate the usage of "rename". They may create open logic questions, but they do not outright eliminate it. Nor has a major practical downside been identified. A longer name may indeed reduce the number of open questions. That I won't necessarily dispute right now. But, user-friendly trumps professor-friendly in this case.
Please clarify. A renaming event can be identified by a file (name) disappearing and a new one appearing with the same content. True, "rename" is not the only way to achieve such, but I don't see why that is a problem. Rename is to get something done, not produce a perfect trail.
- (A renaming event can be identified by a file (name) disappearing and a new one appearing with the same content.) -- Not, BlackHat, in a system defined as allowing object identification only by name. If you are able to identify files by things like content and timing, then rename-object applies perfectly well without messing with any results contracts.
- BH: We can use other attributes for testing and verification. For example, in a single-threaded file-system (or one area with other users not given access), if there were N files with content of X before the "rename" operation, then after the "rename" operation there will still be N files with content of X. We can set up test rules like this to verify it complies.
- ... (Sigh) ... Once again with the implementation details, and this time with an ineffective test! And, no, this sort of verification will not work.
- BH: It won't work because you say it won't work?
- No, it won't work because the proposed verification won't verify the results of a rename-file action. It would tell you whether something is horrendously broken, but it won't tell you whether the job was actually done.
- BH: Okay, in addition to the N with X test, the total number of files stays the same, the original name no longer exists in the set of names after the process, and the target name exists in the set of names.
- (Rename is to get something done, not produce a perfect trail.) -- If you don't care about object-identity, then there's no concern as to whether certain objects are vanishing while others appear.
- BH: As long as the vanishing and spontaneous appearance does not violate the file-system contract, c'est la vie.
We are not defining implementation, so internal guts or internal steps are not an issue. We are making users happy, not God. Things like POSIX are a contract for how things behave just like a business giving contractors requirements. As long as the contractors fit the requirements, the business does not care how they are carried out. There may be 100 different ways to implement a file "rename".
- We are not making users happy. We are discussing object-identity in filesystems and proofs regarding file-objects within those filesystems as they relate to table rows and typical OOPL objects. Or at least I am. If you wish to discuss service-contracts or implementation-details in filesystems, this isn't the wiki-page for you. If you wish to discuss what a filesystem is and whether the contract typically associated with a filesystem's 'rename' action has anything at all to do with the English definition of the word, and how this has anything to do with object-identity, then perhaps you can argue here.
- BH: English dictionary issue addressed above.
- {If we are going with "Typical", then internal addresses/pointers are used to track identity while being renamed, as described earlier. If one has a separate unique identifier, such as RAM address, then a true "rename" can be done. OO and file systems "cheat" that way. They are taking advantage of the implementation guts to avoid copy-and-delete. - t}
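
The disagreement in the thread above can be made concrete with a toy model. The following is only a sketch in Python; the dict-based "filesystem" and the names make_fs, rename_in_place, copy_and_delete, name_view, id_view and passes_contract are all hypothetical, invented for illustration, and do not correspond to any real filesystem API. Each file carries an internal id standing in for the "internal physical address" that t mentions. An observer restricted to names and content sees the same result from both operations, and the results-contract checks BH lists (same number of files per content, same total, old name gone, new name present) pass for both. Only an observer who can see the internal id can tell them apart.

```python
import itertools

_ids = itertools.count(1)  # hypothetical internal addresses, never reused

def make_fs(files):
    """Build a toy filesystem: name -> (internal_id, content)."""
    return {name: (next(_ids), content) for name, content in files.items()}

def rename_in_place(fs, old, new):
    """Keep the same internal entry and attach it to a new name."""
    out = dict(fs)
    out[new] = out.pop(old)
    return out

def copy_and_delete(fs, old, new):
    """Create a fresh entry with the same content, then drop the original."""
    out = dict(fs)
    out[new] = (next(_ids), out[old][1])
    del out[old]
    return out

def name_view(fs):
    """What is observable when files are identified only by name."""
    return {name: content for name, (_id, content) in fs.items()}

def id_view(fs):
    """What is observable when the internal id is also visible."""
    return {name: _id for name, (_id, _content) in fs.items()}

def passes_contract(before, after, old, new):
    """The results-contract checks proposed in the thread (assumed reading)."""
    b, a = name_view(before), name_view(after)
    same_counts_per_content = sorted(b.values()) == sorted(a.values())
    return (same_counts_per_content and len(b) == len(a)
            and old not in a and new in a)

before = make_fs({"a.txt": "X", "b.txt": "X", "c.txt": "Y"})
renamed = rename_in_place(before, "a.txt", "d.txt")
copied = copy_and_delete(before, "a.txt", "d.txt")

print(name_view(renamed) == name_view(copied))               # name-only observer
print(id_view(renamed)["d.txt"], id_view(copied)["d.txt"])   # id-aware observer
```

In this model the contract checks say whether the outcome is acceptable, while the internal id is what lets anyone say that "the same" file went from one name to the other.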

Much, if not all, of this page is moot. In the RelationalModel, the tuples (aka rows) in a relation represent beliefs or propositions (see DatabaseIsRepresenterOfFacts) which are characterised by their truth, not objects that are characterised by state. In particular, there is no notion of identity that needs to be preserved across arbitrary changes of state. Identity, as represented by a key, is needed only to uniquely identify a particular fact so that it can be replaced or removed. There is no notion of "change of state" at a tuple level. There are only facts that may be added (INSERT), replaced with a new fact (UPDATE), or removed (DELETE).

While the RelationalModel does not express "change of state" at all (as is typical of mathematical algebras and calculi), a Relational Database certainly does. For a Relational Database, identity of the relation variables (RelVar) must be consistent across arbitrary manipulations - update, delete, insert. Because identity of the RelVar is consistent across operations, one may also speak of 'row' identity = <RelVar, any CandidateKey>. One may speak of the 'state' of any tuple so named (consisting of both 'existence' status and contents of values in non-key columns). As you note, a Relational Database makes direct use of this notion for updates and deletes, so you cannot argue that this notion does not exist in practice. QED. You are free to argue that such identity is certainly second class (similar to the identity of a particular attribute or method in an OO object), but to say the notion does not exist or is unused seems wrong.
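
As a rough illustration of the <RelVar, CandidateKey> notion just described, here is a small Python sketch. The classes RelVar and RowRef are hypothetical and only model a single-column key with a single non-key value; they are not how any actual DBMS represents rows. The 'state' of a row reference is taken to be its existence status plus its non-key value, as in the paragraph above.

```python
class RelVar:
    """Toy relation variable: maps a key value to a non-key value."""
    def __init__(self, rows):
        self.rows = dict(rows)

class RowRef:
    """A 'row identity' in the sense above: the pair (relvar, key value)."""
    def __init__(self, relvar, key):
        self.relvar = relvar
        self.key = key

    def exists(self):
        return self.key in self.relvar.rows

    def state(self):
        """Existence status plus non-key value (None when absent)."""
        return (self.exists(), self.relvar.rows.get(self.key))

r = RelVar({1: 1, 2: 2})
ref = RowRef(r, 1)
state_initial = ref.state()

r.rows[1] = 10          # update a non-key value: same reference, new state
state_updated = ref.state()

del r.rows[1]           # delete: the reference still names something, now absent
state_deleted = ref.state()

print(state_initial, state_updated, state_deleted)
```

The reference survives every manipulation of the relvar, which is the sense in which this identity is consistent; whether that deserves to be called identity of a tuple is what the rest of the discussion disputes.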

Given the following RelVar

 VAR r REAL RELATION {k INTEGER, v INTEGER} KEY {k};

with the following value

 r := RELATION {
   TUPLE {k 1, v 1},
   TUPLE {k 2, v 2}
 };

issue the following update:

 UPDATE r WHERE k = 1 (k := 3, v := 3);

Does it seem appropriate to speak of the value of "the" tuple where k = 1 changing state when there is no longer a tuple with k = 1?

Note that to track such updates under, say, a PublishAndSubscribe model, you would either be forced to create artificial "tuple IDs" in violation of the RelationalModel, or issue two notifications, one representing deletion of a tuple where k = 1 and one representing insertion of a tuple where k = 3. Thus, there is no tuple-level change of state. There is only RelVar-level change of state.
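
The two-notification reading can be sketched by treating a relation value as a set of tuples and comparing the value before and after the update. This is a Python approximation of the Tutorial D example, not Tutorial D itself; the helper names update and diff are made up for the sketch, and tuples are written as (k, v) pairs.

```python
def update(relation, predicate, change):
    """Return a new relation value with matching tuples replaced."""
    return frozenset(change(t) if predicate(t) else t for t in relation)

def diff(before, after):
    """Tuples that disappeared and tuples that appeared between two values."""
    return before - after, after - before

r_before = frozenset({(1, 1), (2, 2)})

# UPDATE r WHERE k = 1 (k := 3, v := 3)
r_key_change = update(r_before, lambda t: t[0] == 1, lambda t: (3, 3))
deleted_a, inserted_a = diff(r_before, r_key_change)

# For comparison, an update that leaves the key alone: v := 5 where k = 2
r_value_change = update(r_before, lambda t: t[0] == 2, lambda t: (t[0], 5))
deleted_b, inserted_b = diff(r_before, r_value_change)

print(deleted_a, inserted_a)
print(deleted_b, inserted_b)
```

At the level of relation values both updates have the same shape, one tuple gone and one tuple present. Any pairing of the two into "the same row, changed" has to be added by whoever interprets the key.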

You could also rename a RelVar or table with a DDL command, or even change the URI (e.g. IP and Port) used to access the Database with a higher level configuration option. This is not a problem for 'identity', but is an issue of identity being open rather than opaque. (See ConceptOrientedProgramming for a variation of OO focusing on open identity.) You might ask how PublishSubscribeModel deals with such changes in open identity - whether it insists its target is deleted as opposed to attempting to forward the observer's attention to its new identity. Which semantics are available depends upon implementation.

{From an implementation standpoint, an RDB still faces the same issue as file names. For one, a copy-and-delete affects the performance characteristics. And there's the issue of existing concurrent users who may want a consistent snapshot of state as it was just before rename or "key" change. It may not be practical to always just make a copy for the snapshot, especially for big data items; instead one may keep a pointer to the "original" when large non-key info is being referenced. In practice, there seem to be some nice benefits in using the internal address name-space if it exists. It's a convenient "cheat" on "pure" identity, and has some roughly real-world counterparts, as the cat cage analogy above illustrates. - t}