When Do Schemas And Classes Deviate

When and why should class structures deviate from relational schemas? The controversy over hierarchies and subtypes is relatively well known, but outside of that, what are specific cases or general rules for why and how they differ? Let's also exclude or separate cases where the schema is poorly designed. For the sake of scope, let's consider "good" schemas and "good" class designs.

For some loose definition of "good" :) In practice, they do differ in some places.

Database and schemas differ from each other by their nature. Intuitively, of course many people will understand what you are what you're trying to say, but trying to formalize when we call them "the same" (or some other good term, maybe let's call them "in agreement": the class definitions and schema definitions are in agreement), may shed some useful insights on the subject.

1) We shall depart from the simplest case: a schema with one table versus a schema with one class. Let's further assume that the class represents some domain of interest for the business, the same as the table. Let's say a STUDENTS table versus STUDENTS class. Intuitively, we'll say they are in agreement when all fields of the class map one to one with the columns of the table. The mapping ideally conserves the names or applies a simple bijective transformation of the name, the same about the types. For example, studentID in the class becomes STUDENT_ID in the table, the type String in Java becomes VARCHAR2(20) Oracle or the type int in java becomes NUMBER (38,0) in Oracle.

Are we speaking theoretically or practically? There is no theoretical reason that an OO language and a relational database can't share the same set of (immutable) ValueObject types - the domains in the relational case. Just because SqlLanguage blows doesn't mean its warts should be representative of relational if we're talking theory. The original question asked "object" vs "relational" and not "Java" vs "SQL", so I presume this is a fair question.

Even if you speak theoretically, you won't be able to get to very different systems to operate under the same type theory. Just think ML and Haskell and SML and Scheme. Do they perfectly agree on the basic building blocks of data structures (excluding functional values, modules, etc., - just plain old data). Is it realistic to assume that at some point they will agree? Practical considerations are also theoretical considerations.

1.a) Already, we hit the first thing from ObjectRelationalPsychologicalMismatch: ObjectIdentity. Many OO programmers will object that if it wasn't for the damn relational databases asking for primary key, they will never consider introducing studentID as a field of the Student class. After all, the identity of object is self-contained and pervasive in OO only environements. It bears repeating though (the original discussion happened in RelationalHasNoObjectIdentity) that even from an OO perspective, having ValueIdentity? built into your classes representing business objects is always to be preferred. A theoretical discussion of object identity from an OO perspective can be found in FundamentalsOfObjectOrientedDatabases and the conclusion is that object systems should support the traditional (relational) view of identity.
1.b) Even if we agree on ObjectIdentity, there are already legitimate reasons for schemas to differ at this level. For example, the STUDENT table may have fields like STREET_ADDRESS, POSTAL_CODE, CITY, STATE, COUNTRY , and those can be grouped together in the Student class in java as an object say, Address. So we already have a reason to deviate. Class schemas support encapsulations and grouping of related information into a table, database schemas in the general case do not.
- I disagee. How a designer splits them or doesn't depends on many factors, and is perhaps a personal preference. What unversal rule of OOP says address info must be a separate class if there is no existing need for it? Related: ContactAndAddressModels.
1.c) 'the type String in Java becomes VARCHAR2(20) Oracle or the type int in java becomes NUMBER (38,0) in Oracle'

VARCHAR2(20) and String are different domains, one cannot hold the values of the other (and, depending on charset, vice-versa). The same applies to NUMBER/int. It is usually not possible to get a one-to-one mapping between the domains. This means, that there can be values in the db, which you cannot represent in the program (or vice-versa). A solution would be to a) Create custom type classes (say StudentName?, which ensures length) or b) ensure the conditions is the class. In the first case we already deviate by introducing a class, that has no direct correspondence in the db.

Unless I misunderstand... Date and Darwen seem to think that there's no issue with having aggregates - even tables - as domain values, which would be one relational way of doing this sort of composition.

This 1.b observation was more from a practical point of view as of 2004. From a theoretical point, you may still have 2 different type systems. So we go back to a maping between the types of the two different type systems.

I could swear we've been down this road before, many times, and they never seem to result in any illumination. So, I'm going to try and recap briefly and hopefully settle this before it becomes sprawling ThreadMess:

The FirstGreatBlunder. DomainsNotRecordsOrTablesAreObjects.
Classes organize code. Tables organize data. DataAndCodeAreNotTheSameThing. You can represent code as data, but outside of certain highly-repetitive domains, that code does not translate well to a scheme. Which brings us to:
Schemas are useful for highly homogenous data, when many records share the same structure. Classes are useful for highly heterogenous data, when comparatively few records share the same structure. In a typical OO design, the vast majority of classes have only a handful of instances. This is the impetus for PrototypeBasedProgramming. It's also why nobody has managed to design a TableOrientedProgramming language that's "tables all the way down", like SmallTalk is objects. {See near bottom of TableOrientedProgramming for a response.}
Schemas are declarative. Classes are imperative, generally.
Schemas do not support inheritance. Trying to introduce it essentially breaks the RelationalModel. See TheThirdManifesto.
Classes (specifically, mutable classes/objects) support constraints only with extreme difficulty. Trying to introduce them requires a check on every mutator method call.
- The last two points are generalized by the FourOutOfFiveRule.

-- JonathanTang

Shall we try to synthesize something helpful, though? I mean, we could repeat the whole damn enchillada from DbDebunk or the usual suspects on comp.database.theory.

''One example: not everybody in the "relational camp" agrees that FirstGreatBlunder is actually a blunder.

Another example, classes organize code + data (not only code), more so in languages like Java and Smalltalk where the only way to structure data is to create classes and instantiate objects. As such, the prompting statement of this page is justified: when the structure of data provided by classes in OO code be divergent from the structure of data defined in the relational schema? D&D's standard answer that one should map classes to database domains (doing otherwise would contradict the thesis put forth in FirstGreatBlunder) is unsatisfactory.

Yet another example: classes support constraint not with extreme difficulty but rather extremely easy. For example do not provide mutatirs provide only constructors, and you only have to check the invariant upon object constraction. Alternatively: do not provide individual mutators, provide only state transition methods that take the object from one valid state to another. In a language like Eiffel, you even have language support for invariants, in AOJ you can easily add it automatically, in Java or Smalltalk you can add it manually. At most you can say that majority of OO developers are not aware or do not practice these techniques.

And last but not least, your bold claim: "Schemas do not support inheritance. Trying to introduce it essentially breaks the RelationalModel.". I find it very hard to support it. It follows from where?

Generally relational "thinking" tends to believe there is no real value in schema inheritance. The philosophy is generally that sets make a better, more flexible classification system than hierarchies. (See TableInheritance)

[That does not address "essentially breaks the RelationalModel"; it only argues "no real value" and "better".]

Actually, it appears the relational model does not really address how schemas are created. Thus, whether they come about via inharitance or hamsters on a wheel is not really its concern. Now value inheritance may be a different issue.

[Well, ok, if that's your new position. But then you appear to be abandoning your previous position that "Trying to introduce [inheritance] essentially breaks the RelationalModel", yes? If so, go ahead and make the appropriate edit.]

The question presumes that classes should mirror schemas or vice versa. I've come to believe that this is a flawed assumption. It seems like relations ought to be classes, but in practice they often aren't. Outside the realm of CRUD screens, isomorphism between the two is just a happy accident. -- EricHodges

But is there a pattern to when they happen to match and when they don't?