Relational Has Limited Modeling Capability

I found this assertion in Kyle Brown and Bruce WhiteKnack? CrossingChasms.

They assert that modeling behavior, creation, inheritance is not easy compared to an object database.

Please help us refine this assertion here. I'd be grateful if KyleBrown himself would do this ...

Questions raised:

How is relational allegedly limited for modeling?
Is it relational in general or only existing implementations that are limited?
How does one objectively measure modeling ability?
Does modeling that reflects how humans think work better?
Does relational only model some aspects well?
Should relational strive to model all aspects well, or settle for a subset to maximize its strengths?

Why would we expect to do modeling using relational theory? We don't think of lambda calculus, turing machines, assembly code, etc. as modeling systems. From the perspective of an application-domain, they are al "low level" concepts (assembly is close to the metal, lambda calculus is close to the math -- but both are distant from the application).

A modeling system should build upon these low level concepts, but should not *be* these concepts. Reductionist scientists explain complex phenonmina by taking the high level and describing it in terms of low level things. In creating a modeling system we are attempting to set up a framework that permits this reduction, so that we can use powerful mathematical tools to reason about the model. But the model itself is an abstraction distinct from the low level concepts.

--DaveWhipp

Please tell us how relational is "low level".

[Relational theory (and most databases) provides almost no mechanism at all to represent various critical aspects of models; in particular, one cannot represent the axioms or the inference rules. Relational is "low level" compared to knowledge and belief representation, learning systems, predictions with confidence and probability and multiple futures, non-monotonic understandings of the world based on assumptions or defaults or abductive reasoning. It is correct to assert that RelationalHasLimitedModelingCapability, but, then, modeling anything other than the data itself is outside the domain of relational. That said, Relational makes a very nice primitive to aide in the construction of a world modeling system... only having real competition from even more flexible AssociativeMemory systems such as TupleSpaces.]

Are you saying in part that relational is too "static"? Existing products may be because most RDBMS are approximate clones of Oracle, the first successful RDBMS product. Efficiency, scalability, and data integrity were their main concern, and 80's technology limited the path to these goals. DynamicRelational, or something close to it, is certainly possible. -t

That's not what's being said at all. What's being said is that while implementations of the RelationalModel may form a suitable data representation foundation for sophisticated knowledge-based systems, they are not, in and of themselves, knowledge-based systems -- at least not of the type described above. By way of analogy, a hard drive may provide a suitable storage medium for an operating system, but that doesn't mean a hard drive is an operating system.

But nothing keeps "custom" relational operators from being built for the domain. As long as the new operators conform to "relational standards", they are still "relational". I don't know of any rule that says the set of relational operators is forever closed. Perhaps we should make a distinction between "classical" relational and new-wave relational, eh?

To the best of my knowledge, any deterministic operator that conforms to this functional notation could qualify:

 resultSetOfTuples = relationalOp(setOfTuples1, setOfTuples2, ... setOfTuplesN,
      domainExpression1, domainExpression2, ... domainExpressionN);

(Gee, looks kind of like SMEQL/TopsQueryLanguage. Whadda coincidence.)

-top

The above doesn't look like anything I've ever seen in any reference on the RelationalModel, though it appears you're suggesting the creation of n-adic relational operators. There's nothing wrong with that, but it would be an example of using the relational model as a foundation for something else -- which is exactly the point made above.

Then where's the "limit" you speak of? -t

Exactly the same "limit" a hard drive has in being an operating system. See above. The RelationalModel defines some structures, constraints, and an algebra consisting of (typically) somewhere between two and eight operators. That's it.

Where are quantity of operators officially capped?
They aren't. That's why most references define between (roughly) two and eight primitive operators. Generally, a definition of the RelationalModel seeks to define the minimal number of operators necessary to support a reasonable set of database retrieval and update operations. Certainly, more operators can be defined, but unless these can intuitively be seen to be universal, i.e., applicable to data management in general, they are, by definition, outside the RelationalModel. They may use the model, but are not considered part of it. The dividing point is usually determined by whether or not the operators would make sense outside the relational model. For example, a relational PROJECT does not make sense outside of the model, since it is defined in terms of a relation and a list of attributes from that relation. An implementation of Kruskal's algorithm (for example) using the RelationalModel, however, is conceptually independent of relations and is implementable on any graph representation, but only makes sense in the context of a graph. Hence, even if Kruskal's algorithm is implemented as a relational operator, it belongs to GraphTheory rather than the RelationalModel.
I'll agree that "base relational" or "classic relational" has "limited modeling capability". But, expanding on ideas is often a good thing. I don't see a real problem here. -t
I have no objection to the notion of an "extended relational model", but until such a thing exists -- and actually is an extended model, and not merely an embedded application of the RelationalModel -- the point made here stands.
Please clarify "embedded". It's an overloaded term. And I'd qualify the title as something more like "current RDBMS have limited modeling capability", although it's too long for a wiki name. I'd like to see more experiments with relational expansion. We have gazillion app languages, but only a few relational languages. SQL is getting long in the tooth.

I know of no reference on the RelationalModel that discusses its representation of inference rules or its capability for machine learning, but there's nothing that precludes using the RelationalModel to help construct mechanisms to support these. However, that's not part of the RelationalModel.

I started the page back in 2001. Two years later, I found not only that OO folks haven't been able to substantiate their claims in any way, but actually they shy away from even a trivial ManyToManyChallenge when it comes to implementing OO models. Actually, the challenge was just posted as an example and somebody else refactored it as a challenge.

The sensible conclusion to me is that actually the relational model has much more powerful modeling capabilities, in the sense that it offers you a higher level of abstraction than your stock OO models (SmallTalk, Java, C++, Python, or even more advanced things like GemStone ODBMS), allowing you to actually realize complex information models with very little (close to negligible for the educated engineer) effort. While it's true that OO will support in the end the same models, it will do that at the extremely boring and frustrating and annoying cost of having to deal with managing pointers, having to embed ad hoc indexing data structures, query by navigating pointers and other low-level details in the code of the project. One only needs to browse GemStone stock examples and API to see how things are done in an object database.

The two kinds of models are equivalent in modeling capabilities only in the same sense that C and Lisp are equivalent as Turing complete. But the reality of it is that I can create a very complex relational schema (hundreds of tables) in less than a day in any decent database, while it will take anyone a matter of weeks just to write the code to support the static information structure. Therefore people should think about whether it is not their stock OO languages and ODBMSes that have limited modeling capabilities.

The thing that's really missing in current SQL-based implementations is an advanced type system, therefore some more advanced features (like sum types, subtypes, etc.) have to be "encoded" rather than represented in an SQL database schema; however, this glitch is far outweighed by the comparative cost of having to write lots of very boring and error-prone OO code myself to manage object relationships. If ODBMSes want to stand any chance of resurfacing again, they will have to learn relational algebra, and be, first of all, relational databases to begin with. As a matter of fact, ChrisDate observed in exchanges that I had with him and later publicly, that the RelationalModel is orthogonal to the type system. In other words, there's no reason not to have a relational database with the type system sketched in TheThirdManifesto, or with advanced type systems like those of Haskell, ML or Lisp. Then there will be no reason whatsoever to distinguish between ODBMS and RDBMS. -- CostinCozianu

While I generally agree with you WRT OO modeling versus relational, there are two issues that I wish to query you deeper on if you don't mind.

First, how would a better "typing system" help relational? (Perhaps this should be under another topic.) I would be interested to see an example.

Second, while I go along with the OoLacksMathArgument, some OO fans claim that navigating one-by-one and not having to follow the "table shape requirements" is liberating and helps them grok a system, sort of like walking through a city on the street level instead of looking only at a map. One could argue that the physical human mind is more like the OO model than the relational model, hinting that nature rejected a relational approach, and that the "organic" nature of OO better matches the organic world around us. IOW, relational may be more logical, but less organic in the process. It is like Spock trying to live in a sector run by emotional, impulsive humans and aliens. He makes a crappy captain in that situation because he cannot relate. Maybe they just don't "get" relational and never will. (I think a better relational system or language would make it more acceptable to more developers, but maybe not all. SQL gives relational a bad name -- SqlFlaws.)

-- A fellow RelationalWeenie

Re: One could argue that the physical human mind is more like the OO model than the relational model, hinting that nature rejected a relational approach, and that the "organic" nature of OO better matches the organic world around us.

Hmmmm. DrCodd versus Darwin. That'll stretch the mind to new dimensions.

We covered the inheritance part in another page. As far as modeling behavior - well SQL isn't turing-complete is it?

Apparently the inheritance reference has been moved or lost. Perhaps it is under LimitsOfHierarchies?

Does it have to be to be useful? Relational is not meant to be a complete solution but a part of it. It makes certain parts much simpler. If you want to play around with something like turing complete relational, try some PrologLanguage for a while.

So you're stuck using another language. Since most people don't want to try to write behavior in PL/SQL or other "built-in" languages that usually means a 3GL or OO 3GL like Java, C++ or Smalltalk. Now, as a simple for-instance, how do you model a Queue (first-in-first-out adding/removal behavior) in a relational database? OK, I guess you could use a timestamp, but what if time isn't the operative factor in maintaining the order of the queue?

What about a generic Queue that will hold any kind of object? Oh yeah, the First Normal Form rule forbids that...

-- KyleBrown

Since when a database is supposed to be modeling behavior?

It's often possible to turn "behavior" into declarative information, and/or reduce it to relational algebra.

If you really think this should be the case, please make a contribution to DatabaseDefinition.

Note that relational and "database" are not necessarily hard-wired together. A relational app tool/language may be warranted outside of a database, at least of the BigIron kind. -t

My thinking is that the behavior, i.e. functions/algorithms operating on data belong to the client application space. They use the data management services of the database. The interface between the application and the database is defined by a data manipulation language and a client/server protocol, where the former is usually implemented on the client side by the means of client library. When needed, a database can be extended by the means of user-defined types and user defined operators/functions that should be integrated in the data manipulation language.

That doesn't mean a database is modeling some kind of behavior as we do in application space. The UDT/UDF are extensions of and should conform to the database model itself, and they should use the same general purpose data manipulation primitives as any other client of data services will do.

How do you model a queue ? See http://technet.oracle.com/docs/products/oracle8i/doc_library/817_doc/appdev.817/a76938/toc.htm for a complex example with details. Everything in Oracle Advanced Queue packaged is persisted as relational tables. It is not the case to discuss the details here.

My position is that any data model that can be modeled at all, can be modeled within the relational model. So the relational model basically has no limit on its modeling capability.

Very often, it is not convenient to model some application domain in the relational model (for example, we might want to model a graph using pointers), but considering the fact that the relational model is intended to support Large Shared Databanks, you will always want to use the relational model for this class of application domains. -- CostinCozianu

A queue that can hold everything serves no purpose in a real domain model. In the client run-time space, it may have some usefulness, though I very much doubt even this. Please come up with an example of a real life domain model where you want a queue that can hold everything, and you want it stored in a database as part of the domain model.

(I agree, but I'll answer your question anyway. ;-)

You can simulate a queue by using a table with a system assigned numeric key and using the "max()" function. Oracle has had sequence number generators for a long time, and SQL Server has identity columns for this purpose.

Oracle example:

Push:

  insert into queue_table (numeric_key, data_value) values (queue_seqgen.nextval, 'Your data here');

Pop:

  select data_value from queue_table where numeric_key = (select max(numeric_key) from queue_table);
  if <error> then
    // queue is empty
  else
    delete from queue_table where numeric_key = (select max(numeric_key) from queue_table);
  endif

And if you want it to hold "anything" make data_value a 'text' or binary blob column, and fill it with XML or other externally defined data.

None of this proves that you can do a queue in SQL, as you still need some external process or script to issue the correct SQL statements. Notice the "if" in the example; where did that come from? Don't tell me the user typed it! ;->

This is confused, not a critique. The if is for the convenience of a calling program, and isn't essential. The push and pop work correctly without the conditional, but the question is, should the pop have been done in the first place? If not, and if we don't do the select and associated if, but just do the pop via the above delete, then the delete will fail, that's all; that's inevitable if you pop an empty stack.
The only applicable critique here would seem to be that SQL all by itself is not a general purpose programming language; it's not TuringEquivalent. Thus it's common, yes, to issue SQL from something like Java. But what does that have to do with anything in this context?

Don't confuse SQL with relational databases.

What other options do we have?

If you mean implemented solutions, then the pickings are a bit slim right now. See RelationalLanguage.

Re: How do you model a queue?

Perhaps we are entering a LaynesLaw situation with regard to "queue". The road I take is that I have an EverythingIsRelative viewpoint of "queue". A queue is a specific viewpoint of data and perhaps involves behavior as part of its definition. Unlike a "strict" definition of queue, one does not limit the data to *only* a queue-view; for other applications or tools may want to do other non-queue things with the same information. In other words, the relationship between a given piece of data and "queue" is considered "has-a" instead of "is-a" in relational thinking. Encapsulation leans toward an is-a view, which is too limiting IMO. (SharingDataIsImportant.) Relational will tend to say that a given peice of information is participating in a queue, while OO will tend to say an item is in a queue.

If you want to limit or prevent others from "looking inside" the queue, then perhaps use a combination of stored procedures and AccessControlList or DB's built-in security to limit "direct" access to given table(s). But, we don't want a DiscontinuitySpike if we later add or subract access. A "has-a" approach is more condusive to such. In OO the cost of going from is-a to has-a is generally high.

Thus, if I am going to "model a queue", I want to model the EverythingIsRelative nature of queues, and relational seems to do at least as good a job as OO.

-- top

Errrrm, why would you want to do behaviour in a database? Makes no more sense than doing data persistence in a functional context. Generally, to add persistence to a queue by using a relational database is simplicity itself.

Perhaps. Such issues are at BusinessLogicInStoredProcedures.

Re: In OO the cost of going from is-a to has-a is generally high.

Plain wrong. It's just changing to using a reference. In Java you will start with references from the start, you Stack will never contain the objects itself, but references to them. In C++ it can be implemented both ways, both are trivial to implement.

Doesn't this break encapsulation? Storing only references means that outside objects that also link to the data can change the contents of the queue withOUT going through the queue's methods. Sure, it can be done that way, but it is no longer "true OOP" (although the definition of OO is rarely agreed upon). It has to be changed to being "less OO" to "fix" the is-a limit. At the very least, it wouldn't qualify as an AbstractDataType anymore because one is back-dooring the ADT's operations. --top

Oracle example:

Push:

  insert into queue_table (numeric_key, data_value) values (queue_seqgen.nextval, 'Your data here');

Seems this implementation supports only a limited number of push/pop cycles, as sometime the generator will overflow. The OO solution does not have this deficiency.

Oh yeah, it's a very serious limitations it overflows around 10^37. Try to hang around waiting for that.

Mirroring Changes of the Real World

In relational modeling, the tables generally correspond to domain nouns, such as invoices, customers, etc. These are generally seen to be proxies of real-world things. If an employee changes their name, then the Employee table is updated so that the table keeps reflecting the real world (the person's name). One tries to make the DB mirror the real world. It is a static view, meaning that the data persists as long as the actual object does. OOP seems to lack or avoid this "static" view. Is there a need or want to have this kind of persistent object that "stays around" as long as the real-world thing it represents stays around? It isn't clear to me how OO viewpoints handle this issue. Some say OOP designs only focus on an application's viewpoint rather than the global model of the organization. This would be somewhat equivalent to relational views or app-specific table copies. Are OO objects only meant to be temporary or regional viewpoints of domain objects?

I should point out that OO's concept of aggregation or containment is lacking in relational. For example, for invoices, there may be an Invoice table and a InvoiceLineItem? table. OO will usually represent this as aggregation. That is, line-items are a "sub-part" of invoices. Relational tends to keep the two at a "normal" distance from each other. This is because it sees relationships between entities as situational or relative. Line-items also contain part numbers, you will note. Thus, it has a relationship to the Parts table as well as to the Invoice table. Relational gives no preference to either of these relationships and the operations (join, filter, etc.) are the same regardless of any real-world aggregation (line-items "inside" of invoice). It is kind of a democratic view of relationships. OO, on the otherhand, tends to focus more on a physical construction viewpoint and emphasizes packing up the line items inside the invoice like clothing in a suitcase.

The relational view is that you may want to look at line items and relate them to part numbers or something else without caring (for the moment) about the invoice level. The OO viewpoint is that in the real world you usually treat invoices and line-items as a nested relationship, and therefore your design should reflect that. If you want to look at a different relationship outside of this nesting, then you just have to do it the hard way for those allegedly uncommon cases. This is why it is called "object oriented": it reflects a physical mental model, almost what you would see in a CAD system. Non-physical relationships are downplayed. Whether you agree with this tradeoff choice may depend on whether you agree that a physical view is indeed the most common view or not.

--top

CategoryModellingLawsAndPrinciples, CategoryDatabase

FebruaryZeroNine