Relational Database Vs Programming Language Flexibility

copied from RelationalHostLanguageImpedanceMismatch:

What we have here is that, in most cases, relational databases have a much larger collection of domains and offer much more flexibility than what usual programming languages initially offer you.

I reject this notion based on the fact that I can provide an isomorphic mapping between any relational model and any TuringComplete programming language (well, maybe I can't, but someone could). I contend the issue is not one of power; it is one of convenience. -- MarkAddleman

You're absolutely right, but please note that I spoke about what programming languages initially have to offer, i.e. built-in types and/or standard class libraries. Also note that, most often, convenience is a very dear thing for those who maintain that there is an ObjectRelationalImpedanceMismatch. My claim is that you don't have so much inconvenience on the database side.

Please provide an example of the alarm clock variety described in ForeignKeysCanOnlyReferenceOneTable that has little inconvenience. Note that the customer gets to arbitrarily add new classes after the solution is presented.


Maybe I'm being dense, but do we need to define the concept of flexibility? What exactly does it mean to say that a RelationalDatabase is more flexible than what programming languages initially offer?

Given my everyday use of the word flexibility, I'd say the concept isn't a valid point of comparison between the two. Is it hotter in the summer or in the country? The question itself makes no sense.

Further, I'll go out on a limb and contend that any comparison between relational databases and programming languages is meaningless without a problem domain.


To cut to the chase, I'll give you an example:

in a database you can easily have a datatype Number(18,2) and need no more effort than declaring the type. In the usual programming languages of today, you have to put in a little programming effort to get that.
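To make the comparison concrete, here is a minimal sketch in Java of the "little programming effort" being referred to. The Money class, the 18-digit check, and the rounding mode are my own assumptions for illustration, not anything prescribed above; in the database the whole thing is the single declaration shown in the leading comment.

 // In SQL the whole job is one declaration, e.g.:
 //   CREATE TABLE invoice (amount NUMBER(18,2))
 // In Java you reach for BigDecimal and enforce precision and scale yourself.
 import java.math.BigDecimal;
 import java.math.RoundingMode;
 
 public class Money {
     private final BigDecimal value;
 
     public Money(String literal) {
         BigDecimal v = new BigDecimal(literal).setScale(2, RoundingMode.HALF_UP);
         if (v.precision() > 18) {            // rough stand-in for the NUMBER(18,2) bound
             throw new IllegalArgumentException("exceeds 18 digits: " + v);
         }
         this.value = v;
     }
 
     public Money add(Money other) {          // arithmetic has to preserve the scale too
         return new Money(value.add(other.value).toPlainString());
     }
 
     public BigDecimal value() { return value; }
 }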

rebuttal:

SmalltalkLanguage

Smalltalk-80 offers Number(oo, 0) and Number(oo, oo), built-in, with no rounding error. (oo = limited only by available memory.) Unlimited fractional precision is useful when distributing income among huge investments that might be in highly inflated currency in use 20 years hence. I'm uncomfortable choosing an upper bound for either magnitude or precision, and have been since the millennium. If I guess big, I pay now. If I guess small, I pay later. I'd rather just let the computer do the allocation dynamically. -- WardCunningham

The SmalltalkLanguage's concept of numbers gives you a lot more than "Number(18,4)" -- Smalltalk numeric operations use decimal fractions of "unlimited" size. Smalltalk works exactly with any number (of finite size ;-) that can be represented as "one large integer divided by another large integer." So 999999999999999999/10000 is quite valid and represented exactly in Smalltalk. Further, you're not limited to any particular number of digits to the left or right of the decimal point.

Also, 1/3 is represented exactly in the SmalltalkLanguage. Try that in Oracle! ;-> Can you say "RelationalHostLanguageImpedanceMismatch"? (I thought you could. ;-)
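For readers without a Smalltalk image handy, here is a minimal sketch of the same idea in Java: an exact rational number, "one large integer divided by another". The Rational class and its method names are my own invention for illustration (sign normalization and zero checks omitted); Smalltalk gives you the equivalent built in.

 import java.math.BigInteger;
 
 public final class Rational {
     final BigInteger num, den;
 
     public Rational(BigInteger n, BigInteger d) {
         BigInteger g = n.gcd(d);              // keep the fraction reduced
         num = n.divide(g);
         den = d.divide(g);
     }
 
     public Rational times(Rational o) {
         return new Rational(num.multiply(o.num), den.multiply(o.den));
     }
 
     public String toString() { return num + "/" + den; }
 
     public static void main(String[] args) {
         Rational third = new Rational(BigInteger.ONE, BigInteger.valueOf(3));
         // (1/3) * (3/1) prints 1/1 -- no rounding error anywhere
         System.out.println(third.times(new Rational(BigInteger.valueOf(3), BigInteger.ONE)));
     }
 }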

Well, you jumped to conclusions too soon. Please see the Oracle documentation (8i and later) on CREATE TYPE, CREATE OPERATOR, CREATE INDEX, CREATE FUNCTION and, most importantly, CREATE INDEXTYPE. The key here is that a truly relational database should be fully extensible. And I'm afraid I have to tell you that Oracle is pretty close. -- CostinCozianu

CobolLanguage

I also think CobolLanguage can do it, I'm not 100% sure though.

Yes, CobolLanguage has fixed-point decimal numbers, both packed and zoned decimal. It's a mainframe-ism, supported directly by the IBM 360 CPU. So PL/I and other mainframe languages directly support "PICTURE <total_digits>, <digits_to_right_of_decimal>" declarations. I'm pretty sure RPG has it too.

Microsoft

The MicroSoft world has Currency and Decimal: Currency (if I recall correctly) is fixed at "Number(18,4)". Decimal is about 28 decimal digits (2^96) with a floating decimal point of 0 to 28 digits. (Decimal is NOT a floating point binary number; it's always decimal; there is no rounding error.)

counter-rebuttal

For Smalltalk and Java, you have to do a little programming. Java's BigDecimal offers you slightly less than Number(18,2).
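As an illustration of that "little programming", here is a short sketch under my own assumptions, not anything claimed above: with BigDecimal the scale and the rounding behaviour are the caller's problem on every operation that needs them, whereas a Number(18,2) column carries them in its declaration.

 import java.math.BigDecimal;
 import java.math.RoundingMode;
 
 public class LittleProgramming {
     public static void main(String[] args) {
         BigDecimal price = new BigDecimal("19.99");
         BigDecimal three = new BigDecimal("3");
 
         // price.divide(three) would throw ArithmeticException
         // (non-terminating decimal expansion), so the rounding is on us:
         BigDecimal each = price.divide(three, 2, RoundingMode.HALF_UP);
         System.out.println(each);   // prints 6.66
     }
 }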


Otherwise they [relational model and programming languages] are absolutely equivalent [in terms of flexibility]. Both a relational database and a programming language allow customized user defined types.

RE: "Both a relational database and a programming language allow customized user defined types."

I disagree with the statement that relational databases allow customized user defined types. For example, no database that I'm aware of provides for an explicit Jpeg type. While it's true that most databases provide BLOBs, this is not the same thing as providing explicit support for Jpeg files. For example, the database with a BLOB column named "Jpeg" would not reject a SQL INSERT of a GIF file. Furthermore, I don't think I can join tables on the Jpeg column (a meaningful operation, by the way).

At best, the customized user defined type that a relational database gives me is limited to aggregations of built-in fundamental types. In practice, this is generally some set of number types and some set of string variants. -- MarkAddleman

I have to mention that a perfect relational database would allow you to do all the things you mentioned.

Perhaps it would. A perfect flomitz would solve world hunger, the Palestinian-Israeli conflict and cure my hangnail. Unfortunately, I only have flomitzes available from the store.

You must also agree that no system will be able to incorporate in itself every imaginable kind of ready-built data type, Gifs and Jpegs being an example, so any kind of system will have to provide extension hooks that let you do just that.

As a matter of fact, I believe you can do what you want (create a function library that will validate Jpegs) in most recent versions of Oracle, Informix and Db2; I'm not sure about Sybase and MS Sql Server.
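A hedged sketch of what such validation could look like on the application side, in Java over JDBC. The images table, the jpeg_data column, and the magic-byte check are my own assumptions for illustration; the in-database version would be the same check wrapped in a vendor-specific function, constraint, or trigger.

 import java.sql.Connection;
 import java.sql.PreparedStatement;
 import java.sql.SQLException;
 
 public class JpegInsert {
     static boolean looksLikeJpeg(byte[] b) {
         // JPEG files start with the bytes 0xFF 0xD8 and end with 0xFF 0xD9.
         return b.length > 3
             && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xD8
             && (b[b.length - 2] & 0xFF) == 0xFF && (b[b.length - 1] & 0xFF) == 0xD9;
     }
 
     static void insert(Connection con, long id, byte[] image) throws SQLException {
         if (!looksLikeJpeg(image)) {
             throw new IllegalArgumentException("not a JPEG");
         }
         try (PreparedStatement ps =
                  con.prepareStatement("INSERT INTO images (id, jpeg_data) VALUES (?, ?)")) {
             ps.setLong(1, id);
             ps.setBytes(2, image);
             ps.executeUpdate();
         }
     }
 }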

Having thought about it, I agree with you. My points regarding accepting/rejecting Jpegs aren't correct. I'd delete it, but I think there's some value in the exchange. -- MarkAddleman

I also think you can leave them in there. -- CostinCozianu

So, you want to join tables based on Jpegs and say it would be a meaningful operation (care to provide an example?). I'd say your choice of a primary/foreign key is rather poor. -- CostinCozianu


I wouldn't call it a "user defined type" unless you could "define type SocialSecurityNumber as char(12)" possibly with "format '999-99-9999'". Then you could "create table TableName ( ..., ssn SocialSecurityNumber NULL, ... )".

Of course you can.

I believe that Sybase and SQL Server support this with "domain" definitions, but that it's *NOT* in ANSI SQL. -- JeffGrigg
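For concreteness, here is a hedged sketch of the SocialSecurityNumber idea as it could be issued from Java over JDBC. The DDL shown is the PostgreSQL form of CREATE DOMAIN (vendor syntax varies), the regular-expression CHECK stands in for the format '999-99-9999' clause, and char(11) is used because the formatted value is eleven characters; the table and connection are hypothetical.

 import java.sql.Connection;
 import java.sql.SQLException;
 import java.sql.Statement;
 
 public class SsnDomain {
     static void createSchema(Connection con) throws SQLException {
         try (Statement st = con.createStatement()) {
             // Domain: a named, constrained reuse of a built-in type.
             st.execute("CREATE DOMAIN SocialSecurityNumber AS char(11) "
                      + "CHECK (VALUE ~ '^[0-9]{3}-[0-9]{2}-[0-9]{4}$')");
             // The domain is then used like any other column type.
             st.execute("CREATE TABLE person ("
                      + "  id  integer PRIMARY KEY,"
                      + "  ssn SocialSecurityNumber NULL"
                      + ")");
         }
     }
 }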

Please note that ANSI/ISO SQL does NOT conform to the relational model. See AnIntroductionToDatabaseSystems.

That means any custom mechanism that does not violate the relational model should be allowed in this discussion. You should know by now that there's effectively no way to develop a generic application against SQL 99 (not even SQL 92) and have it run happily from Oracle to Sybase to DB2. -- CostinCozianu


I've heard that Informix supports user-defined database types. But Oracle and SQL Server are really "the marketplace." -- JeffGrigg

Example queries...

The problem with integrating all kinds of BLOB-like objects into databases is a little subtler than this discussion, and goes beyond merely defining a UDT and operators on that UDT. Given the current constraints of hardware and operating systems, it would be quite irresponsible of me, as a hypothetical DBA, to let you do that in Oracle: load them and run functions over them inside the DBMS engine. There are some newer approaches, called Generalized Search Trees (GiST), that would allow a hypothetical database to be extended with customized index structures in order to achieve meaningful query capabilities.

By the way, is there an object database that would allow you to define a bitmapped index on a large set of images, so that you avoid having to load them one by one and run some Java or Smalltalk code on each object instance? I think not.


Unsure what you mean by bitmapped index, but the GemStone, GemStonej, JyD, and VoSs? object databases will do exactly that. The key is carefully parsing your statement "...so that you will avoid having to load them one by one", where the "you" is me, the application programmer. I'm not playing semantic games; I believe this is important. The object databases I mentioned (and probably others) will pretty much transparently move objects in and out of my application's address space as necessary, so that I, as an application programmer, can pretend to have a virtually limitless amount of RAM. No relational database will do this for me as an application programmer.

This is one source of the mismatch that I find between relational databases and OO languages. -- MarkAddleman

Well, in the future I'd appreciate it if you checked on the web before blindly responding, just in case what I say happens to hold; I'm alone on my side of the story, and all kinds of people tend to jump to false conclusions. Just don't declare victory yet, will you?

Bitmapped index basically means that, in order to select the images that, let's say, 'have magenta in them', the database has no need to load hundreds of thousands of bulky images; instead it loads only a few disk pages, some of which might already be cached (caching is useless if you have to load every image). So, semantics or no semantics, your GemStone scenario is fit for the bin.

Ok, I'm somewhat better educated now on bitmapped indices. Like any index, they're an implementation detail designed for query optimization. Presumably, in neither the relational nor the object database case are you loading the entire image, processing it for the color magenta, and reporting the results. In the relational case, I could create a boolean column "HAS-MAGENTA" and a bitmapped index on that column. The fact that the index is automatically maintained is kind of nice, but I, as an application programmer, still must maintain the column value should the image colors change.
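A sketch of that relational arrangement, issued from Java over JDBC. The table, the column names, and the 'Y'/'N' encoding of HAS-MAGENTA are assumptions of mine; CREATE BITMAP INDEX is Oracle's syntax and is not available in every database.

 import java.sql.Connection;
 import java.sql.SQLException;
 import java.sql.Statement;
 
 public class MagentaFlag {
     static void createSchema(Connection con) throws SQLException {
         try (Statement st = con.createStatement()) {
             st.execute("CREATE TABLE images ("
                      + "  id          NUMBER PRIMARY KEY,"
                      + "  jpeg_data   BLOB,"
                      + "  has_magenta CHAR(1)"   // 'Y' or 'N', maintained by the application
                      + ")");
             st.execute("CREATE BITMAP INDEX idx_has_magenta ON images (has_magenta)");
         }
     }
     // A query such as SELECT id FROM images WHERE has_magenta = 'Y'
     // touches the bitmap index, not the blobs themselves.
 }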

In the object database world, I could maintain a hashset of all images that contain magenta. The application code required to do that is not much more complex than maintaining the equivalent "HAS-MAGENTA" column. There is no theoretical reason to believe the response time for the query "Give me all of the images with magenta" would be quicker under either scenario. The difference is in coding effort.
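A minimal stand-in for that hashset, written in Java. In GemStone the set itself would be a persistent object managed by the database; here plain java.util collections are used, so the sketch shows only the coding effort being compared, not the persistence. The class and method names are hypothetical.

 import java.util.HashSet;
 import java.util.Set;
 
 public class MagentaIndex {
     private final Set<Long> magentaImageIds = new HashSet<>();  // OIDs, in the OODB case
 
     public void imageStored(long id, boolean hasMagenta) {
         if (hasMagenta) {
             magentaImageIds.add(id);
         } else {
             magentaImageIds.remove(id);   // also covers the "colors changed" case
         }
     }
 
     public Set<Long> allWithMagenta() {
         return magentaImageIds;
     }
 
     public boolean hasMagenta(long id) {  // the O(1) lookup discussed below
         return magentaImageIds.contains(id);
     }
 }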

No, you could not. Both solutions you proposed run afoul of several principles. One: you introduced redundancy that you have to maintain manually. The OODB case is even worse, because you have to maintain an index by hand; this breaks the data independence principle. And last, did you think about indexing on 24-bit color? How many columns, and how many manual hash-sets?

I guess it's time for you to drop this argument.

As a matter of fact, what Gemstone does is possible in today's relational databases too, but it's plain stupid. And not to mention that if you 'pretend' you have an unlimited amount of RAM, you can just as well go for a walk while Gemstone is processing your query. -- CostinCozianu

This is not the first time that you've asserted automatically moving objects in and out of memory is a bad idea. Does your aversion also include operating system level virtual memory paging?

Well, it is a very bad idea when you move things in and out just because you don't have other solutions. And it is a very bad thing when it happens in a database. If your Windows 2000 workstation is doing page faults (and believe me, it's doing a lot more of them than absolutely needed), it is absolutely no big deal. But if it happens to a database that is supposed to handle lots of concurrent transactional clients, believe me, you don't even want to take it into consideration.

The hashset implementation I described above scales quite well with the number of images. If we turn the query around and ask, "Does this particular image contain the color magenta?", the search for one OID in a set of OIDs is quite fast, O(1) in fact. However, if we scale to some arbitrarily large set of images where that set of OIDs cannot fit into RAM, it doesn't matter: the object database only has to perform a few more object swaps. The key here is that the hashset itself is made up of objects and they're all treated the same by the oodb; only those few internal objects necessary to compute the hash value, find the appropriate bucket, and make additional comparisons need ever be loaded into memory.

See above. My dear friend, you try to reinvent the wheel because you're very fond of your object database, and I'm afraid your wheel might come out square. The key is that your hashset is indexing by hand in the client application. You don't want to do this. If you are not sure why you don't want to do indexing by hand, I'll address these issues on another page.

I'm not sure this is different from a "HAS-MAGENTA" column and a CREATE INDEX statement conceptually or efficiently. The only difference is the oodb maintains the link between application and persistent address space automatically for me while the rdbms requires me to maintain it myself.

I'm not sure you have a point here. Please elaborate.


re: try to reinvent the wheel because you're very fond of your object database... index by hand

If your point is that we should all use RelationalDatabases because it's easier to deal with shuffling data in and out of memory manually, handle the RelationalAndPolymorphism? problem, and deal with the RelationalHasNoObjectIdentity problem (perhaps these problems don't really exist?) than to maintain indices, I assert that you're wrong and have your priorities backwards. See OaooApproachToIndices.


re: I'm not sure you have a point here. Please elaborate

The point is simply that I see no conceptual or efficiency difference between using an rdbms versus an oodbms for the purposes of handling the example query. If you see one, please explain it rather than simply asserting that one exists.


re: Well, it is a very bad idea when you move in and out just because you don't have other solutions.

I certainly have other solutions. I could store it all in physical memory, serializing in at the start of the application and serializing out at the end. For small jobs, this works just fine. Of course, it scales rather poorly in both the number-of-users and size-of-data dimensions.
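For reference, a sketch of that serialize-everything approach in plain Java. The file and the idea of a single Serializable root object are assumptions of mine; the point is only that the whole graph crosses the disk boundary in one piece, which is exactly why it scales poorly.

 import java.io.*;
 
 public class WholeGraphStore {
     static Object load(File file) throws IOException, ClassNotFoundException {
         try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
             return in.readObject();               // the entire graph comes into RAM
         }
     }
 
     static void save(File file, Object root) throws IOException {
         try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
             out.writeObject(root);                // the entire graph goes back out
         }
     }
 }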

I could use a relational database, which scales well in both of these dimensions but requires me to do a lot of data maintenance work, such as maintaining both the object AND the relational schema, creating and updating special meta-information tables/columns to deal with polymorphism, and somehow detecting that an object in memory may have had its persistent state changed by someone else.

Please stop making assumptions about how things work in relational databases. Limit yourself to describing your proposed solutions in an object database. You assume all kinds of things about working with relational databases, which you probably read from only the OO side (and especially the 'Smalltalk school': authors who think in Smalltalk first, which is not a bad idea per se, but thinking in Smalltalk just doesn't cut it when addressing the issues that a database is supposed to address) in papers and books about object mapping. Those things are all wrong. If you really want to know how to integrate OO concepts with a relational database, I strongly recommend that you study the books I mentioned in ObjectRelationalImpedanceMismatchLinks. Otherwise, I'll describe all the issues in summary, one at a time.

I could use an object database, which scales well in both dimensions and handles these problems automatically for me. Well, you just didn't illustrate how Gemstone scales in the example at hand.

Which should I choose?

You should always choose a relational database when you need a database at all. When you only need a persistence engine, you can have it your way (GemstoneS, GemstoneJ, Jasmine, whatever). See DatabaseDefinition, and I'll add some more stuff soon. -- CostinCozianu


re: illustrate how the Gemstone scales in the example at hand

I apologize; I thought I had for one of the dimensions. There are two dimensions that we're considering: number of images and number of users. The example, as it stands, is rather contrived: give me all the images with the color magenta. I think it's reasonable to add a third scaling dimension: colors. So now the example is "Give me all the images with the color X".

Scaling through number of images

Clearly you need some number of operations to actually store the image. This is going to depend a great deal on the behavior the application requires of the image. In the simplest case, it's nothing more than an array of bytes, so the writes and reads of images aren't much more than whatever it takes to write/read a stream of bytes. Since everything else in the application refers to these bytes through the OID, the time to access them remains constant. Pretty scalable.

Number of users

GemStone and other object databases provide full transactional control over object manipulation. GemStone, in particular, gives you the option of locking objects, or you can use its built-in concurrency controls (which provide automatic detection of write-write conflicts). I'm not an expert on concurrency models, but from my reading of this mechanism's theory, it is generally regarded to scale quite well with the number of users.

Scaling through number of colors

Provide a map between colors and images. If we assumed 24-bit images, a highly specialized map would be pretty easy to construct, since we know all the keys a priori. However, I feel as if this sort of optimization is cheating, so let's just say we have a hashmap containing 2^24 keys to arrays of image object OIDs. Assuming the map is implemented as a hashtable, eventually that hashtable will grow and be rehashed until it reaches its optimum size (which is around 2^24 buckets). Note that there are lots of hashtable implementations around that will do this for me. I do realize that the rehash operation is rather expensive, but judiciously creating my hashtable at the beginning can greatly alleviate that problem.
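A sketch of that color-to-images map in Java, with 24-bit color values as keys and sets of image OIDs as values. The class name, the initial capacity, and the use of plain java.util collections are my own assumptions; in an object database the map would itself be a persistent object, and pre-sizing it is the "judiciously creating my hashtable at the beginning" mentioned above.

 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
 
 public class ColorIndex {
     // In an OODB this map would be a persistent object; here it is in-memory only.
     private final Map<Integer, Set<Long>> imagesByColor = new HashMap<>(1 << 20);
 
     public void recordColor(int rgb24, long imageId) {
         imagesByColor.computeIfAbsent(rgb24, k -> new HashSet<>()).add(imageId);
     }
 
     public Set<Long> imagesWith(int rgb24) {
         return imagesByColor.getOrDefault(rgb24, Set.of());
     }
 }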

A search for an arbitrary color would require a constant number of disk reads. I believe the worst case scenario is:

 One disk read to read the hashtable object into memory.
 One to bring the image color in.
 After computing the appropriate bucket, one to bring that bucket into memory.
 Then, some small number of reads to verify the mapped key against the actual key
 (of course, this depends on how well the keys are distributed).

Since this operation occurs with a more or less constant number of disk reads, I'd say it scales rather well.


This debate has become tiresome for me mainly because you have, so far, refused to address RelationalAndPolymorphism? and why you believe automatic consistency management between the application address space and the persistence address space is a bad idea.

It seems that your only substantiated gripe with object databases is that the relational model is isomorphic with them and therefore object databases are unnecessary. I put it to you that, since they are isomorphic, relational databases are unnecessary. I don't actually believe that, but I do believe that I would rather model a relational database in an object database than an object database in a relational one.

I never said even ONCE that the relational model and the object database model (whatever that might be; from what I saw here, I hope it's not Gemstone's model) are isomorphic. If you have patience, and don't put words or intentions in my mouth, you'll have your curiosity satisfied, even for RelationalAndPolymorphism?. -- CostinCozianu

