OO systems are good at managing transient state; relational systems are good at managing persistence, especially persistent facts and deriving new facts from these. There is no impedance mismatch, except OO environments that support dynamic class and/or object definitions may provide a convenient syntax for handling data retrieved from relational systems external to the OO environment, whereas this may be awkward in OO environments that only support static class definitions. -- DaveVoorhis
I started this page and really liked the conclusion, which is why I've elevated it to the top. Thank you all for your effort. I've kept the entire discussion below (including the old start). -- AurelianoCalvo
Both tables and classes can be thought of as "containers" of objects if we accept that ObjectsAreDictionaries. The problem is that they don't contain the same objects. That's why they mismatch. This mismatch is especially hard on strongly typed languages (consider both Rails' ActiveRecord and ObjectGraph (a.k.a. og) in RubyLanguage).
IMO, rows from the database (resulting from a query) are classless objects (or at least, they have a dynamic class with accessors for each column). Rails' ActiveRecord (for instance) uses RubyLanguage's dynamism to generate the classes for the tables on the fly, avoiding the extra and useless work of retyping the class definitions for the tables. The .NET implementation of the ActiveRecord pattern (Castle ActiveRecord) can't do that, and the code is significantly more verbose in C# than in RubyLanguage because of this.
Well, it's just a thought. What do you think? (AurelianoCalvo)
I think it's not quite accurate, though the essential idea is valid.
Classes should not be thought of as "containers" of objects. They are (typically) only syntactic mechanisms for generating objects. They do not contain objects. Container classes permit generation of container objects, but it is the generated objects and not their classes that contain other objects. I am not aware of any language that allows you to iterate all objects that belong to, or were generated by, a given class.
- In any Smalltalk environment, look at the results of evaluating "SomeClass allInstances". While you may be pedantically correct, in that this is a behavior of the environment rather than the language, I believe that distinction is not germane to this discussion. In SmalltalkLanguage, everything is an object, each object has a class, and each class has a unique name. Thus, the #allInstances method is implemented (virtually, if not actually) by traversing the set of all objects and collecting those whose class has the same name as the receiver of the original expression. -- TomStambaugh
- Interesting. I'm not a Smalltalker, so I wasn't aware of '#allInstances'. It appears that Smalltalk is a language (or more accurately, environment) that treats a 'class' as both an instance factory and an instance container. That said, I have also seen cases (and built them) in C++ and Java where a factory is specifically designed to maintain a container of all instances it generates. But to get back to my original point, this is not an implicit feature of the language itself, and I was attempting (perhaps awkwardly) to focus on the conceptual distinction between class, instance, and container rather than specific implementations (like Smalltalk, as I now see) that intentionally blur these. I still hold that a (pure) class, at a conceptual level, should not be considered a container but a generator. This is especially important when relating classes, containers and instances to database tables. A class is not equivalent to a table. A container instance is equivalent to a table. A class may either be roughly equivalent to a table header (i.e., a table's definition or schema) or a column domain. Following on from this, an instance may either be a row or a row's column value. However, there are some - e.g., C J Date and Hugh Darwen - who compellingly argue that class and table header equivalence is incorrect, and maintain that a class is strictly equivalent to a column domain or type, hence an instance is strictly equivalent to a row's column value. In this view, a row is a classless container of column instances. See DomainsNotRecordsOrTablesAreObjects.
- Ultimately, I suppose this comes down to the definition of "class." At a conceptual level, is a class an instance factory, or is a class an abstract notion of those instances that share some meaningful characteristic(s)? -- DaveVoorhis
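As an illustration of the factory-that-keeps-its-instances idea mentioned above, here is a minimal Python sketch. It is not Smalltalk's #allInstances and not a built-in language feature; the names Tracked, all_instances, and Customer are hypothetical, and the registry is something the class author has to opt into.

```python
import weakref


class Tracked:
    """Hypothetical base class: each subclass remembers its live instances."""

    _instances = weakref.WeakSet()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # One registry per subclass; weak references let instances be collected.
        cls._instances = weakref.WeakSet()

    def __init__(self):
        # Register with the exact class of the new object.
        type(self)._instances.add(self)

    @classmethod
    def all_instances(cls):
        return list(cls._instances)


class Customer(Tracked):
    def __init__(self, name):
        super().__init__()
        self.name = name


ann = Customer("Ann")
bob = Customer("Bob")
names = sorted(c.name for c in Customer.all_instances())
```

Here the class acts as both generator and container only because the registry was written by hand, which is the distinction being drawn in the discussion.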
The essence of this category of impedance mismatch, as you have essentially stated above, is that certain languages only support static, compile-time class definitions. In such languages (e.g., C#, C++, Java), there is no mechanism to conveniently generate a new class at run-time, hence there is no mechanism to conveniently produce a class from a dynamic query's result set at run-time, and thus there is no way to generate (at run-time) row objects that conveniently use object syntax to reference member elements. Languages that permit run-time, dynamic class and/or object definitions do not suffer from this problem.
- In Smalltalk, the very concept of "compile-time" isn't meaningful. In Smalltalk, every class has a dictionary of methods that pertain to the instances of that class. Those methods have compiled byte-code implementations. In that context, "compile time" means "the time when the source of a method is saved." I suggest that the problems you describe have less to do with static typing and more to do with whether or not the environment is "pure" -- all objects, all the way down. -- TomStambaugh
- Hence, Smalltalk is a language that supports run-time, dynamic class and/or object definitions, yes? -- DaveVoorhis
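To make the point about run-time class generation concrete, the following is a rough Python sketch that builds a row class from whatever columns a query happens to return. It assumes SQLite via the standard sqlite3 module and column names that are valid Python identifiers; rows_as_objects and the customer table are made up for the example.

```python
import sqlite3
from collections import namedtuple


def rows_as_objects(connection, sql, params=()):
    """Run a query and return rows whose class is built from the result set."""
    cursor = connection.execute(sql, params)
    # cursor.description has one entry per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    Row = namedtuple("Row", columns)  # class created at run-time
    return [Row(*values) for values in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")

rows = rows_as_objects(conn, "SELECT id, name FROM customer")
first_name = rows[0].name  # object syntax, no hand-written Customer class
```

In a language restricted to compile-time class definitions, the equivalent of Row would have to be typed in (or generated) before the program runs.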
Furthermore, it is probably not accurate to say that this is especially hard on strongly typed languages, because it is possible to have a strongly typed language that permits run-time, dynamic class definitions. It just so happens that certain popular languages which happen to be strongly typed also only support static, compile-time-only class definitions.
- Yes, I think you're now saying the same thing I observed above. I suggest that the difference lies in whether the environment is or is not "pure object oriented", more than whether or not it provides strong and/or static typing. -- TomStambaugh
In short, this type of OR impedance mismatch is only caused by static class definitions, and resolved by dynamic class and/or object definitions. -- DaveVoorhis
- I think it's more accurate to say that this type of OR impedance mismatch is caused by attempting to mix object and non-object entities, and is resolved by making the language and environment purely object-oriented, all the way down. -- TomStambaugh
- I think we're in violent agreement here. That is effectively what I was trying to say, using slightly different terminology. Dynamic class and/or object definitions may imply a purely object-oriented language. However, it also implies non-OO languages (using "object", above, in a broad sense) that permit dynamic definition of structures or whatever, as long as it uses a convenient syntax to access result sets. -- DaveVoorhis
So, in the end, you agree that the ObjectRelationalImpedanceMismatch is mainly caused by static class definitions? Is there any ObjectRelationalMapping that generates classes for the tables at compile time for Java, C#, C++, etc. (using the database schema as a source)? That would be a nice advance over Hibernate, EJB, etc. Just emulating what Rails does at runtime. -- AurelianoCalvo
I agree that this type of OR impedance mismatch, i.e., awkwardness in conveniently manipulating result sets, is mainly caused by static-only class definitions. There are other potential impedance mismatches that this does not solve:
- Possible difficulty, complexity, and/or performance issues caused by mapping intricate object graphs to a set of tables, though you're starting off on the wrong foot if you think the purpose of an enterprise database is to store objects rather than statements of fact; or
- database vs. application language type incompatibilities; or
- constructing queries via string twiddling which is possibly subject to SQL injection security exploits; or
- the fact that functions/procedures/methods defined in the application language are not accessible to the database language (or at least are awkwardly integrated), and vice-versa,
- etc.
Some (many?) of these are not an OR impedance mismatch per se -- they're an application language (whether OO or not) vs. database language, or even application environment vs. database environment, impedance mismatch.
- Yes, the above issues are independent of the alleged OR impedance mismatch. -- TomStambaugh
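The "string twiddling" item in the list above can be shown in a few lines. This is only a sketch against an in-memory SQLite database using Python's sqlite3 module; the account table and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("ann", 10), ("bob", 20)])

hostile = "nobody' OR '1'='1"

# String twiddling: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT owner FROM account WHERE owner = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameter binding: the input is passed as a value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT owner FROM account WHERE owner = ?", (hostile,)
).fetchall()
```

The concatenated query matches every row, while the bound query matches none. As noted above, this is an application-language vs. database-language issue rather than anything specific to OO.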
My first bullet point, above, deserves some expansion. I believe the notion of constructing application-side classes, e.g., Customer, Employee, Invoice, Payment, InventoryItem etc. to represent real-world entities, which then use the database as a persistence engine, is a naive and flawed approach. The application side should implement mechanisms to facilitate presentation of the database -- perhaps starting with base classes like Form, Report, Menu, Query, DatabaseConnection etc. -- and let the database management system deal with entities, or, more accurately, facts. This approach treats the database management system as an equal partner in implementing application functionality, rather than relegating it to a mere storage device for client-side objects, and it reduces the degree of impedance mismatch because you're generally not having to maintain client-side classes in sync with server-side tables.
However, I have used in-house tools to generate Java and C++ classes from database metadata. Having not used any other ObjectRelationalMapping tools, I can't say whether they do or do not support such a thing. It seems obvious, so I would assume they do, and some of the descriptions on the ObjectRelationalMapping page suggest that they do. -- DaveVoorhis
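The in-house tools mentioned above are not shown on this page, so the following is only a guess at the general shape of "classes from database metadata", sketched in Python against SQLite. The helper class_from_table, the type map, and the invoice table are all hypothetical.

```python
import sqlite3
from dataclasses import fields, make_dataclass

# Assumed mapping from declared SQLite column types to Python types.
SQLITE_TO_PYTHON = {"INTEGER": int, "TEXT": str, "REAL": float}


def class_from_table(connection, table):
    """Build a dataclass whose fields mirror the columns of `table`."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so only pass trusted names.
    info = connection.execute(f"PRAGMA table_info({table})").fetchall()
    spec = [
        (name, SQLITE_TO_PYTHON.get(decl_type.upper(), object))
        for _cid, name, decl_type, *_rest in info
    ]
    return make_dataclass(table.capitalize(), spec)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, total REAL)")

Invoice = class_from_table(conn, "invoice")
inv = Invoice(1, 9.5)
field_names = [f.name for f in fields(Invoice)]
```

A generator for Java or C++ would emit source text from the same metadata instead of building the class in memory.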
- I beg to disagree with Dave here. My experience has been that the effect of this is to surface the relational model to the ultimate user, so that the user interface most often becomes a glorified and pretty-fied SQL query builder. The reason for this is that, in my experience, the things I model have behavior, and the behavior is important. The relational model explicitly blocks this. A point of agreement might, however, be in Dave's phrase "which then use the database as a persistence engine." The key is to include the behavior of whatever is modelled in the persistent store along with the state. I think this does, in fact, evolve into Dave's suggestion that the persistent store be treated "as an equal partner in implementing application functionality." Dave may also be alluding to a more imperative style of modeling, along the lines of RDF and the semantic web. While I think this is also promising, it breaks the relational model just as thoroughly as it breaks the (classical) OO model. This doesn't mean we shouldn't do it, but I think it does mean that it lies outside the scope of a discussion about the claimed OR impedance mismatch. -- TomStambaugh
- It's important to note that a relational database is much more than just a persistent store. A relational database management system is a fact processor. It allows one to expressively manipulate existing facts and derive new facts via the composability of the relational algebra. The fact (!) that such systems are persistent is almost a mere bonus - one that is all too often elevated to its only purpose. If all you need is object persistence - perfectly appropriate for some applications - a relational database is not necessary.
- However, I think this comes down to a point you made below, that "[t]he fundamental mismatch has more to do with the problem space than any tools used to solve it." With this, I agree. My background is mainly in developing enterprise data management systems, where not only is the user interface a glorified and pretty-ified SQL query builder, that is precisely the goal. Such systems are conceptually intended to capture and permanently record changes to real-world facts, more than maintain a singular current entity state. Modelling "behaviour," as such, is often implemented by recording new rows (i.e., storing time-dependent business facts) rather than (say) updating existing rows to reflect a change in state. While this is a valid and required approach for enterprise data management systems, it might be inappropriate for (say) a CAD system that maintains changes to objects as the user edits her diagrams. Though, arguably, even a CAD system might benefit from this approach (performance and implementation issues aside), as it would permit an infinite history of all changes, thus supporting unlimited "undo" facilities, implicit versioning, etc. But that's beside the point. What I'm getting at is that there may be fundamental differences in approach between systems designed for shared data management, and those designed to preserve application state.
- I disagree, however, that this breaks the relational model. How does it break it? True relational systems do not preclude or deprecate behaviour, they're just picky about where the behaviour lies -- either in user-defined functions or operators, active constraints, or (in the degenerate, pseudo-relational/procedural case of SQL) in triggers and stored procedures. In relational systems, data updates may result in behaviour (which may result in data updates, and so on.) In OO systems, behaviour may result in data updates. The end result is the same. -- DaveVoorhis
Having flexible/dynamic objects/classes may allow your objects to dynamically mirror table schemas, but that does not really solve the root of the problem: OO and relational are governed by different rules and philosophies. Sure, you can better fit relational by making objects mirror or imitate relational, but then it is not OO, but emulation of another paradigm. It is not so much a technical problem as a philosophical one. Perhaps you could argue that static classes add a technical barrier on top of a philosophical barrier. -- top
Well, if you want, you can model tables as objects. What I see is that the tools currently in use to match rows in the database with objects force the developers to repeat information if they want to use objects to encapsulate them. The Rails implementation of the ActiveRecord pattern is a step toward not forcing that. I believe that ObjectRelationalMappingCostsTimeAndMoney and I would like to spend as little as possible solving this accidental complexity. Because databases are great tools for some things and it's possible to model them in objects (just model them, it's not necessary to reimplement them), I'm just looking for the best way to use them when they are appropriate. I really don't see a philosophical barrier between objects and the relational model. What I see is that you erect a psychological barrier between them for yourself. I'm really happy to use both techniques in my professional life every day. My practical concern is that I'm not able to apply the DryPrinciple (Don't repeat yourself ;) ). If the classes representing the database were autogenerated, I would not need to rewrite the database schema as classes. If the schema were autogenerated, I would not need to rewrite the classes as the database schema. The first approach is the one in Rails' ActiveRecord and the second one is the approach in ObjectGraph (og). No philosophical barrier, no fuss, no problems. Just a little metaprogramming. -- AurelianoCalvo.
I'd like to suggest that there is a simpler mismatch that underlies all this: a tradeoff between row-major and column-major modeling. When we pull back to look more at the forests and less at the trees, I think that "object oriented" modeling begins with a fundamentally row-major viewpoint: when I add an element to a schematic in a CAD system, I mostly care about collecting the various things in that element, and my drawing is mostly a container of those elements. Conversely, when I seek those elements whose price changed by more than a certain amount in the last year, I care mostly about a specific property across all the objects. The first is row-major, the second is column-major. It seems to me that we can then make several observations:
- The fundamental mismatch has more to do with the problem space than any tools used to solve it.
- One tool that can handle both kinds of problems is better than two.
- The relational paradigm -- not its implementation -- is highly optimized towards column-major problems, at the expense of row-major problems.
-- TomStambaugh
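One way to picture the row-major vs. column-major distinction made above is to hold the same data both ways. This is a toy Python sketch; the parts, prices, and the 20% threshold are invented, and it says nothing about how any real DBMS or CAD system stores data.

```python
# Row-major: each element gathers all of its own properties.
elements = [
    {"part": "resistor", "price_then": 1.00, "price_now": 1.10},
    {"part": "capacitor", "price_then": 2.00, "price_now": 3.00},
    {"part": "diode", "price_then": 4.00, "price_now": 4.00},
]

# Column-major: each property gathers its values across all elements.
columns = {key: [e[key] for e in elements] for key in elements[0]}

# Row-major question: tell me everything about one element.
first_element = elements[0]

# Column-major question: which parts changed price by more than 20%?
changed = [
    part
    for part, then, now in zip(
        columns["part"], columns["price_then"], columns["price_now"]
    )
    if abs(now - then) / then > 0.20
]
```

The first question touches one record and all of its fields; the second touches two fields across every record.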
I agree with this. It is in part why I am interested in languages like TutorialDee that specifically seek to address these issues. Intuitively, I think there is a fundamental region of conceptual commonality or reconciliation (or rapprochement, to use C J Date's preferred term) between object oriented, relational, functional, etc., systems that we have not yet discovered. The same holds for the divide between application functionality and environments vs. database functionality and environments. Alternatively, perhaps the paradigms are not reconcilable, but no single paradigm has been evolved to the extent that it can definitively trump the others. The fact that we can successfully build working systems using any or all of these types of machinery, yet simultaneously complain about impedance mismatches where they meet, suggests that either of these may be true. -- DaveVoorhis
A more fundamental distinction between the concepts of "table" and "class" is that a class provides both behavior and state, whereas a table provides only state. The relational model admits no concept of behavior, whereas an object is fundamentally defined as a package of behavior and state. The inclusion of behavior is problematic within the worlds of CeeSharp, CeePlusPlus, and JavaLanguage as well as within the relational world. Hence the mistaken assertion that ObjectsAreDictionaries. Yes, an instance of an object can be thought of as a dictionary, but only so long as that dictionary also contains a mapping to the methods that the object supports. -- TomStambaugh (moved from the top, because it didn't follow the page flow).
Well, tables have behaviour. But their behaviour is more "hardcoded". One can add records, delete records, update records and query them. The queries are special beasts because they can join multiple tables, but they have the same essence. I see clear behaviour there. My bet at modelling relational in an object oriented way is to provide these primitives and, if necessary, add new ones. But repeating this for every class is a pain in the ass. That's why I say that RelationalOoImpedanceMismatchIsCausedByClasses. I mean that if one has to write all the code in the class, and can't generate it at runtime, there is a problem because usually the database is the place where the data is based. If I need more behaviour from the tables than what they provide, I would do the usual tricks: inheritance, façade, adapter and decorator patterns come to my mind. Think of the database as a framework (and/or library) you use but can't change, and everything works like the rest of the frameworks. Having to retype the framework is a pain; using it and extending it when needed doesn't seem such a big deal. -- AurelianoCalvo.
- I've experimented with the idea of putting code in tables, but the problem is that data and actions tend not to map one-to-one to each other. For example, something like a table-oriented StrategyPattern looks like a good idea on paper, but usually doesn't work out so well in practice because having one function or function pointer per record is not a realistic mapping of data to behavior. Thus, the tight integration of data and behavior proved not very useful. It is true that a database may make a better code repository than tree-folders when the tools catch up, but treatment of the data and behavior as relatively independent things is simply modeling their actual relationship to each other. I would also welcome DBA-extendable query languages (and app-level mini-DB's that can do the same). But this would require simplifying the query language syntax, which SQL and Tutorial-D both fail at IMO, and thus TopsQueryLanguage. -- top
Go one step beyond thinking of the database as a framework. Think of the database as an external device and the "impedance mismatch" disappears entirely. We don't worry about the printer-OO impedance mismatch because printers are missing some OO feature. -- EricHodges
Building on the above comment from EricHodges, I view the database as the second tier of a three-tier storage model: primary, secondary, and tertiary. Primary storage is dynamic, loads and stores are (relatively) fast, and volume is (somewhat) limited. Think "memory". Secondary storage has longer latency times but high bandwidth once the transfer begins, and volume is nearly unlimited. Think file system and database. Tertiary storage has very long latency times and sometimes very low bandwidth, but essentially unlimited volume. Think magnetic tapes, DVD, and similar bulk storage. ALL behavior is provided by a processor running methods, including the internals of a database implementation. The purpose of a database is persistence. Period. Persistence of everything -- including methods, metastructure, processes, stack frames -- EVERYTHING. Certain services are often best provided by a database running SQL. I fully support using a database running SQL as a provider of those services. On the other hand, when I have to provide behavior -- even simple behavior, like computing a current balance -- then the database is very nearly the worst place to attempt to provide that behavior, because it is so highly specialized and optimized for its primary function (which it should be). Now, having said all that, it is certainly true that there are shifts in abstraction level, and one could imagine all sorts of innovative architectures where the virtual computing engine sits in some sort of abstract hyperspace with some of the computing engine's behavior provided by stored procedures. That's all well and good, but that virtual computing engine is a virtual computing engine, NOT a database. A database is for providing secondary storage. Period. Persistence. -- TomStambaugh
I believe it is a narrow view to treat a (relational) database purely as a persistence engine. Equally limited would be, say, using an OO language but never taking advantage of inheritance and polymorphism. If you're using a SQL DBMS purely as a persistence engine - and by that I assume you implicitly mean you are not using it as a fact processor, i.e., a tool for deriving facts (e.g., summaries, joins, calculated columns, views, restrictions, etc.) from other facts (tables) - then why not use a persistence engine such as Prevayler or the Berkeley DB? Either of these would probably be more suited to preserving application state (which is what I assume you mean by persistence of methods, metastructure, processes, stack frames, and everything) than translating objects into SQL.
- Related: DatabasesAreMoreThanJustStorage. If you treat a DB as just an external service, then it tends to result in verbose, interface-happy code where most of the code is devoted to interfaces and interface management rather than doing actual work. Bureaucracies might like it, but code verbosity and red tape ain't for me. -- top
Again, I pose that persistence and data management are different things. As evidence of this, imagine that future machine architectures all have primary storage that is high volume, low latency, and non-volatile. This is certainly well within the realm of possibility. Would that make relational databases redundant, now that persistence is implicit? I say "no". A relational database still provides powerful means for expressing and deriving facts in a manner that is concise, composable, dynamically optimisable, and easily proven - and all this is independent of persistence.
For example, compare the conciseness of an ad-hoc natural join in a typical relational DBMS vs. the equivalent OO operations. In TutorialDee, this is simply A JOIN B - where A and B are tables with column names and domains in common - and it's not much more verbose in SQL. In a non-relational language, this requires that you write some mechanism involving iteration and maybe keyed lookups, or perhaps maintaining object associations at all times, and possibly incorporating mechanisms to dynamically choose a mix of these if you wish to implement the optimisation that relational DBMSes provide. Of course, there's nothing to stop you from building OO facilities that employ a terse syntax to use your home-made join, but you're simply reinventing something that a relational DBMS already provides and optimises for you.
Please note, by the way, that I wrote "ad-hoc natural join" above. Predefined static relationships or associations between objects, in an OO environment, are of course trivially navigable. -- DaveVoorhis
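For comparison with A JOIN B, here is roughly what the "iteration and maybe keyed lookups" described above might look like when written by hand in Python. It is a sketch only: rows are plain dicts, natural_join is a hypothetical helper, the shared columns are taken from the first row of each input, and none of the optimisation a DBMS would apply is attempted.

```python
def natural_join(a, b):
    """Join two lists of dict rows on every column name they share."""
    if not a or not b:
        return []
    common = sorted(set(a[0]) & set(b[0]))

    # Keyed lookup: index b by its values for the shared columns.
    index = {}
    for row in b:
        index.setdefault(tuple(row[c] for c in common), []).append(row)

    # Iteration: merge each row of a with its matching rows from b.
    return [
        {**left, **right}
        for left in a
        for right in index.get(tuple(left[c] for c in common), [])
    ]


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 3},
]
joined = natural_join(customers, orders)
```

Choosing between this hash-based strategy and, say, a nested loop or a sort-merge is exactly the kind of decision the DBMS otherwise makes for you.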
I think there's some heat caused by different uses of the term "persistence". I think a relational DB should be used for persistent information. I don't think a relational DB is "purely a persistence engine". RDBMSs offer much more than persistence, but they shouldn't be used for transient information. There are other tools better suited for that. -- EricHodges
We seem to be slicing our terms pretty thin here. When I say "persistence", I include finding and retrieving information. A relational database is ideal for finding three specific items out of three million. I think further discussion of "facts" belongs on a different page. Meanwhile, "summaries, joins, calculated columns, views, restrictions, etc." are all useful in storing and retrieving data that changes infrequently in comparison to the timeframe of a specific process. I think this is another way of stating Eric's observation that they shouldn't be used for transient information. The sorts of systems that Dave describes are all very interesting, and I think have very little to do with relational databases. Yes, it's true that I can use an automobile to run a generator to run a pump to empty a pond. That doesn't change the fact that the primary service provided by an automobile is transportation. A relational database is for PERSISTENCE. -- TomStambaugh
I agree that a discussion of "facts" probably belongs on a different page - like DatabaseIsRepresenterOfFacts - which elegantly (though perhaps incompletely) supports the "fact based" view of relational databases. I'd be happy to move this thread there, if the other parties in this discussion agree to it.
However, I do not agree that the systems I describe have little to do with relational databases. Actually, they are precisely what relational databases do well, and precisely where they are often used. For example, I worked extensively on enterprise bookkeeping/accounting, payroll, inventory, and billing systems (i.e., what are now typically unified into ERP systems - see http://en.wikipedia.org/wiki/Enterprise_resource_planning), for which relational databases are ideal. In such systems, we often need to record the fact that a change occurred, not just the result of the change. For example, rather than represent an account in silico as a virtual real-world account - which might only maintain a current balance - we typically represent an account as a collection of business facts about sales, credit/debit memos, charges, etc., that result in a derived, rather than stored, account balance. In short, we are more concerned with recording information about business activities than we are with simulating the behaviour of business entities. Relational databases are good at the former. OO environments are good at the latter.
- I am skeptical of the "simulating behavior" claim (surprise). Do you have an example? Perhaps if you simulate it as little people doing the process manually, then OOP would be the way to go, but if you factor/design the schema right, often there are relational shortcuts that are more declarative in nature. Sometimes it seems OO is more concerned about copying physical activities when the best solution may be to transcend the shackles of the physical world. OOP may indeed better model the real (physical) world, but the real world sucks. -- top
- An example of the "simulating behaviour" claim? Sure. See my Account example on BusinessTransaction. -- DV
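The "account as a collection of business facts" idea above can be sketched with a ledger table and a view. This is an assumed schema for illustration (it is not the Account example on BusinessTransaction), run on in-memory SQLite, with amounts stored as integer cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row is a recorded business fact; nothing is ever updated in place.
conn.execute("CREATE TABLE ledger (account TEXT, kind TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        ("A-1", "sale", 500),
        ("A-1", "credit_memo", -120),
        ("A-1", "charge", 30),
        ("B-7", "sale", 75),
    ],
)

# The balance is derived from the facts rather than stored.
conn.execute(
    "CREATE VIEW balance AS "
    "SELECT account, SUM(amount) AS balance FROM ledger GROUP BY account"
)
balances = dict(conn.execute("SELECT account, balance FROM balance"))
```

An OO simulation of the account would more typically hold a current balance field and mutate it as each transaction is applied.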
Furthermore, I agree that persistence is a foundation of databases (though, interestingly, it is categorically not the foundation of the relational model), but having established that foundation, we treat it as given. By applying the relational model to facts (persistent yes, but that's a convenience and a benefit, not a necessity), we derive all manner of interesting goodness. Date's seminal AnIntroductionToDatabaseSystems text briefly discusses persistence on page 11 (of the 8th edition), and indeed uses it in the definition of database - "A database is a collection of persistent data that is used by the application systems of some given enterprise." However, once stated, persistence is only mentioned on two other pages - both related to the notion of "persistence orthogonal to type" in a section on object oriented databases. So what are the other 980 pages about? Primarily, they are about the use of databases (relational in particular) to represent, process, and derive facts! So, using a relational database as a facts processor is hardly a marginal application of relational databases, it is the purpose of relational databases, with emphasis on "relational" over and above "databases." It is the relational model, independent of persistence, that gives a relational database its power. Otherwise, why bother with the relational model at all? We could simply use an object persistence mechanism.
Another way to look at this debate is to ignore the issue of persistence entirely, and simply view the relational model as a set of tools for manipulating data. If you're not familiar with RelationalAlgebra, try developing a set of classes to implement it. Then experiment with it, as you would any other class library. I did this, back when I too viewed a relational database as little more than a persistence engine, and it was an eye-opener. (My experiments eventually evolved into the RelProject.) Relational algebra and relations are powerful tools for manipulating data, period. I wish every OO language incorporated the relational model, persistence aside, because it simply allows you to manipulate data in an expressive, concise, composable, provable, optimisable way that is much more awkward without it.
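For anyone who wants to try the exercise suggested above, a starting point might look like the following Python sketch. It is not taken from the RelProject; the Relation class is hypothetical, has no persistence at all, stores rows as a set (so duplicates vanish), and requires attribute values to be hashable.

```python
class Relation:
    """A tiny in-memory relation: a set of rows, each a set of (attr, value)."""

    def __init__(self, rows):
        self.rows = frozenset(frozenset(r.items()) for r in rows)

    def _dicts(self):
        return [dict(r) for r in self.rows]

    def restrict(self, predicate):
        """Keep only the rows for which predicate(row) is true."""
        return Relation(r for r in self._dicts() if predicate(r))

    def project(self, *attrs):
        """Keep only the named attributes; duplicate rows collapse."""
        return Relation({a: r[a] for a in attrs} for r in self._dicts())

    def join(self, other):
        """Natural join by nested loops on all shared attribute names."""
        out = []
        for a in self._dicts():
            for b in other._dicts():
                if all(a[k] == b[k] for k in a.keys() & b.keys()):
                    out.append({**a, **b})
        return Relation(out)

    def __eq__(self, other):
        return isinstance(other, Relation) and self.rows == other.rows

    def __hash__(self):
        return hash(self.rows)

    def __len__(self):
        return len(self.rows)


emp = Relation([{"emp": "Ann", "dept": 1}, {"emp": "Bob", "dept": 2}])
dept = Relation([{"dept": 1, "dname": "Ops"}, {"dept": 2, "dname": "R&D"}])

# Operators return relations, so they compose freely.
ops_staff = emp.join(dept).restrict(lambda r: r["dname"] == "Ops").project("emp")
```

Because every operator takes relations and returns a relation, expressions can be nested and rearranged, which is the composability referred to above.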
OO proponents sometimes seem to have philosophical objections to the relational model and relational databases, and an equal number of relational proponents have objections to OO programming. Both views are limited. OO and the relational model are both powerful and complementary, and I would no more reject one above the other than I would reject the screwdrivers in my toolbox in favour of the wrenches, or vice versa. I'm also lazy - if there's something a relational database will give me for free, I'll take it, and save the effort of reinventing it in the application environment.
-- DaveVoorhis
I'm familiar with the relational model. If I have a gazillion objects, and I want three of them, and my criteria is based on the contents of those objects, then the relational model is the best choice I know of. Attempting to model it in OO is a rathole that at least one OO database vendor (Ontologic/Ontos) fell down -- the problem is that in the OO paradigm, you have to load the object from storage in order to do *anything* on it. Whoops. I still think we're in violent agreement here. Specifically, there is no "impedance mismatch" to be caused by anything. Instead, there are (at least) two perspectives on the data. -- TomStambaugh
Truth.
If I might take a stab at a summary, perhaps toward some ultimate refactoring of this page: OO systems are good at managing transient state; relational systems are good at managing persistence, especially persistent facts and deriving new facts from these. There is no impedance mismatch, except OO environments that support dynamic class and/or object definitions may provide a convenient syntax for handling data retrieved from relational systems external to the OO environment, whereas this may be awkward in OO environments that only support static class definitions. -- DV
I'd like to build on Dave's summary. As sketched above, the key factor in creating an OO environment that can "support dynamic class and/or object definitions" is being object-pure -- all objects, all the way down. I know of no OO environments that are pure and that also "only support static class definitions" -- the static class limitation is a direct consequence of the compromises made by those environments that attempted to maintain non-object entities. My suggestion for those who struggle with resolving this apparent impedance mismatch is to get first-hand experience with a pure OO environment. SqueakSmalltalk is an excellent and free starting point. The next step is to work within a pure-object environment built inside any of the more widely available technologies -- I have first-hand experience with doing this in LispLanguage, JavaScript, PerlLanguage, and PythonLanguage. Ruby proponents claim it is possible in Ruby. If strong typing is desired, investigate EiffelLanguage. The two alleged OO environments that are most commonly encountered -- CeePlusPlus and JavaLanguage -- are the two environments in which it is most difficult to even simulate a pure-object environment, CeePlusPlus being a particularly challenging case. I suggest that most of the claimed mismatch is an illusory fog created by attempting to perform OO development in one of the latter two environments with no prior experience to draw on. -- TomStambaugh
Relational and "Transient"
Re: "OO systems are good at managing transient state; relational systems are good at managing persistence"
When I used to use nimble table environments, I used tables for "transient" stuff also. For example, one might take a result set from an ODBC query (or the equiv. in those days), which could be a local or "transient" table, and further massage it for something local or task-specific. The problem is that most development environments don't support nimble local or virtual tables because they fell out of style. Your statement is only true from a current vendor/implementation standpoint, but not an absolute comment on the paradigms/techniques being compared. It is a BigIron-only DBMS viewpoint. Nimble tables entirely changed the way I viewed programming; and when I got hooked, I had to be dragged away from it kicking and screaming (sometimes called "trolling" here :-). -- top
In an OO environment, you can implement Relation and Tuple classes with associated relational algebra operator methods. A Relation is a container of TupleS, and a Tuple is a container of objects. If you like, you can go one step further and implement relational Expression and Operator classes, where an Expression instance is a container of Operator invocation instances that represent an expression in the relational algebra. This allows you to implement dynamic, automated optimisation of relational expressions, though it's really only of benefit if you're dealing with large volumes of tuples. A lightweight implementation of the relational model (call it "nimble tables," if you like) in an OO context leverages the power of both paradigms. Where it's warranted, I've found it to be of considerable value, and it makes a nice addition to standard or built-in collection/container libraries. If all you need is a transient container, however, standard container classes do the job just fine. But, I digress. My real point in all this: OO systems make it easy to create lightweight implementations of the relational model. Therefore, my claim that "OO systems are good at managing transient state" still holds, even if the transient state is maintained in a lightweight implementation of the relational model! I might add, however, that (in agreement with the above) in this case the relational model is good at managing transient state, too. -- DaveVoorhis
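As a rough illustration of the Relation-as-container idea just described, here is a minimal Python sketch. The class name, method names, and sample data are all made up for this example; it is not the RelProject API, and it ignores types, keys, and optimisation entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation: a heading (attribute names) plus a set of tuples."""
    heading: tuple      # attribute names, in a fixed order
    body: frozenset     # each element is a tuple of values matching heading

    @classmethod
    def from_rows(cls, heading, rows):
        return cls(tuple(heading), frozenset(tuple(r) for r in rows))

    def _as_dicts(self):
        return [dict(zip(self.heading, t)) for t in self.body]

    def restrict(self, predicate):
        """Keep the tuples for which predicate(row_as_dict) is true."""
        kept = [t for t in self.body if predicate(dict(zip(self.heading, t)))]
        return Relation(self.heading, frozenset(kept))

    def project(self, *names):
        """Keep only the named attributes; duplicates collapse (set semantics)."""
        idx = [self.heading.index(n) for n in names]
        return Relation(tuple(names),
                        frozenset(tuple(t[i] for i in idx) for t in self.body))

    def join(self, other):
        """Natural join on the attributes the two headings share."""
        common = [n for n in self.heading if n in other.heading]
        extra = tuple(n for n in other.heading if n not in self.heading)
        heading = self.heading + extra
        rows = set()
        for a in self._as_dicts():
            for b in other._as_dicts():
                if all(a[n] == b[n] for n in common):
                    merged = {**a, **b}
                    rows.add(tuple(merged[n] for n in heading))
        return Relation(heading, frozenset(rows))


# Hypothetical sample data, purely for illustration.
emp = Relation.from_rows(("name", "dept"), [("Ann", 1), ("Bob", 2), ("Cy", 1)])
dept = Relation.from_rows(("dept", "title"), [(1, "Ops"), (2, "Dev")])

# Operators compose because each one returns another Relation.
result = emp.join(dept).restrict(lambda r: r["dept"] == 1).project("name", "title")
```

Each operator returns a new Relation rather than mutating one in place, which is what lets the calls chain like an algebraic expression.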
- Whether OO is better for building a nimble DB engine or not is generally off-topic. One can build such in Fortran or BrainFsck also if they want. However, hand-rolling a DB engine and expression parser from scratch is kind of reinventing the wheel. Anyhow, my main point is that DB's can be useful for "transient" stuff also. OO tends to be overkill for local, temporary stuff in my opinion. -- top
- You're right, it's off-topic. You could build it in anything (my current direction is a persistent LISP-like language I'm developing, which is sweeeeet), but use of inheritance and polymorphism to construct a variety of Relation types - some in-memory, some disk-based, some referencing a relational DB, some referencing other things - is elegant and simple. It is precisely the opposite of overkill. Furthermore, no expression parser is needed if the expressions are constructed from Operator instances that reference each other, i.e., manually build the syntax tree. E.g.: Expression e = new Expression(new Restrict(new Project(new Join(relvarA, relvarB), new ColumnSpec().add("Column1").add("Column2")), new Compare("Column1", GREATER_THAN, 3))) Of course, this is more awkward than Expression e = new Expression("(relvarA JOIN relvarB) {Column1, Column2} WHERE Column1 > 3"), which is why I implemented TutorialDee. Expression parsers are not difficult, and it's only reinventing the wheel if there's an existing wheel that does what you need. In my example below, there wasn't. -- DV
- ExpressionApiComplaints could perhaps apply here. Ideally the local language would be SQL so that processing or tables can be moved to BigIron DB's if needed. (Not that I think SQL is the best possible relational language, but it is the current standard.) However, I agree that SQL may be bulky for some uses. (Related: MinimalTable). The SqLite C-based engine is not very big, I would note. As far as "types of persistence", I don't think a tree taxonomy is appropriate. I don't want my code to worry much about whether it goes to disk or RAM during processing. dBASE dialects would use RAM if available, for example. Perhaps have some efficiency "hint" settings, but one does not need types for that. Composition or its equiv would be more appropriate IMO. Example:
resultSet = query(theSql, diskable=true, compressable=false);
Or
rs = new QueryResultSet();
rs.diskable = true;
rs.compressable = true;
rs.sql = theSql;
rs.execute()
- ExpressionApiComplaints could well apply, except in applications I've built that required expressions built from objects, these are almost invariably constructed at run-time as a result of user actions, not built by the developer. As for your example, I have -- just for fun - compared a procedural vs OO approach on TopsQueryResultSet. -- DaveVoorhis
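For readers who want to see what "expressions constructed from Operator instances that reference each other" could look like in practice, here is a small Python sketch. The node classes (Rel, Join, Project, Restrict) and the sample rows are hypothetical and only loosely mirror the Java-like one-liner quoted earlier in this thread; no parser is involved, and no optimisation is attempted.

```python
import operator


class Rel:
    """Leaf node: wraps in-memory rows (a list of dicts)."""
    def __init__(self, rows):
        self.rows = rows

    def evaluate(self):
        return list(self.rows)


class Join:
    """Natural join of two sub-expressions on their shared column names."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self):
        out = []
        for a in self.left.evaluate():
            for b in self.right.evaluate():
                if all(a[k] == b[k] for k in a.keys() & b.keys()):
                    out.append({**a, **b})
        return out


class Project:
    """Keep only the listed columns, dropping duplicate rows."""
    def __init__(self, source, columns):
        self.source, self.columns = source, list(columns)

    def evaluate(self):
        seen, out = set(), []
        for row in self.source.evaluate():
            key = tuple(row[c] for c in self.columns)
            if key not in seen:
                seen.add(key)
                out.append(dict(zip(self.columns, key)))
        return out


class Restrict:
    """Keep rows where op(row[column], value) is true."""
    def __init__(self, source, column, op, value):
        self.source, self.column, self.op, self.value = source, column, op, value

    def evaluate(self):
        return [r for r in self.source.evaluate()
                if self.op(r[self.column], self.value)]


# Hypothetical relvars; the "Id" column exists only to give the join a key.
relvarA = Rel([{"Id": 1, "Column1": 2}, {"Id": 2, "Column1": 5}])
relvarB = Rel([{"Id": 1, "Column2": "x"}, {"Id": 2, "Column2": "y"}])

# The syntax tree is built by hand; nothing runs until evaluate() is called.
e = Restrict(Project(Join(relvarA, relvarB), ["Column1", "Column2"]),
             "Column1", operator.gt, 3)
rows = e.evaluate()
```

Because the expression is an ordinary object graph, code could in principle inspect or rearrange it before calling evaluate(), which is where an optimiser would hook in; this sketch does not attempt that.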
Re: When I used to use nimble table environments, I used tables for "transient" stuff also.
- I'm sorry, but talking about a "nimble table" in any of the common RDBs - Sybase or Oracle, for example - is like talking about the "sports model" of a dump truck. Especially when elaborated as a "result set of an ODBC query." These mechanisms didn't just "fall out of style", they make NO sense in any sort of reasonable OO environment. If someone wants to make a claim based on performance, reliability, scalability, or whatever, then please - make the claim. Put up an example of the specific problem in question, put up the "nimble table" approach, and then give the rest of us an opportunity to offer OO solutions to the same problem. Please. -- TomStambaugh
- A table in a DBMS wouldn't be a "nimble table," by definition. We're talking mainly about in-memory implementations of relations and the relational algebra, or some subset thereof. -- DV
- It does not have to be in-memory. It could be disk-and-ram buffered as-needed. -- top
- You said you had used these "nimble tables" (I've never heard of the concept). Please identify a specific example of what you mean, so that we can understand how to either use it or do something comparable. I'm particularly interested in the performance of something on the other side of ODBC. -- TomStambaugh
- In the past top has explained that these "nimble databases" were a feature of FoxPro. That was his preferred development tool and much of his discontent seems to spring from the fact that he isn't allowed to use it. -- EricHodges
- Do you agree with this, top? -- TomStambaugh
- I miss FoxPro/ExBase for local tabling, but I don't expect it to come back in style because it is too different from SQL. Perhaps SqLite is the nimble table engine of the next generation.
- For example, suppose you had to create a report with color coding to highlight various conditions, deltas (value change degree), etc. The rules for all these colors and delta calcs would make for ugly or slow server-side SQL. Thus, you pull it down to a local table, add a color column(s), and then process the local info to decide which colors to use using a combination of local SQL and IF statements. (It may be possible to do it all on the server side with enough code, and some people actually do so, but it is a butt-ugly MisuseOfSql in my opinion.)
- We're perhaps inching a tiny bit closer to an example. Surely it's safe to assume that this report is a report on some data? While you're building your "nimble tables", figuring out the incantation to correctly hook them up through ODBC, building the sql, and making it all work, I'm writing a MumbleReport class (probably in Smalltalk), with a simple method or methods that reference the data and apply the constraints. I'm eager to see the details that show us that the use of sql or anything like it offers any kind of performance or other advantage. -- TomStambaugh
- Respectfully disagree, if I'm understanding what you guys mean by "nimble tables". The use of local temporary tables offers the possibility of using queries involving a mix of local and shared data; this can greatly simplify some code, and can provide performance benefits in some cases. A typical example I've encountered (and which recently came up at work again) is when the end-user can select individual items from a potentially large set, and further processing requires correlating this set with shared data. The most robust way to do this, and the fastest when large quantities of data are involved, is to have the selected items in a temporary table, and join against related data. Having tools on hand to easily manage temporary local tables for this sort of thing is very desirable, IMO. -- DanMuller
- An example I worked on some years ago: A telephone company repair department wanted to generate ad-hoc, real-time reports against dynamic data, mainly consisting of current telephone line status obtained from the hardware switches, and related information obtained from a variety of sources. I represented the various kinds of data sources as classes inherited from Relation (see above), and hid the relational algebra behind a user-friendly, somewhat QueryByExample-like interface. In effect, this formed a simple relational database, where certain relvars were based on an immediate view of hardware status and other data. This allowed the users to define ad-hoc joins, restrictions, views, temporary tables, etc., and generate arbitrary, real-time reports based on these. -- DaveVoorhis
- I'd like to repeat my request for an example, specific enough that we can look at real technical alternatives. DaveVoorhis seems to have started in this direction, perhaps DanMuller and/or top can help ensure that the example does, in fact, illustrate the situation they're describing. -- TomStambaugh
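One way the colour-coded report scenario from this thread (pull the data down to a local table, add a colour column, then mix local SQL with IF statements) might look, sketched in Python with its standard sqlite3 module. SQLite is used only because it was floated above as a possible nimble-table engine; the table layout, thresholds, and sample rows are invented for illustration, and the rows are hard-coded where a real report would fetch them from the server.

```python
import sqlite3

# Stand-in for rows already fetched from the server (hypothetical data):
# (item, previous value, current value)
server_rows = [("widgets", 100, 130), ("gadgets", 80, 78), ("gizmos", 50, 20)]

# A transient, in-memory local table.
local = sqlite3.connect(":memory:")
local.execute(
    "CREATE TABLE report (item TEXT, prev REAL, curr REAL, delta REAL, color TEXT)"
)
local.executemany(
    "INSERT INTO report (item, prev, curr) VALUES (?, ?, ?)", server_rows
)

# Set-wise calculation is left to local SQL.
local.execute("UPDATE report SET delta = (curr - prev) / prev")

# Rule-heavy decisions are left to ordinary IF statements.
for item, delta in local.execute("SELECT item, delta FROM report").fetchall():
    if delta <= -0.25:
        color = "red"
    elif delta >= 0.25:
        color = "green"
    else:
        color = "none"
    local.execute("UPDATE report SET color = ? WHERE item = ?", (color, item))

report = local.execute(
    "SELECT item, color FROM report ORDER BY item"
).fetchall()
```

Whether this beats a report class holding the same rows in ordinary collections is exactly the question being argued here; the sketch only shows the shape of the local-table approach.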
See NotesOnaCeePlusPlusRdbmsApi for some information on a Tutorial D-inspired C++ library that is being used in a commercial product. If you look at the example code there, you'll see a TupleList object declared. In that library, which is built on top of the Jet DBMS, a TupleList is implemented as a temporary table in a local Jet database. That actually performed admirably compared to various alternatives. They're treated as read-only, and can't be used as expressions in further queries -- they're considered the end-product of a query, and they have a notion of order and no requirement that all tuples be unique. (Although in practice they almost always are.) Later, I added a LocalRelation class to represent temporary result tables that are relational, can be modified, and can be used in subsequent queries. The implementation was similar, a transient table in a local database. The UI developers used these to greatly simplify code in a number of their most complex views. -- DanMuller
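The "selected items in a temporary table, joined against shared data" pattern mentioned in this discussion can be sketched as follows. This uses Python's sqlite3 rather than Jet or the C++ library referred to above, and it is not that library's API; the table names and data are hypothetical, and a single in-memory database stands in for both the shared and the local side so the example stays self-contained.

```python
import sqlite3

# One in-memory database stands in for the shared data store.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "bolt", 25.0), (3, "acme", 40.0), (4, "core", 5.0)],
)

# Items the end-user picked in the UI (hypothetical selection).
selected_ids = [2, 3]

# Transient table: visible only to this connection, gone when it closes.
db.execute("CREATE TEMP TABLE selected (order_id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO selected VALUES (?)", [(i,) for i in selected_ids])

# The selection is now just another relation to join against.
rows = db.execute(
    """
    SELECT o.order_id, o.customer, o.total
    FROM orders AS o
    JOIN selected AS s ON s.order_id = o.order_id
    ORDER BY o.order_id
    """
).fetchall()
```

Because the selection lives in a table, it can be modified and reused in later queries, which is roughly the role described for LocalRelation.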
See also: ObjectRelationalImpedanceMismatch, TypeSafeJdbcWrapper
AugustZeroSix