Relational Host Language Impedance Mismatch

Let's consider the ImpedanceMismatch that occurs between RelationalDatabases and the Host Languages that use them:

The "relational database" model of the world does not match that of most host languages, so one often encounters difficulties at the interface between them. Yes, there are workarounds for each of these issues, but the fact that a workaround is needed validates that there is an issue.

No, I wouldn't qualified what we discuss here as workarounds. If there's an issue is with the host language not providing you enough already made data types and operators. Naturally when these things are missing from the host language, you have to program them, or buy a third party library.

What we have here, is that in most of the cases, Relational Databases have a much larger collection of domains, and offer much more flexibility than what usual programming languages initially offer you.

I reject this notion based on the fact that I can provide a isomorphic mapping between any relational model and any TuringComplete programming language (well, maybe I can't, but someone could). I contend the issue is not one of power, it is one of convenience -- MarkAddleman

You're absolutely right, but please note that I spoke about what programming languages initially have to offer, i.e. built-in types and/or standard class libraries. Also note that most often convenience is a very dear thing for those who sustain the ObjectRelationalImpedanceMismatch. My claim is that you don't have so much inconvenience on the database side

(see RelationalDatabaseVsProgrammingLanguageFlexibility)

More, I'll start at some point of this discussion draw the line between the relational model per se, and relational database implementations as of today, a fully relational database will allow you to extend the schema through the use of UDT and operators, however even with imperfect database implementations of today, I think we already have enough to play with, without worrying about impedance mismatch.

So we can safely say that relational database model of the world does include the host language model of the world.

Hence we cannot speak of a impedance mismatch, here, because at a limit one can refrain to use in a relational database schema types other than what can be directly mapped to and from the host language. Of course, this is only for the sake of the argument.

The correct solution is to extend the host language capabilities through the use of user defined types and functions (or classes for OO). This effort needs done only once per database, and you can't blame the database for this, you have to blame your language.

Most of the databases offer you ready made libraries or class libraries to deal with this issues on the client side. -- CostinCozianu

Relational database set operations -vs- host language procedural looping:

Databases process non-procedural declarative statements of set operations, like SELECT statements. Few host languages can operate directly on sets, requiring significant infrastructure, like cursors and "recordsets" to resolve the differences.

If there were no ImpedanceMismatch, then it would be as easy to join tables and sort results in the host language as it is in SQL.

Well, this is an unfortunate limitation for the host language. you cannot assert a impedance mismatch here. On the other hand, you can easily build library to operate on small sets and small relations, but a host language library will never be able to operate on truly large and shared sets and relations of data, unless you reinvent the wheel, this is the job of a relational database, just let it do what it is intended to do. -- CostinCozianu

A "Recordset" is a Microsoft object-oriented wrapper for a cursor: It's a mix of Container, Iterator, and single-row-object. (IE: It's an object with confused identity.) It supports operations like Count() = number of records in the container, Next() = move to the next record, and Field("ColumnName") = value of the given column in the current row.

The NULL Value:

Most host languages can't process NULL values as relational databases, requiring some rather odd workarounds:

Indicator Variables:

A long-standing approach to handing a nullable "char(16)" column to a host program is to have the host program define a "char(16)" data variable and a "short int" indicator variable: If the value is NULL, the indicator variable is set to -1, and the data variable has an undefined value. For normal values, the data variable has the correct value (and the indicator variable is typically set to the string length). Pity the poor programmer who forgets to check the indicator variable before using the data variable; it may contain undefined data. Also, good luck trying to process data-indicator pairs with 3rd party subroutine libraries or any code that doesn't understand that "this pair of independent variables, used together based on certain silly rules, defines the real value we wish to process."

A simple approach that works for me is to use <null>, and test for it. Please also note that if you don't do the same in SQL (i.e. use IS NULL/ IS NOT NULL operators, then you're in for big troubles).

Using Variant:

Microsoft's approach is to use "VARIANT" variables - a tagged union of lots of types, which includes a "NULL" type, and most other commonly types. So, the variable containing the nullable "char(16)" value may have the "NULL type" or the "String" type, assuming nothing unexpected happens. Unexpected things can happen due to the loss of static type checking implied by the use of Variant variables. Also, the runtime performance overhead of Variant types can be significant.

Many common pointer-based languages like C, Java and C++ can represent in a rather natural way NULL data with null pointers to non-existent objects. That some DBMSs and interface libraries confuse NULL with empty strings is another matter.

Data Type Mismatches:

[...moved to DataTypeMismatchsBetweenRelationalDbAndHostLanguage.]

I think this is knocking down a straw-horse (addressing a different question). ImpedanceMismatch refers to the fact that a mapping is necessary at all. It does not say a mapping is impossible. That makes no sense in the context of Turing machines. It does however refer to architectural attributes like performance and maintainability. The more costly the transformation, the greater the analogous ImpedanceMismatch. Its fuzzy, but its also useful. --RichardHenderson

Exactly. The need for mapping is the mismatch. -- JeffGrigg

Isn't this just a language interoperability issue? If you want to work directly on database data without mapping, you can use PL/SQL with Oracle. Oracle's "i" series databases also give tight integration with Java with automic conversions as necessary (I have not used this enough to be aware of its scope and limitations though). -- ChrisBrooking?

If you really want to reduce the discussion to 'why don't I program against a relational database like programming against no database at all', then I'll address that too, if needed. I hope you don't really mean that. -- CostinCozianu

Ah, but programming against an OO database is very much like "programming against no database at all", except that my application's data gets preserved across different runs - in the OO database. OO databases are designed to have minimal ImpedanceMismatch with their OO host languages. Relational databases aren't, and so the ImpedanceMismatch can be rather high. -- JeffGrigg

Because data is about a space larger than a single application. -t