New Query Languages On Existing Engines

Moved from HowOtherQueryLanguagesAddressSqlFlaws because it was getting TooBigToEdit.


What features do various new/proposed query languages have to facilitate usage with existing RDBMS engines? I suspect TutorialDee will have difficulty here because of its strong "anti-bag" stance, a BagSetImpedanceMismatch. (see "Non-Unique Results Sets" at HowOtherQueryLanguagesAddressSqlFlaws).

I presume by "anti-bag" you're referring to a fundamental axiom of the RelationalModel, that no relation can contain duplicate rows aka tuples. As the developer of an implementation of TutorialDee (RelProject), I can state with some authority that enforcing adherence to the RelationalModel presents no difficulty, whilst giving all the benefits of strict adherence to the relational model. At worst, some edge cases at the boundaries of the system require keywords to handle circumstances appropriately. For example, you can explicitly identify what strategy to use to link to a SQL table that lacks a defined primary key. See BagAtational and related discussions.

To avoid getting into the Great Bag Debate yet again here, let me present a scenario: one wants the target query language to be usable with existing RDBMS installations with existing data (tables), and probably existing SQL queries, since there is probably lots of code that would have to be rewritten for installations around for more than a few years if the SQL is to be immediately replaced. A gradual transition, or even co-existence seems far more practical. Some of the existing tables or views may not have formal primary keys, or the key may not be exposed (knowable) given the security settings of query users. It's probably not a show-stopper, but it may be an extra cost to consider when trying to bring bag-aversive query languages into bag-permitting systems and conventions.

As noted above, you can explicitly identify what strategy to use to link to a SQL table that lacks a defined primary key. The same applies to linking to SQL views, or tables where the primary key is obscured in some fashion. Specifying a keyword or two in such cases is a negligible "extra cost", but that's a small price to pay to gain more predictable query results, reduction in errors due to inadvertent duplication of data, and improved performance from leveraging relation-based automated optimisation strategies. Most DBAs already recognise the value of these in existing SQL databases, so they make sure there is an explicit, visible primary key on every table. Then the "extra cost" drops to zero and the gains from adherence to the RelationalModel become pure profit.

If you make a fake key and pretend it's real, you'll still have most of the same problems. Essentially you are having TutorialDee manufacture a lie to satisfy its purity itch. A fake key may be even more dangerous than no key because it may give the misleading illusion of being keyed. A query user may mistake a temporary key for a permanent one, for example, and start using it for cross-references. Further, artificial key generation may be a noticeable performance or storage cost for larger data-sets. -t

What's a "fake key", and why would I pretend it's real? You appear to be constructing a StrawMan. I assume you're familiar with MicrosoftAccess? In Access, when linking to tables in external DBMSs where no explicit primary key is available, you must manually identify the key in order to permit certain operations in Access. A similar mechanism -- for exactly the same reasons -- is required by an upcoming release of the RelProject (which is an implementation of TutorialDee; the TutorialDee specification does not cover implementation details) in order to permit certain operations on or using linked tables. In cases where there is no primary key because duplicate rows are either present or likely, a unique ID must be generated. It is certainly not a "fake key". It is a real key. Obviously, it might not be propagated back to its source and mistakenly exposed and used as if it was, but this is true of any data that might be joined to a given data set as part of presenting reports. However, this is not a "purity" peculiarity forced by the RelationalModel. It is a practical consideration imposed by lack of an unambiguous row identifier, one that must be tackled by any heterogeneous DBMS -- whether SQL, relational, or otherwise -- that can access data sources with potentially duplicate rows.

I am not talking about merely "accessing", but being embedded, or at least tightly integrated with a particular RDBMS engine to serve as a direct alternative to SQL on that "box". Your argument is basically, "When in Rome, we'll force Phonetician customs on you using fiddling and diddling." (And I wouldn't use MS-Access's connectivity as the ideal reference model if I were you, anyhow.)

The most "embedded" or "tightly integrated" an external DBMS can be with any vendor's SQL DBMS is to be a heterogeneous DBMS. I know of no vendor's DBMS, commercial or OpenSource, where you can trivially unplug the language parser and replace it with another. The closest you can reasonably get to any SQL DBMS is ODBC and/or JDBC, or whatever native equivalent is provided by the vendor, like Oracle's OCI. No idea what the "Phonetician customs" is about. Once again, you appear to be constructing straw men. By the way, MS-Access's connectivity was not cited as an "ideal reference model", so I don't know where you got that. I merely used it as an example with which I assumed you'd be familiar.

It may not be trivial to directly include it, but certainly possible. Some such DB engines reduce SQL queries down into an internal language which is kind of a "relational assembler language". (Some DBA optimizer and analyzer tools will expose some of this underworld.) If an alternative language becomes popular or a viable sales advantage, then most likely the new language would be included by making a new "compiler" that produces the internal sub-language. Being bag-aversive could mean there's more validation and work-arounds that have to be produced. A similar issue would arise if the compiler produced native SQL. Again, I'm not saying it's a show-stopper, but rather an additional cost, complexity, and training/education time. -t

The usual term for the query "underworld" or "relational assembler language" is a "query plan". I'm not aware of any DBMS (though there may be some) that allow programmers or users to generate query plans from scratch or alter generated plans except via optimisation hints. The hypothetical embedding of new query languages in existing vendors' DBMSs, therefore, requires the vendors to implement them. So, if an "alternative language [like the RelProject] becomes popular or a viable sales advantage", it doesn't matter how much effort it takes DBMS vendors to replace their existing SQL parsers, and certainly the differences between relations vs bags would be negligible compared to the overall effort. Being "bag-aversive" is good, because it gains optimisation strategies and eliminates erroneous data duplication. Relational purity is worth the negligible cost and/or complexity of implementation. As for additional "training/education time", is there any SQL or relational database education, anywhere, that advocates leaving out primary keys or ignores them entirely in order to expedite teaching? The opposite is generally true; students are typically told to always specify a primary key. The cases where this is violated are negligible. The cases where the impact of poor optimisation or duplicate data is significant? Legion.


CategoryQueryLanguage, CategoryCompatibility?


EditText of this page (last edited January 6, 2013) or FindPage with title or text search