How Other Query Languages Address Sql Flaws

This is a comparison of how different query languages, proposed or actual, deal with various SqlFlaws and shortcomings. Motivated by a discussion of goals in SqlFlawsDiscussion. It can also be a discussion of different approaches to solving them, regardless of whether an existing language actually has such a feature. It's an exploration of techniques more so than specific languages. Fortunately, the field of SQL alternatives is still new, and thus this is a good time to hash out the options.

The "problem" categories are listed below in no particular order other than creation date (nobody agrees on the importance ranking anyhow). Languages are listed in alphabetical order.


Non-Unique Results Sets


A better Null or replacement to Null (NullVersusNone)

                   TYPE Date UNION;
                   TYPE ValidDate IS {Date POSSREP {Year INTEGER, Month INTEGER, Day INTEGER}};
                   TYPE MissingDate IS {Date POSSREP {}};
.


Reference (named) sub-division of large query statements (including multiple references of a single given definition)


Flexibility of column selection


Hierarchical Queries


Auto-Joins or a better NaturalJoin


User or DBA Extendability of Language


Avoiding SQL's Complex Syntax


ImpedanceMismatch

ImpedanceMismatch is a long and involved issue. Perhaps the "solution" should be broken up into multiple titles to avoid confusion.


Approach to Facilitating SeparationOfDatabaseAndDomainMath

Notes:


User-defined Database Objects

Please define "database object". It's not clear how this section differs from "User or DBA Extendability of Language" (above). Different languages allocate what is handled by the base language versus handling features via "objects" and/or libraries differently.


Queries that are not affected by multiplicity changes (AccessPathIndependence)[3]


Facilities in the language that make it easier to achieve high levels of normalization in an automated way (NormalizationBySynthesis)[4]

I don't believe such is the domain of a query language, but rather a design or refactoring tool of some sort. -- top

So... SmeQl is only a DataManipulationLanguage? but not a DataDefinitionLanguage?? (what about the DataDictionary that TqlChainedJoin may use... isn't that something that could also be considered out of the domain of a query language (and part of the domain of a design or refactoring tool of some sort)

[I'm not clear what point you're trying to make. Automated normalisation can certainly be the role of some external tool, ala Toad or some other database-oriented IDE.]


Availability of Implementation


Leverage Tables Instead of Compete with Them

It is my opinion that things best left to tables should be left to tables rather than invent custom or special language syntax or commands. It not only reduces the complexity of the syntax or language, but also increases meta-ability. It also gives one the option of using a TableBrowser to enter "configuration" info.

Note that this doesn't exclude the possibility of a command language performing operations that are built around tables if the command language can create tables (including virtual or temporary tables) and then "act" on them. For example, if there is no cross-table key reference dictionary as shown in TqlChainedJoin, then one can use existing SmeQl to create one for a given context or script. If you make it only do-able within a language, then you are stuck using that language instead of other database tools and languages.

I always felt the CREATE TABLE command typicaly found in SQL-RDBMS suffered from this also. The schemas should be defined in tables, not command syntax. If one wants to create a table from a schema, they reference the dictionary table and/or or an existing table in the CREATE TABLE command(s) rather than describe the table using some goofy command syntax. Pseudo-code: "CREATE TABLE x USING SCHEMA TABLE y".

If one needs to do it in the command language, then the language should provide a mechanism to describe a table (real or virtual), such as table "y" above, and then reference that table in the command. This reduces the number of command language idioms and commands because table definition only needs to be described once. (Another technique I learned from ExBase.)

A distinction should perhaps be made between using an existing table's structure as the schema reference and a DataDictionary. A CREATE TABLE command could perhaps use both techniques/sources, but if there is a "get schema" command that creates a DataDictionary from an existing table, then a dedicated command may not be necessary. Whether it's a convenient shortcut or not is another question.

 // pseudocode for creating a table from existing schema
 save schema from table A to table B   // B is a data-dictionary
 create table C from schema table B

// potential shortcut pseudocode create table C from schema of table A

However, for what I propose, you wouldn't have dedicated syntax for the above, because one changes (system) tables to add/change schemas, not use dedicated schema commands. If other processes need to know about the changes, then internal triggers can perhaps be used. In short, Data Definition Languages (DDL) are a violation of conceptual OnceAndOnlyOnce. Designers of existing languages didn't think "meta" enough about this and similar issues, and it's part of the reason for SQL's and similar language's syntactic and linguistic bloat. (Although, VendorLockIn may also have been a motivation. It's harder to clone complex languages.)

--top

There are research DBMSs that do this. In some, updates to the catalog via the DML are equivalent to the DDL and can be used as an alternative to it. In others, there is no DDL; the only way to create new constructs is via insertions into the catalog. The former is a handy facility; the latter is awkward. For example, defining a table via CREATE TABLE syntax is simple and intuitive. Defining a table via a series of inserts to the catalog's "Tables" table is not intuitive, and in most cases represents an unnecessary level of indirection.

If a TableBrowser is handy, I'd rather use it than CREATE TABLE-like syntax. It can even provide pull-down lists and/or type-ahead for things like type names. And what do you mean by "level of indirection"?

If a SyntaxAwareEditor is handy, it too can provide pull-down lists and/or type-ahead for things like type names. VisualStudio and the EclipseIde do that now with various languages.

By "level of indirection", I mean that a syntax that supports "CREATE TABLE" directly requires less indirection than a syntax that does not directly express "CREATE TABLE", but instead requires the user to translate his intent ("CREATE TABLE") into something that expresses it (a series of INSERT statements).

If a TableBrowser is a natural tool to a user/DBA, there is no "translation". Further, it is possible to simplify a "series of INSERT statements" to make similar tasks and related syntax more convenient. The advantage is that such shortcuts can be used on other tasks, not just scheme definitions. Inventing custom syntax for every specific DBA-like task is anti-parsimony. -t

Whether a TableBrowser is familiar or not, constructing a table by editing metadata requires a translation from the user's intent ("CREATE TABLE blah (x INTEGER);") to the user's action ("INSERT INTO sys.Tables VALUES ('blah'); INSERT INTO sys.Columns VALUES ('blah', 'x', 'INTEGER');"). Obviously, you can certainly simplify the series of INSERT statements to make similar tasks and related syntax more convenient. The way SQL does that is to provide the CREATE TABLE command. Inventing custom syntax for every specific DBA-like task is not just "anti-parsimony", it would be unreasonable. However, providing custom syntax for commonly-used commands is convenient, helpful, and reasonable.

My "intent" does not look like Oracle or DB2 DDL SQL; I don't know where you are getting that intent thing. (If anything, my intent looks tabular; after all, I have a TOP mind.) As far as text-centric approaches, textually serializing and importing and exporting tabular data is one of those "commonly-used" features you talk about. It's not isolated to just table definitions. Perhaps we should focus on that as a general problem and then only abandon general solutions for use in table definitions IF we cannot get it to work well as a general technique. I am open to better ways to do such (CSV, XML, UniversalStatement, hybrids, etc.). If we can universalize it, then switching back and forth between textual and (generic) table-browsers could be made easier. But let's start another topic if we do; perhaps ImprovingTextualTabularDataFormats? or something. I find it highly silly that RDBMS have invented huge entity/feature-specific syntax trees to handle what is (or could be) essentially data changes to schema tables. --TopMind

My SQL (which is standard, and not specific to Oracle or DB2) was merely intended to illustrate the intent of "create a table", using familiar terms. How does "INSERT INTO sys.Tables VALUES ('blah'); INSERT INTO sys.Columns VALUES ('blah', 'x', 'INTEGER');" clearly demonstrate the intent to create a table? Does it even create a table?

However, I can appreciate your desire for elegant simplicity using a few elemental operators and mechanisms which can be composed in an infinite number of ways to define every possible operation. That is the essence -- and some would say the beauty -- of languages like Lisp and Forth. Every language strikes some balance between potentially obscure and laborious elemental simplicity vs overly-verbose pre-specified keywords and constructs. The choice of balance point is at the discretion of the language designer. Apparently, SQL didn't strike that balance where you would have liked.

As mentioned in MaspBrainstorming, Lisp perhaps has the right idea, but wrong base data structure. A map or perhaps "ordered map" would make a better base data structure.


The issue of the scope or role of the query language seems to be common question or issue. Different participants seem to have widely different views about this. Potential roles to consider include:


Footnotes

[1] My preference based on my experience is to either not have nulls for strings, or treat them the same as empty strings. -- top

[2] See SeparationOfDatabaseAndDomainMath. For the purpose of illustration, SmeQl generally borrows from SQL's scalar operators.

[3] This title appears to presume multiplicity changes are a problem. In a quarter century of developing business applications, I can count the number of times on one hand... However, there may well be domains where this is an issue. See SurvivorBias

[4] Presuming, of course, that this is desirable, or that it should be part of the base "query language".

[5] I am open to a "warning mechanism" that won't allow "bag" results without including a special keyword or "override" flag. I wish Cartesian joins in SQL had a similar protection in place. --top


See also: QueryLanguageComparison


CategoryQueryLanguage


AprilTen and again JanuaryTwelve


EditText of this page (last edited July 29, 2014) or FindPage with title or text search