Duplicate Tuple Output Discussion

Continuation of portions of ComplexityOfOutputtingDuplicateTuplesInTutorialDee:


Your point in the side-effects and XML examples is not clear: Many query languages do have side-effects, such as SQL's UPDATE statement.

[SQL is not a single language. SQL is made up of several smaller languages, covering various aspects from transactions and access control to data definition and data manipulation. Only the stuff you are allowed to place inside the 'SELECT' statement is truly part of SQL's 'query' language. While I do not practice SQL very often, I do not recall the ability to embed UPATES into the SELECT statements, even though vice versa is okay. If my recollection is correct, then SQL is properly a 'layered language' with respect to SideEffects. We are suggesting that RelationalModel query languages should, similarly, be layered with respect to presentation details (such as ordering, or XML format)... i.e. such that these features cannot complicate the 'query language' proper.]

For example, SQL could add syntax like: SELECT * TO FILE 'data.txt' FORMAT XML FROM foo. ExBase was sort of like this.

[SELECT statements can be embedded within one another. So one question is: what, exactly, would that mean (in terms of semantics, performance, concurrency issues) in the event it was embedded within another SELECT statement? Similarly, SELECT statements may be used to define UPATEs and VIEWs and such. What would SELECT TO FILE' mean in those cases? Additionally, SELECT statements are also performed over the wire, between unlinked FileSystems What would SELECT TO FILE mean for purpose of security and accessibility issues?]

Like a prior suggestion, only the "outer" final statement should perhaps be allowed to create output. I would note that it's not conceptually much different than "SELECT ... INTO myNewTable..." of some existing dialects of SQL. It would be interesting to test an inner "into". As far as security, file writing should probably be disabled by default for DB users. I see it more useful on a DesktopDatabase and DBA's security level.

In SmeQl, I'd probably have a SAVE(...) and FILE(...) or Export(...) operations to create new tables from queries and files, respectively (see TqlDataAlterationOperators). They'd only "work" as the outer or final query, though. The final/outer operation has more leeway. This is closer to the "layered" approach you talk about above.

Most non-proprietary info processing tools can import and export standard formats. That's how they talk to the outside world.

[Sure. So can RelProject. It simply isn't part of the 'query' language. Export doesn't need to happen in the middle of a RelationalModel query. Sometimes import is needed (i.e. lazy import, or to handle updates) but creating 'new' links shouldn't be part of the query itself, and so linking of external data resources is also expressed outside the query language.]


I'm not following what you claim is "logical". SELECT has feature set A,B,C,D,E,F,H while EXPORT has feature set A,B,C,D,E,F,G. "H" is the ability feed to another table, and "G" is the ability to have a "bag" result, and the rest of the letters being typical column preparation and calculation idioms. Based on these two differences, you create two different commands. As interface design, this is generally to be avoided. It's almost like having a Print() statement and a Print_Black_And_White() statement. It's AttributesInNameSmell. And your doubling-up "both" option is confusing to users. I'd generally agree to having an "allowBag" modifier, or at least complain far less about it.

In TutorialDee/RelProject, the RelationalAlgebra operators are fully exposed, such that there is no SELECT operator per se -- at least not in the SQL sense. For example, given RelVars a, b, p, q, and s, the following are all valid relational expressions or queries:

 p
 a MINUS b
 TCLOSE q
 SUMMARIZE s BY {x} ADD (SUM(balance) AS total, AVG(balance) AS mean, COUNT() AS n)
 p WHERE x = 3
 a INTERSECT b WHERE x = 3
 (a {x} UNION b {x}) MINUS c
 a UNION b WHERE x = 3 {x, y, z}
Note that only the last two expressions feature an explicit projection (specification of columns, duplicates eliminated in the result). Many times, we don't need to perform a projection. Sometimes, we may have reason to project columns in the interior of an expression, as an intermediate result. The second-last expression, above, is an example.

This is quite different from SQL, where a SELECT always requires a column specification, either explicitly or by wildcard, immediately following the 'SELECT' keyword.

In most cases, the relation that results from evaluating an expression/query will be used directly by a client-side application. In that case, no EXPORT is needed. We send a query to the DBMS and do whatever we like with the ResultSet that comes back. In other cases, we might wish to assign the relation to some variable for later use on the server-side. E.g.:

 p := a UNION b WHERE x = 3;
An EXPORT is only needed if we wish to export the relation (or ARRAY) to some external target. For that, there is an EXPORT statement. It only provides a means to (optionally) exclude columns; there is no need to specify their inclusion, because that can be done by an explicit relational projection as shown above. EXPORT also specifies the target (file, device, ODBC/JDBC table, etc.), which obviously has no meaning within the relational algebra.

For example:

 EXPORT r FORMAT CSV TO "myfile.csv";
In the above, 'r' can be an arbitrary relational expression. The following are all valid:
 EXPORT a UNION b FORMAT CSV TO "myfile.csv";
 EXPORT p WHERE x = 3 FORMAT CSV TO "myfile.csv";
 EXPORT a UNION b WHERE x = 3 FORMAT CSV TO "myfile.csv";
 EXPORT a UNION b WHERE x = 3 {x, y, z} FORMAT CSV TO "myfile.csv";
Note that only that last example has an explicit specification of columns, i.e., {x, y, z}. This is part of the relational expression, not the EXPORT statement.

We might wish to exclude a column from the output, in order to produce a bag with explicit duplicates. That is part of the EXPORT statement, because it has no meaning (or value) within relational expressions. We can do things like this:

 EXPORT a UNION b EXCLUDE q FORMAT CSV TO "myfile.csv";
 EXPORT a UNION b WHERE x = 3 {x, y, z} EXCLUDE z FORMAT CSV TO "myfile.csv";
Now let's look at features: So we have relational expressions with feature 'A' but not 'B' or 'C', because 'B' and 'C' are meaningless in the relational algebra.

Conversely, and for the same reason, EXPORT implements features 'B' and 'C' but not feature 'A'.''

Note that from a user interface point of view, it's all one "query", e.g.:

 EXPORT myRelvar {k, a, b} ORDER(ASC a) EXCLUDE k FORMAT CSV TO "myfile.csv";
The fact that this is a combination of two distinct operator categories -- one being relational expressions, the other being EXPORT -- is largely invisible to the user, who need only think in terms of queries. Conceptually, however, relational expressions and EXPORT are completely different beasts with no overlap. Their functionality is mutually exclusive.

Therefore, it's not "almost like having a Print() statement and a Print_Black_And_White() statement." It's more like having a Print() statement and expressions, and the Print() statement can present -- with special formatting (i.e., specify a target and optionally exclude columns) -- the result of evaluating an expression.

First, I object to excluding EXCLUDE from queries.

There's a distant relative available for projection (and other column selection) lists called ALL BUT. For example, given the following RelVar definition...

 VAR myvar REAL RELATION {x INTEGER, y CHAR, z RATIONAL} KEY {x};
...You can project on x, y by inclusion with:
 myvar {x, y}
...Or by exclusion with:
 myvar {ALL BUT z}
The above examples are equivalent. However, ALL BUT in a projection and EXCLUDE in an EXPORT are very different things. The EXCLUDE clause in EXPORT allows you to drop one or more columns and possibly emit duplicate records. Projection always produces a relation, which (by definition) means it will never emit duplicate tuples.

Second, most existing RDBMS have various kind of export commands, but they run on the server side, not the client side and thus are usually of little use to typical ad-hoc query users. That the query language has knowledge of or access to the local file system may be assuming too much. The query engine typical does not know what the end-user is going to do to the result set. In many client-side query tools one can browse the result set in a data grid and then later decide to save that info to disk, and this info is not sent back to the DB engine and it would be too late even if it did. Is Rel growing into a network OS also? Something doesn't make sense. -t

Rel is a desktop DBMS, a server DBMS, and a general-purpose programming language with integrated database capabilities. I'm working on facilities to seamlessly link server Rel DBMSes to your desktop Rel. That will allow you to run EXPORT locally to export data from your local database or remote databases.

I applaud your effort, but it may be safer to walk before you fly by releasing a basic RDBMS first and get feedback on that rather than try to build the whole kitten caboodle first. And just because an integrated client is available doesn't mean it will be readily available in a production environment. But we'll see.

{RelProject works, today, which is more than you can say for any project you've ever advocated. Why don't you start crawling and build your SmeQl.}

I'm working on a different open-source database tool project right now, kind of a web-based half-Toad/half-MS-Access. Originally it was to incorporate SmeQl, but I canceled that idea for various reasons, mostly because while writing the doc, I decided teaching the user both the tool and SmeQl was too much to absorb. -t


Yes, other tools can do the necessary formatting, but it's usually more set-up steps. If you find yourself having to install, use, and run two tools together, that suggests they should become one (or at least extract and refactor the related features).

[No, that means the tools should be part of the same toolset. Just because you need to use ProceduralProgramming and SQL queries together doesn't mean SQL queries should be capable of procedural behaviors. This has already been explained to you, multiple times, and you've seen plenty of examples... i.e. TutorialDee/RelProject can perform EXPORT or run WRITELN in a FOR loop; you do not need to install or run a separate OS process to use these export and imperative tools, and yet the RelationalAlgebra of TutorialDee cannot call EXPORT or WRITELN, and thus remains independent of (and uncorrupted by) those external tools. Maybe it would help if you start thinking of RelProject and SQL DBMSs as 'toolboxes' rather than as individual 'tools'.]

Yes, but you had to introduce or rely on imperative operations to do it. There are better and cleaner linguistic and/or layering approaches to the same thing.

You have not demonstrated clearly what the "compromise costs" [downsides of bags] are. You say I'm ignorant of such costs. Yes, you're right, I am. You are not explicit enough. Your justification is HandWaving ("bad things happen, just trust me."). Reduce your explanations to a finer granularity and maybe you'll hit it.

[These were explained in BagAtational, and have been explained elsewhere, and are well known in the literature. Your ignorance is not due to stupidity on your part, nor due to lack of detail available to you. Your ignorance exists because you are an ignoramus: you actually like to be ignorant. You are nobody's student. You don't even self-educate.]

And there's also the issue of being able to hook up to existing DB engines. If you follow their conventions, then hook-up is easier.

[Sure, it's easier. RelProject will need to include some extra facilities for hooking up to SQL DBMSs, such as auto-numbering tables that potentially possess duplicates. On the other hand, why have a new DB engine if you're simply going to follow the conventions of existing ones? If 'easy implementation' is a priority, you should probably stick with SqLite or MySql or Oracle... because the 'easiest' implementation solution is to use an existing implementation.]

The existing DB engines are not the main problem (although I'd like to see DynamicRelational). SQL as a language is the main problem. I'm not bothered by bag and Null support.


OctoberTen


EditText of this page (last edited November 11, 2010) or FindPage with title or text search