Rel Export Discussion

Moved from BagNeedScenariosReWork:

The Rel EXPORT is sometimes presented as a solution to removing primary keys from "output" of queries. However, as described in ComplexityOfOutputtingDuplicateTuplesInTutorialDee, similar file- and text-writing commands in vendor languages such as Pl/Sql and TransactSql are not typically available to app programmers and non-DBA query tool users. The RDBMS may not even know what the client will do with a result set, and thus cannot give a different treatment (such as stricter uniqueness rules) depending on what is to be done with the result set.

Further, query tools often cache a result set locally, and then the user can decide to save that cached result as text long after the query is complete. Thus, the query itself cannot know what is to be done with the result grid/table. In practice, I see very little reason to make a "hard-bordered" distinction between emitting tables and emitting "output". It's better reuse of concepts to share such features. -t

Rel EXPORT was an example of removing primary keys from "output" of queries in order to allow duplicates, as was the "FOR" example before it. However (as has been pointed out before) in database systems as they are typically used, there is a client language which can even more trivially handle or produce duplicates simply by ignoring primary keys. There is no "hard-bordered" distinction between emitting tables and emitting "output". There is a hard-bordered distinction between calculation and presentation, just as there is in almost every other programming language. What you seem desperately unwilling to understand is that relational projection is not, not, not emitting tables or emitting "output". It is calculation; an operation that generates a new relation from an existing relation given an attribute list. It has no more to do with "output" in a relational language than "+" has to do with output in a conventional programming language. In Rel, for example, projection does not produce user-readable output. Nor does UNION, INTERSECT, WHERE, SUMMARIZE, or any other relational operator. OUTPUT, WRITELN and WRITE produce output given a tuple, scalar type, or relation as input.

This issue was discussed at DuplicateTupleOutputDiscussion, PageAnchor "formatting_01", but never resolved. My view is that the concepts overlap in practice and it's best to take advantage of that to avoid reinventing the wheel and creating unnecessary extra steps or layers. I care more about tool utility than definitions. If my scissors can also serve as a basic knife with a few minor tweaks, then that reduces the quantity of times I have to reach for a (dedicated) knife, making me more productive and the company more profitable. If overlapping "breaks" the definition of "scissors", I don't give a flying flip. My job is to be productive, not make dictionaries. -t

It was quite clearly resolved -- in the RelationalModel, projection is a relational operator; it is not a presentation mechanism. Indeed, the RelationalModel does not define presentation mechanisms, so they are implicitly outside of any relational operator. Cf elementary algebra (i.e., "school maths"), which similarly does not define presentation in +, -, /, etc. If you wish to create operators that have side-effects like output, that's fine, but it's not the RelationalModel. Like SQL, it will be something else. Rel is an implementation of the RelationalModel, so it adheres strictly to the theoretical separations.

Transformation is transformation. Call it a relational operator, call it "formatting", call it pizza making, whatever. It's still transformation. "Output" is relative and the query language often cannot know what the user/requester/app will do with it.

Projection is transformation from a relation to a relation. EXPORT (and friends) is transformation from a relation to a bag and/or other structure. The relational algebra is closed under relations, so projection returns a relation, not a bag.

[I venture to guess that Top is missing this important distinction. It's easy to do, if you're not thinking beyond what you're accustomed to from SQL. In SQL, "SELECT" (and, less frequently, UNION) is used both to produce an intermediate query result, and to produce output. In a language like Rel, these are distinct concepts. That is not a liability in any way whatsoever. (Analogously, in SQL, you can't use the phrase "INNER JOIN" to produce output by itself, it produces something that can be consumed only by a FROM clause. This is entirely normal.)]

[Looking at it from the point of view of types produced by expressions, relational expressions produce relations, period, and relations have certain specific characteristics that you can't talk someone into changing without forcing a complete redesign into a language that is then no longer relational. Converting a relation to some form of output is the job of "EXPORT", which produces a sequence or rows. A sequence has an ordering, and it can have duplicates -- it's a different beast from a relation.]

[There is no unnecessary cost associated with this, either runtime or conceptual. Conceptually, it's simply how the language works and no more or less complicated than hundreds of other very simple distinctions made in any language. The runtime cost of producing output is inherent to the task; EXPORT makes this explicit, whereas in SQL the cost of producing output versus providing an intermediate result for further calculation is implicit in the context in which the SELECT appears. Does that help clarify anything at all?] -- DanMuller

As already stated and linked, EXPORT would not typically be available in practice any more than the file and text features of Pl/Sql or TransactSql are available for use to typical query users and apps. Thus, EXPORT will not solve the standing issues. The industry has settled in a de-facto "standard" in the way query languages, query transport mechanisms, and query browsers relate to the RDBMS.

The typical query session can be more or less modeled with the following steps:

1. The API or query tool ("tool" for short) sends the text of the query to a RDBMS. This text is typically SQL, but could in theory be other query languages.

2. The RDBMS sends a "return packet" back to the tool. The return package will include:

A. A data "table", essentially a rectangular "grid" of data. We'll consider this CSV text in our model. The table can be empty.
B. A very basic data dictionary describing the name and type of each column, per RDBMS vendor's type names.
C. Optionally, primary key information, but this is hit and miss such that one typically does not rely on it.
D. Vendor-specific meta-data. One typically does not rely on this except if the conditions of queries are constant and known.

Note that there is no prevision to indicate what the tool is going to do with the result package. It's a basic contract without info on intended usage. The contract says, you supply this, and then this is what comes back.

Within this framework, a Rel RDBMS has only two choices:

1) The EXPORT command is treated similar to an SQL SELECT clause to determine the "shape" of the return data grid. This essentially makes Rel have a SELECT clause whether it "wants" one or not.
2) Anything in the EXPORT command is ignored. It would be treated similar to an UPDATE clause where there is no result grid returned. This would render EXPORT useless from the perspective of anybody or anything using the above framework.

[Most of your answer made no sense, except for item 1) just at the end. Yes, if you really must think of it as "SELECT", then do so and be done with it, realizing that this "SELECT" has only the role of shaping the query output, and no role in relational expressions. The statement "would not be typically available" is nonsensical and seems based on some weird misunderstanding of your own; whatever is part of the language, is available. The details of how you think current systems work under the covers is entirely irrelevant, what matters is what's sent and what's returned. There's no conceivable reason whatsoever for a new system to conform to (your notion of) the inner workings of existing systems. (There is, BTW, indeed a provision for knowing when a query in a typical SQL system is being executed to develop output; it's implicit in whether a query is a top-level statement or an embedded query. The distinction is even made in the SQL standard; it's just not highlighted by a distinct keyword.) I've made my attempt at contributing some clarification, but won't participate in the further proliferation of pages to debate endless permutations of completely irrelevant -- and invented -- details.] -- DanMuller

It's not "endless permutations" and it's not "invented"; it's what the industry supports and expects. Ignore reality at your own peril. I've used dozens of products with dozens of drivers in dozens of companies. Many drivers support the minimum necessary to "get the job done" because driver writers are often lazy, and the "minimum necessary" is usually to deliver a tabular data-set after the query text is sent. There is no wide-spread support for any notion of output to CSV or a screen or an associative array in PHP or whatever. That info is usually not available to the query text processor (RDBMS). Thus, the EXPORT command is useless in such environment. It does not know what will be done with the result table.

In fact, it can't know in many cases because the user can choose to do with the result whatever he/she chooses after the query is processed. I can look at the result table on screen in Toad, for example, and choose to save it as CSV or clear it off the screen forever. The query is already processed and cannot know the future, which is my decision. Is this not true? There is no technology that accurately predicts the future.

Further, the specifications for ODBC, JDBC, SQL, etc. often offer a lot more than what actually works in practice. Vendors and driver writers can ignore what they want without penalty, and/or the shop's RDBMS has not been configured for a certain feature. You guys keep saying the spec may allow this and it may allow that, but it's a crapshoot if it actually works in a given shop. True, I haven't done actual tests, but neither have you.

You appear to be HandWaving the fact that reality won't properly support Rel's EXPORT command except in an obscure custom arrangement. (I feel dirty for using the "H" word. Now I have to shower.) SQL has at least been reality-tested for many years. It's not perfect, but at least it satisfactorily addresses some very practical issues.

--top

Top, see above where I wrote, "Rel EXPORT was an example of removing primary keys from "output" of queries in order to allow duplicates, as was the "FOR" example before it." You are making an Everest out of a very small, insignificant molehill. Whatever consumer exists for a relation -- whether EXPORT on the server side or some arbitrary bit of client-side application code -- can trivially produce duplicates if needed. The only place where this becomes difficult is if you're using some limited-capability pre-packaged JDBC/ODBC client that does not support dropping columns from the result set, perhaps because it assumes SQL is being used. Obviously, this would be a rare circumstance.

Why would this be a rare circumstance? It's the norm in fact because SELECT was (is) sufficient to control the "shape" of the result grid. The existing infrastructure assumes this is the case and does not have a secondary mechanism for filtering columns because it was never needed. Rel assumes either a more complex client or more control over the client. More on this below.
It's rare because the vast majority of database accesses are via client-side applications, which issue a query and then retrieve specific columns (attributes) by name or sequence number, or request metadata about the query result and then retrieve specific columns (attributes) by name or sequence number. It's rare because even when command-line access is required to a DBMS, it's usually via the DBMS vendor's own command-line utility (e.g., Rel's DBrowser, Oracle SQL Developer, pgAdminIII, etc.) which provides facilities tailored to its DBMS. You're apparently addressing a circumstance where a command-line or administrative utility, or some application program, assumes the DBMS is SQL (or SQL-like) and yet it allows TutorialDee (which Rel implements, but not SQL) queries, but it assumes all column identifications will be pre-specified via (and only via) a SELECT statement. So, in that case, regardless what query is sent down the pipe to the Rel DBMS, all columns in the result that comes back must be displayed on the screen, exported to CSV, or whatever. Thus, presumably, there will be circumstances where you can't generate the desired output because there is no equivalent to the column list in a SELECT statement. For example, you issue a Rel query like "a UNION b" and are now forced to create a CSV file containing columns x, y, z, p, q and r when all you want is x and y. You could, of course, issue "(a UNION b) {x, y}", but then you lose the duplicate rows in x and y that you presumably need. Sure, yes, that could happen. I submit it's a vanishingly rare situation, solved trivially by using a different client-side utility. If you don't have control over that, I doubt you have the freedom to choose to use Rel either, so it's a non-issue, isn't it?
Shops that use a lot of different DB vendors like to use a single tool rather than switch between vendor-specific query browsers in part to make it easier to gather data from or transfer between multiple sources. And the learning curve is shorter because the interface is the same regardless of DB vendor. But yes, a dedicated Rel browser/client can indeed to anything it pleases.
As far as "vanishingly rare", I disagree. However, we'll LetTheReaderDecide and not argue any further about probabilities.

[I wrote one set of comments on this, and then decided I'd misinterpreted Top's objection. It's hard to understand the objection; most of it seems completely irrelevant. An API between a host language and a DBMS typically lets a program send a command in the DBMS's query language and, optionally, receive a result back. One category of commands is a request for query output. If EXPORT or TRANSMOGRIFY or TO_ARRAY or whatever is the command in the DBMS's language for getting output, then that's what you'd send. You wouldn't send a SELECT statement to a DBMS that doesn't understand SQL, which is the context of this discussion. DBMS-specific drivers written to the API handle other details, as they always do; particularly transport issues and translation of representation. No subsystem knows what a user intends to do with its output, it only knows what it was asked for, so I don't know what the comments about intent relate to. Bringing up flaws in the implementations of existing drivers for other DBMSs, or problems with misconfiguration of those DBMSs, is also mystifying. They're not the DBMSs being discussed.]

Regarding these two flow sequences:

 A. RDBMS --> Driver --> Tool/Lang-Specific-Adapter --> Output
 .
 B. RDBMS --> Driver --> Output

You guys seem to be assuming the "B" configuration where your handy dandy Rel driver can simply adjust the output as it sees fit. A good many, perhaps most, tools fit configuration "A". Your output has to conform the adapter. In some cases you may be able to force "B", in others not because it's controlled the tool vendor. For example, the tool vendor may not allow you to control how CSV is created and saved. A work-around is to let EXPORT act like a SELECT statement and adjust columns as needed, but you've merely re-invented a SELECT statement clone despite complaining about them.

Do you really think MicrosoftAccess is going to let your query commands and/or ODBC driver dictate that the output format is CSV and control what it looks like and where it is saved? It shouldn't. In fact, that's a security breach. Your driver will usually only be allowed to deliver tables and only tables.

Even if you driver has the ability to write text files, existing tools probably will not use that ability since they didn't have to with SQL and often have their own mechanism for making CSV etc. If you include direct file writing it your driver, that function/method will simply be ignored. You are trying to change the query culture, not just the query language.

Now, you could probably make your own driver or API for a language such as C, and make it do anything it wants, including save to files. Web language communities may complain, however, because the driver could potentially be a security breach if it can save to files.

[Your concern still doesn't make sense to me. The output side of a driver (and I'm assuming a client-side "driver" here; these really aren't "drivers" in the classic sense of an operating system driver) has to conform to whatever API it is implementing. E.g. in ADO.NET, results are usually returned as a DataSet object, or are accessed via an object that implements the IDataReader interface. How the data that populates that object is transmitted from the RDBMS to the host program is the driver's concern. In the latter case, the specific concrete type of the object is even left entirely to the driver, specifically to give the driver flexibility in implementation. I really don't see where CSV or files come into the picture.]

This was partially addressed already with the "C" example. Yes, you can make your own driver with whatever features you want. However, it doesn't make sense to reinvent what you call "formatting", such as conversion to CSV. It might be possible, but will result in confusion and duplication. For example, there may be many existing API's or methods to convert a DataSet object into CSV etc. If you make your own, you are unnecessarily reinventing the wheel and circumventing conventions. But, I was talking more about non-programmable or semi-programmable tools such as query browsers.

[And as I've already stated, if you want to think of an EXPORT-like statement as being a SELECT, fine, as long as you stop ignoring the distinction that it produces output, not a relational result. It takes a relational result as input, produces something that need not be specified for transmission to the host, and the DBMS-specific driver transforms that something into whatever the host program's DBMS API requires. When considering the output data flow, this is no different than how things operate today with SQL-based systems. Your alternative A is always the case; at the very least there's a bit of code to write a data structure to a screen. But it doesn't matter, as long as that code is written to an API that is faithfully implemented by the driver.]

Which usually requires a table-shaped structure of some sort. And I'm not "ignoring the distinction", rather you are trying to force a distinction to conform to your idealistic world view. Your EXPORT will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it and others will treat it just like a SELECT statement.

I've been getting feedback from developers and database students using Rel since 2004, and have never heard anything -- well, until you came along -- to suggest confusion between output formatting and relational operators & expressions. Indeed, that seems to be quite intuitive.

They are probably using the native client.

Why would that make a difference? The language -- TutorialDee with extensions -- is the same no matter what client is used.

No, it is NOT the same thing. The "shape" of the result set is limited in practice. You can do a jillion different things to generate that result set, but still that's the only thing you can deliver in the end if you want to "communicate" with the majority of existing query infrastructure tools. Again, take MS-Access or Open-Office Base. Why would Access let your custom driver write to CSV files etc. outside of it's existing framework??? Even if your custom Rel driver had that capability, you can't force MS-Access to use those extra methods (other than tricking it by hiding it under a more common method and lie in the description). -t
Shape of the... What? Sorry, not following you at all. What is this "custom driver write to CSV files etc. outside of it's [sic] existing framework" about? Your understanding of ODBC/JDBC (that's what you mean by "driver", yes?) must be very different from mine. Given I've been working with these things since there was an ODBC (and, later, JDBC), I suspect it's your understanding that's at fault. I know of no ODBC/JDBC driver that emits CSV files. All ODBC/JDBC drivers provide an API that interfaces with a specific DBMS and accepts queries and returns ResultSets. What application programs using that API do with the ResultSets is up to them. Any application program using such an API that is unable to exclude columns -- because it presumes, unconditionally, that they will be specified by a SQL 'SELECT' statement -- is flawed. It's even flawed with SQL. What if the user specifies "SELECT * FROM myTable" which returns columns x, y, z, p, q and r, and then decides he wishes to exclude x, y and z? [Or, for that matter, a "aTable UNION bTable", which is also a legitimate query statement in SQL, and does not involve a SELECT at all.]

[Rather than concede a point, he descends into completely absurd nonsense. Tell me, Top: In SQL, do you find the distinction between the uses and purposes of the keywords INNER JOIN and FROM to be confusing? If so, do you ignore one of them?]

I honestly don't know what the hell you are talking about. Your analogy makes no sense to me.

[Quoting from your text: "Your EXPORT will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it ..." This is analogous to saying that "Your FROM clause will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it...". EXPORT (or whatever output statements Rel supplies) are part of the query language. It's no more or less confusing than any other part of the language. (Notwithstanding the fact that it seems to confuse you.) Really, neither of the two people still conversing with you on this subject can understand what you're on about. Don't you think that's an indication that you're misunderstanding something? The only guess I can come up with is that you have made some erroneous assumption about how the host language <-> RDBMS communications pipeline works, but I can't figure out what that assumption might be.]

The two people in question don't have enough practical experience in a "typical" environment in my assessment. They sound like well-educated but field-naive newbies to be frank. I don't trust their judgment on such matters.

I'll try to explain this another way with some pseudo-code. Let's say you are making a "typical" query browser tool (related: ListOfQueryBrowsers). The internal code for the browser may resemble something like this:

 // code in Query Browser Foo 3.0
 row = new odbc.row;
 securityContext = new odbc.security(.....);
 d = new odbc.driver(dropDownList.odbcVendorKey);
 resultSet = new d.executeQuery(queryTextBox.text, securityContext);
 while (row = resultSet.getNextRow()) {
   displayToScreenGrid(row);
 }
 ....
 /* event Save-As */
 ....
 if (event.menu = "save_as_csv" && resultSet.hasData) {
   f = new file(filePrompt(...));
   resultSet.resetCursor();  // cursor back to 1st row
   while (row = resultSet.getNextRow()) {
     li = makeDelemitedString(row);
     f.writeLine(li);
   }
 }
 ...
 function displayToScreenGrid(odbc.row r) {
   string colValue, colType;
   for (i=0; i < r.columnCount; i++) {  // for each column
     colValue = r.getColumnValue(i);
     colType  = r.getColumnType(i)
     /* display value on screen grid */
     ....
   }
 }
 string function makeDelimitedString(odbc.row r) {
   string colValue, colType, lineString;
   lineString = "";
   for (i=0; i < r.columnCount; i++) {  // for each column
     colValue = r.getColumnValue(i);
     colType  = r.getColumnType(i)
     /* convert to delimited */
     if (! isOneOf(colType,"numeric,float,double,integer,etc")) {
        colValue = "'" . colValue . "'";  // surround with quotes
     }
     if (i != 0) {
       lineString = ',' . lineString;  // add delimiter
     }
     lineString = lineString . colValue;
   }
   return(lineString);
 }

Now regardless of the specific ODBC driver, it's still going to call getColumnValue etc. Essentially the ODBC API assumes a tabular grid and processes tabular grids. The ODBC driver writer cannot change the above code, for it's part of the query browser, NOT the ODBC driver. And, it already has features to save and convert the result set buffer to CSV files and decides what context to do it in.

Even if the Rel ODBC driver author puts methods/features into their driver to get EXPORT results, the query browser will not use those methods (without a special rewrite). The only way to pass query results to the Foo 3.0 query browser is thru a tabular-shaped result set, which has rows and columns. And if EXPORT wants to get it's results through, it has to produce a tabular result set. In other words, act like a SELECT statement, which is what a Rel query withOUT the EXPORT key words also generally acts like. -t

[I think that the objector, although perhaps quite experienced with high-level DBMS tools, hasn't got the first clue about how the tools that he uses are implemented. I can think of no reason that the above would not work with an alternative RDBMS like REL, provided that queryTextBox.text returns a correct REL output statement of some sort. If you're talking to SQL database, that would be a query statement, e.g. a SELECT. When talking to REL, it would be EXPORT or some such. None of this code cares in the least what the query syntax is, and there is absolutely no reason that the driver would not return a resultSet that worked just like one from any other DBMS. If you dispute this, then I'm stymied.]

There appears to be a misunderstanding somewhere. Let's revisit the following line:

  resultSet = new d.executeQuery(queryTextBox.text, securityContext);

This runs on a client; let's say query author's desktop. "queryTextBox.text" is just the content of a GUI text-box widget, similar to an HMTL "textarea" form tag. The client generally does not parse that string (or is at least not obligated too). It simply sends the string to a RDBMS server. More explicitly, it sends it to the odbc driver, which then forwards it to the RDBMS server. (I added "securityContext" which may just be username, password, etc.)

It's true the driver potentially has a lot of options at this point, and itself could potentially parse the query language (Rel) and/or let the DB server do it. HOWEVER, it does NOT have a lot of options in terms of what it can hand back to the query browser (client), because the query browser uses a fixed interface to the results. With polymorphism, you can change the implementation, but you cannot change the interface without rewriting the software that uses that interface. I hope this clears things up. -t

Re: "and there is absolutely no reason that the driver would not return a resultSet that worked just like one from any other DBMS."

Please clarify. If it's the same, then what is the purpose of EXPORT? And what would 'EXPORT...FORMAT CSV TO "myfile.csv"' mean to the client? (Suggested Rel syntax from the pro-Rel side). -t

[There certainly does appear to be one or more misunderstandings, but their nature hasn't been identified yet.]

[Forget EXPORT for a minute, OK? You're fixated on something wrong. EXPORT doesn't appear to be defined in Rel anyway, according to the version of the grammar currently on its Web site. (http://dbappbuilder.sourceforge.net/Rel.php#) I think it was mentioned hypothetically. In any case, I don't really know Rel, except insofar as I have some familiarity with Tutorial D.]

EXPORT is not in the current release. It's a work-in-progress.

[There are many ways that Rel could be designed to produce output for a host program. Let's assume e.g. that if you send it an expression (as opposed to a statement), it sends back a representation of the result. That's a handy characteristic of many REPLs. We don't know or care much about the detailed form of that result, except that it should be a complex enough representation that it completely describes the result. Now since this is a strict relational language, if you send it a relational expression, you'll get a relation back. Well, a relation has no defined order for its rows, and it doesn't allow duplicates, so although that's useful, it's not really adequate. But -- and this is important -- the driver for a given API should have no trouble translating the representation to conform to the API's representation of a query result.]

[OK, so how do we get something a bit more like what you can get from a SQL DBMS? Easy. The result of a SQL query is more like an array of rows. Arrays define an order of their elements, and unlike sets, can have duplicate elements. So let's hypothesize a syntax for converting relations to arrays: <relation-expr> AS ARRAY {<attribute-list>} ORDER (<order-item comma list>). You with me so far? (The current Rel grammar doesn't appear to have anything like this, although it has a less flexible way to load a relation into an array variable. This is all hypothetical, just like the rest of the discussion has been.) The brace-delimited list specifies what columns to include in the resulting array. (I'd make this list optional if I were actually defining this language addition). You can imagine what the <order-item comma list> might look like. (Again, I'd make the whole ORDER clause optional. Its syntax is based on existing Rel grammar, see earlier link.)]

Rel has something precisely like that. It's 'r ORDER()', where 'r' is some <relation-expr>. ORDER() returns an ARRAY. If no arguments are provided, the order is undefined. A specific order can be imposed as, e.g., 'r ORDER(DESC Name, ASC Date)'. This is a Rel short-hand for some slightly more awkward TutorialDee constructs. However, this is not "a solution to removing primary keys from 'output' of queries" as mentioned at the top of this page. That said, I could easily add an "EXCLUDE" keyword to ORDER -- e.g., 'r ORDER(DESC Name, ASC Date, EXCLUDE CustomerID, EXCLUDE CustomerPhone)' or something similar. Top, would that meet your requirements, at least in concept?
[Ah, thanks for that explanation, that definition of ORDER() makes sense. I would much prefer to be able to list the columns that I want in the result, rather than listing those that I would like to exclude. This, to me, makes the code more usefully self-documenting. I'll also mention here that, in general, I'm not wild about Tutorial D or Rel syntax at all. It strikes me as being too much like SQL, with a large collection of varying syntax productions. I'd much prefer a syntax based on more on the syntax function calls, with fewer infix operators and special keywords to introduce clauses. But that's really a matter of personal preference. Also, I puzzle over the LOAD statement for putting an array in a variable. What's wrong with simple assignment, from e.g. an ORDER() expression?]
You'd have to ask Date &/or Darwen why TutorialDee (and, hence, Rel) syntax is the way it is. I can only speculate, so I won't. As for LOAD, I recall seeing some note to the effect that it was made deliberately awkward in order to actively discourage students from creating iterative solutions to relational problems. Note that TutorialDee was intended to be a teaching tool and an illustration of Date & Darwen's abstract "D" language specification; practical programming is meant for the "Industrial D" implementations -- of which there will some day be many -- each one providing a syntax (or even paradigm) favoured by a particular group of programmers. Defining ORDER() as an operator rather than a sub-clause of LOAD (as it's defined in TutorialDee) is a Rel-specific extension borne of my frustration with being forced to endure LOAD's (pedagogically reasonable, but irritating to the point of intolerability) awkwardness.

[So now, if you want a more array-like result, you send something like e.g. "tableA AS ARRAY {colA, colB} ORDER [DESC colA]" as your query. The system sends back a representation (which we don't have to specify to have this discussion) of an array of rows. Again, the RDBMS-specific driver should have no trouble exposing that through your typical host API as a query result. And as far as a higher cost of execution goes compared to a SELECT statement that does something similar: Forget it, there is no higher cost. Conceptually higher cost? Not really. Once you've wrapped your head around relational algebra (which would be a prerequisite for using a language like Rel with any success at all), then the distinction between the characteristics of a relation and an array are completely natural, and you wouldn't have to think for more than an eyeblink to decide which you want in a given situation.]

I was responding to the hypothetical EXPORT examples. I agree there are many ways to "format" the results to give one the power of SELECT, but why go for a kludgy, two-layered ugly copy? SELECT has proven convenient and practical. Keep what works unless you find a nicer replacement, not a round-about replacement. If you want to "protect" result sets from getting "into" the RDBMS, then simply forbid duplicate-containing result sets from being fed to other tables and queries within the RDBMS, and ignore internally nested "ORDER BY" clauses. (I'm not saying actually have a SELECT keyword, but rather stop adding extra keywords like ARRAY and EXPORT to get the same thing. Don't gum up the language with mostly-superfluous clauses. They are solutions looking for a problem.)

[Like I said, you're fixated on the wrong thing. The discussion was never about a particular syntax like EXPORT, although you try to make it so. It's about how to provide output with duplicates (and order) from a system of relational operators. This entire paragraph is bankrupt. A) this is not really about formatting, it's about an operation applied to one type of value, producing a result of a different type, which of course is a boringly commonplace thing in programming. B) It is not kludgy, it's systematic (what you would call "pure"), which is a good thing, it makes comprehension easier, not harder, for most people. C) It does not require a copy of anything. D) Calling SELECT convenient and practical makes me laugh. With its conflation of renaming, omission, projection (by adding DISTINCT), its sometimes implicit summarization, the numerous optional clauses it has accrued in addition to DISTINCT, both standard and dialect-specific, it's a mess. The order of all the clauses of a SELECT expression -- SELECT, FROM, WHERE, GROUP BY, HAVING, ORDERING -- have little relationship to the order in which they're applied. Scoping of names is a nightmare, to the extent that numerous implementation have mistakes. Columns can be referred to by their ordinal positions in some contexts. The list goes on. As I write this, my laughter fades to tears. E) This may seem roundabout to you, but to most programmers, invoking an operation to transform a value of one type to a value of another type is perfectly natural, and makes far more sense than changing the very definition of the first type to be more like the second type. F) Proposed keywords or operations like ARRAY or EXPORT do not "get the same thing", OBVIOUSLY, otherwise they wouldn't be proposed. (Or, if you meant "the same thing as a SELECT", then of course something needs to be added to get array-like results because the language doesn't have SELECT.) How could you write anything that entirely obtuse, as if you had just joined the conversation in the past minute?]
PageAnchor: David_Letterman
A) I was not the one who created the "formatting" argument. Second, "type" is not necessarily the best model/abstraction. For example, one can view a set as a bag that "has-a" uniqueness constraint. "Is-a" is not necessarily the right choice. B) Comprehension is your measure? Is that a WetWare argument I hear? Wasn't I recently lambasted for relying on WetWare? [PageAnchor Psycho_A] C) Maybe you should try it. D) Until I see something better, it's the least evil right now. Your comments about the other SQL sections are mostly irrelevant since we are not talking about them. If I had my choice, I'd go with SmeQl, which would eliminate most of those issues. E) See "A" again. F) Your ARRAY is too close to tables to bother making them a different type. You don't have a full set of relational-like operators for ARRAY or BAG or whatever you call it this week, but only for "pure" relational tables? Why the operational bigotry? And I meant too close to SELECT, but in a query tool environment, ARRAY and EXPORT would probably also serve a too-similar function. You need to refactor your output mess.

[Coming back now to your 'EXPORT...FORMAT CSV TO "myfile.csv"': It seems fairly obvious that a hypothetical statement like this tells the RDBMS to send a query result to a file, in a specific format. You wouldn't send this statement to the system if you expected a query result back, just as you wouldn't send a command like "BACKUP db1 TO FILE 'mybackup.bak'" and expect a query result back. I can't fathom why you would even entertain the idea that you would send this and expect a query result back.]

Somebody else suggested that EXPORT was a solution to get "output" duplicates (and possibly sorting) as query results, not me. Your criticism of the idea sounds a lot like mine, actually. Maybe you can explain it to the other guy(s); I didn't have much luck.

Here's a compromise: have an optional "ALLOW DUPLICATES" clause in Rel for roughly 5% of the time where you want them in the "outer" results.

Where do you propose this clause appear? The only place it could even remotely be considered is as part of, e.g., the ORDER operator (which generates an ARRAY) or some other ARRAY generator. Relations cannot contain duplicates, just like an integer can't contain a fractional portion or a floating point number can't contain the letter 'p'.
If it makes life easier to have integers grow donkey ears and fly, then By George, let them fly. If they don't qualify as "integers" anymore, tough titties. Practicality guides me, not Webster's.
- {I believe you apply this flawed philosophy even to logic: if it makes life easier to use fallacy and HandWaving, then By George wave those arms! If it doesn't qualify as "logic" anymore, tough titties. Delusion guides you, but it sure seems practical in your mind - which, according to your beliefs about WherePsychologyMatters, is the only relevant concern.}
- PageAnchor Psycho_A
- Not the only concern, but one under-rated by you. If I'm delusional, then so are Oracle and IBM. They also realized that bags sometimes better model the domain and can reduce resource usage. Go ahead and lock them up in Purity Jail.
- {You've not presented the logic of Oracle and IBM, nor even a document from them claiming that they're keeping bags around for the reasons you listed. And it is your logic - or utter lack thereof - that gets you in trouble. If I said TheSkyIsBlue? because of donkey-eared integers, I'd be an illogical nutcase, even though the sky is blue. If I said that bags are necessary and helpful because of TopMind's BagNeedScenarios, I'd be an HandWaving crank, even though modern RDBMS's do support bags. If I claimed without evidence and as an outsider that Oracle and IBM keep bags around to "better model the domain", I'd be a fraud because I would know that I'm making shit up.}
You need to show that flying donkey-eared integers provide an advantage without introducing new problems. Likewise, I'm not convinced that deprecating the relational model improves the situation without causing unpleasant problems -- e.g., introducing inadvertent duplicates in query results whose inputs have no duplicates -- that we've already discussed elsewhere. Fortunately, as I've suggested above, we don't have to do that. See my point above that starts with "Rel has something precisely like that." As I wrote there, I could easily add an "EXCLUDE" keyword to ORDER -- e.g., 'r ORDER(DESC Name, ASC Date, EXCLUDE CustomerID, EXCLUDE CustomerPhone)' or something similar. Top, would that meet your requirements, at least in concept?

But anyhow, it will probably still come down to expressiveness and convenience versus what I call "purity" of set-centric relational. We'll probably never see eye-to-eye on that philosophical issue. -t

{You have not demonstrated "expressiveness and convenience". Your BagNeedScenarios are weak, only marginally better than your justification for donkey-eared integers. Your argument isn't respectable as a 'philosophical issue' because it fails to first be respectable as an 'argument'.}

I could say the same about your stubborn sets-or-death stance and weak justification. I stand by my statement that bags sometimes better model the real world and better fit existing infrastructure and conventions.
Databases don't "model the real world". That's the role of simulators. Databases record facts about the real world. Why store a duplicate fact? If it's true that I have ten cans of cat food or that the printer is offline, I only need record these facts once. Recording them more than once is redundant. Of course, there may be systems outside of the database that emit or wish to consume duplicate data. Fortunately, as we've shown repeatedly, it is trivial to deal with duplication in the world outside the database without compromising the integrity within it by housing meaningless or even erroneous duplicates.
Sounds great on paper. You should perhaps be in politics, where dreamy idealism sometimes makes a career (while the real stuff is nitty-gritty under-the-tabel bribes.....I mean negotiations.)

[This is pretty much a throw-arms-in-the-air-and-walk-away moment in the discussion. There's nothing about the proposals that sacrifices expressiveness or convenience. They're merely different from SQL, and much, much more like general-purpose programming languages, both new and old, in making a clear distinction between different data types and the operations that you can apply to them.]

"Clear"? I think not. Gee, that sounds like a WetWare-based argument. -t

[Please, don't anyone take that as an invitation to start yet another long, meandering, unenlightening thread about type systems.]

Q: What does relational and you have in common?

A: Projection.

Top, such ad hominem attacks -- funny as they may be -- only degrade your credibility. Please consider deleting them. Anyway, for the third time, let me ask this: I could easily add an "EXCLUDE" keyword to ORDER -- e.g., 'r ORDER(DESC Name, ASC Date, EXCLUDE CustomerID, EXCLUDE CustomerPhone)' or something similar. Top, would that meet your requirements, at least in concept?

You called my replies "unenlightening" and "meandering". Is that not a personal and rude attack"? The "H" word comes to mind.

Observe the use of italics and other indicators of authorship. I have not insulted you at all, in any way.

And, why would EXCLUDE go with ORDER? ORDER selects which fields to sort on, not which to emit. I like the idea of EXCLUDE, but it's still just another way to make a bag. -t

ORDER() generates an ARRAY from a relation. Relations have no notion of order. ARRAYs do, and they can include duplicates, so it makes sense to provide a way to generate duplicates from a relation by excluding the key.

So you can have bags. Welcome to the bag club! Please kick back, relax, and unpack your bags. (Related: reply "A" above under PageAnchor David_Letterman.

Your sarcasm is ignored. An ARRAY is primarily intended to fullfil the same role as a SQL CURSOR -- it allows the programmer to iterate the tuples of a relation. "FOR" is used to iterate the tuples of an ARRAY, which is why it was used as an example of handling duplicate tuples, waaaay back when these discussions started. A relation has no notion of order, so it must be converted to an ARRAY so that tuples can be iterated in a specified (or arbitrary but fixed) order. ARRAY is also available as an output format -- i.e., an ARRAY can be the result of an expression evaluation -- so that clients of the DBMS can obtain tuples in a specified order. If you need duplicate tuples too, it makes sense to provide a means to exclude key attributes. There is, however, no ARRAY algebra or any other operations on an ARRAY, as there's never a need to maintain duplicate tuples inside a DBMS.

Their intended use doesn't change anything: you still have bags; it's just that you take an "is-a" approach to bags while most RDBMS takes a "has-a" approach (uniqueness is a feature, a constraint attached to a table, not a type). HAS-A is better reuse of concepts and implementation of a query system because it means that RDBMS bags are not crippled; that is, given a decent package of operators.

I believe your reasoning is that if your bags are crippled, then people will use them less. It's a discouragement technique. However, the likely result is that they'll use Rel less because of the artificial and harsh line it draws between two otherwise very similar concepts.

Further, your approach is anti-polymorphism (if one does go the IS-A approach). The context and operations available to each are unnecessarily different. -t

Sorry, you've lost me. "Anti-polymorphism"? I don't need RelationalAlgebra-like operators on an ARRAY any more than I need to find the square root of a phone number.

I've pointed out how a pure relational system can accept input from bags (via import/link mechanisms discussed on other pages) and output to bags, via EXPORT, ARRAY, and FOR. I've explained that bags are pointless inside a true relational DBMS because a DatabaseIsRepresenterOfFacts about a domain. Furthermore, I've explained that bags inside a database may lead to inadvertent or erroneous duplicates. You have not shown why bags are needed inside a database, as BagNeedScenarios all deal with bags at the boundaries -- the input and output -- of a database system. Those, as I've shown, are trivial to handle. Nor have you shown that dealing effectively with bags at the boundaries of the system and eliminating them within the system presents an undue, unacceptable, or insurmountable burden. You haven't even shown that these conditions are awkward or annoying. Indeed, the clear distinctions between concepts and types in TutorialDee/Rel is one familiar to programmers, and will likely find greater favour with them than the indiscriminate hodge-podge of operations in SQL's 'SELECT'.

Thus, it is clear that true relational systems are inherently superior to BagAtational systems. Q.E.D.

I'd like to thank you for creating and participating in these discussions, Top. By making me think about these topics, it's confirmed to me that true relational systems are the way forward, and that BagAtational systems -- whether using SQL or some other language -- should be discouraged, deprecated, and avoided. I shall return to working on the RelProject with renewed vigour, purpose, and vision.

Whatever floats your boat.

Re: "I don't need relational-like operators on an ARRAY any more than I need to find the square root of a phone number."

Each relational operator has a very similar corresponding "bag" equivalent (including other column-based or column-projected structures); similar to how integers and real numbers can both be added. Your analogy is odd. (However, it may be part of encryption.)

Re: "I've pointed out how a pure relational system can accept input from bags"

Not disputed, but made harder.

Re: "I've explained that bags are pointless inside a true relational DBMS because a DatabaseIsRepresenterOfFacts about a domain."

That's argument-by-definition, not argument by utility. Plus, "fact" was never clearly defined. Facts can also be known ambiguity and probabilistic knowledge. Are you going to argue that one shouldn't store fuzzy photographs in a database because there is ambiguous or imperfect info in it?

Re: "Furthermore, I've explained that bags inside a database may lead to inadvertent or erroneous duplicates."

True, but any tool can be misused/ abused. Making bags more difficult will not necessarily stop such problems, and may create new and different problems.

Re: "You have not shown why bags are needed inside a database, as BagNeedScenarios all deal with bags at the boundaries -- the input and output -- of a database system."

As already explained, "inside" is relative, and the query processor often cannot often know what is being done with the "output". Further, it may be possible to put more restrictions on "internal" queries than non-internal queries. You don't have to cripple one to stop the other.

Re: "Those, as I've shown, are trivial to handle."

Your approaches are awkward, extra code, and potentially confusing.

Re: "Nor have you shown that dealing effectively with bags at the boundaries of the system and eliminating them within the system presents an undue, unacceptable, or insurmountable burden."

And you haven't shown that allowing them creates and "an undue, unacceptable, or insurmountable burden". Sets are not the default. You seem to think that sets are the default unless insurmountable evidence says otherwise. Actually, bags are the default because most RDBMS already accept them.

Re: "clear distinctions between concepts and types in TutorialDee/Rel is one familiar to programmers, and will likely find greater favour with them than the indiscriminate hodge-podge of operations in SQL's 'SELECT'."

There is no "clear distinction". You are forcing a distinction. HAS-A is a legitimate contender to IS-A, and is often more flexible and better concept reuse. And programmer preference varies widely. I won't disagree that purity-obsessed personalities may gravitate toward Rel. I just hope I don't have to work with such annoying, fixated people. The WorseIsBetter IBM's and Oracle's already won, and will probably win again against your MIT-style contraption.

Didn't X11 originate from MIT and succeeded? Not that X11 is based on sound theory or anything.. just saying.

Further, if the anti-bag movement by chance caught on, then SQL can be modified to limit them on internal queries, and Rel will then very little extra to offer. You don't understand economics, only abstraction purity.

The difference between us is that I view abstractions as TOOLS, as UsefulLies. However, you view them as MANDATES. -t

A lot of abstractions are useless lies. You've not offered any convincing arguments that bags are useful within a relational database. Even your arguments that they are useful at the 'edges' of the database are circular, since they depend ultimately upon bags being useful inside other people's databases, which has not been convincingly argued. If you want to argue that bags are a useful lie in databases, you'll need to find a better scenario.

You haven't raised a single meaningful objection and offer only repetition of past points, nonsense, and thinly-veiled ad hominems. I particularly liked your quip about facts, as if bags will deal better with "known ambiguity and probabilistic knowledge" than sets. If you ever come up with a sound and logical argument, be sure to post it here.

Projection, and likewise. You are the Joe McCarthy? of bags.

Uh huh. Yet you still can't give any reason why bags might be desirable inside a database -- not at the boundaries, or some hypothetically "relative" ambiguous region or contrived situation, but simply an ordinary table within an ordinary database.

Please clarify that last clause of that sentence. I'm not sure what it modifies. Note that a technique enforcing uniqueness on a table or all tables will not necessarily affect the query language itself. Scenario 4 had an example where tossing the uniqueness enforcement mechanism can save space on a given table without switching DB engines. -t

By "ordinary table within an ordinary database", I mean the sort of thing -- inventory, CRM, ERP, general business recordkeeping, etc. -- for which typical DBMSs -- MySQL, PostgreSQL, Firebird, HyperSQL HSQLDB, DB2, Oracle Database, Microsoft SQL Server, Sybase ASE, Ingres Database, etc. -- are used. I.e., a typical database scenario, unconstrained by unusual space requirements like your "Scenario 4". In a typical database context, is there any reason why a bag might be desirable?

Okay, I tend to agree. But I disagree that situations that fall outside of that are "contrived", and protest your use of that word. -t

NovemberTen

CategoryDatabase CategoryThreadMess?