Rel Export Discussion

Moved from BagNeedScenariosReWork:

The Rel EXPORT is sometimes presented as a solution to removing primary keys from "output" of queries. However, as described in ComplexityOfOutputtingDuplicateTuplesInTutorialDee, similar file- and text-writing commands in vendor languages such as Pl/Sql and TransactSql are not typically available to app programmers and non-DBA query tool users. The RDBMS may not even know what the client will do with a result set, and thus cannot give a different treatment (such as stricter uniqueness rules) depending on what is to be done with the result set.

Further, query tools often cache a result set locally, and then the user can decide to save that cached result as text long after the query is complete. Thus, the query itself cannot know what is to be done with the result grid/table. In practice, I see very little reason to make a "hard-bordered" distinction between emitting tables and emitting "output". It's better reuse of concepts to share such features. -t

Rel EXPORT was an example of removing primary keys from "output" of queries in order to allow duplicates, as was the "FOR" example before it. However (as has been pointed out before) in database systems as they are typically used, there is a client language which can even more trivially handle or produce duplicates simply by ignoring primary keys. There is no "hard-bordered" distinction between emitting tables and emitting "output". There is a hard-bordered distinction between calculation and presentation, just as there is in almost every other programming language. What you seem desperately unwilling to understand is that relational projection is not, not, not emitting tables or emitting "output". It is calculation; an operation that generates a new relation from an existing relation given an attribute list. It has no more to do with "output" in a relational language than "+" has to do with output in a conventional programming language. In Rel, for example, projection does not produce user-readable output. Nor does UNION, INTERSECT, WHERE, SUMMARIZE, or any other relational operator. OUTPUT, WRITELN and WRITE produce output given a tuple, scalar type, or relation as input.

This issue was discussed at DuplicateTupleOutputDiscussion, PageAnchor "formatting_01", but never resolved. My view is that the concepts overlap in practice and it's best to take advantage of that to avoid reinventing the wheel and creating unnecessary extra steps or layers. I care more about tool utility than definitions. If my scissors can also serve as a basic knife with a few minor tweaks, then that reduces the quantity of times I have to reach for a (dedicated) knife, making me more productive and the company more profitable. If overlapping "breaks" the definition of "scissors", I don't give a flying flip. My job is to be productive, not make dictionaries. -t

It was quite clearly resolved -- in the RelationalModel, projection is a relational operator; it is not a presentation mechanism. Indeed, the RelationalModel does not define presentation mechanisms, so they are implicitly outside of any relational operator. Cf elementary algebra (i.e., "school maths"), which similarly does not define presentation in +, -, /, etc. If you wish to create operators that have side-effects like output, that's fine, but it's not the RelationalModel. Like SQL, it will be something else. Rel is an implementation of the RelationalModel, so it adheres strictly to the theoretical separations.

Transformation is transformation. Call it a relational operator, call it "formatting", call it pizza making, whatever. It's still transformation. "Output" is relative and the query language often cannot know what the user/requester/app will do with it.

Projection is transformation from a relation to a relation. EXPORT (and friends) is transformation from a relation to a bag and/or other structure. The relational algebra is closed under relations, so projection returns a relation, not a bag.

[I venture to guess that Top is missing this important distinction. It's easy to do, if you're not thinking beyond what you're accustomed to from SQL. In SQL, "SELECT" (and, less frequently, UNION) is used both to produce an intermediate query result, and to produce output. In a language like Rel, these are distinct concepts. That is not a liability in any way whatsoever. (Analogously, in SQL, you can't use the phrase "INNER JOIN" to produce output by itself, it produces something that can be consumed only by a FROM clause. This is entirely normal.)]

[Looking at it from the point of view of types produced by expressions, relational expressions produce relations, period, and relations have certain specific characteristics that you can't talk someone into changing without forcing a complete redesign into a language that is then no longer relational. Converting a relation to some form of output is the job of "EXPORT", which produces a sequence or rows. A sequence has an ordering, and it can have duplicates -- it's a different beast from a relation.]

[There is no unnecessary cost associated with this, either runtime or conceptual. Conceptually, it's simply how the language works and no more or less complicated than hundreds of other very simple distinctions made in any language. The runtime cost of producing output is inherent to the task; EXPORT makes this explicit, whereas in SQL the cost of producing output versus providing an intermediate result for further calculation is implicit in the context in which the SELECT appears. Does that help clarify anything at all?] -- DanMuller

As already stated and linked, EXPORT would not typically be available in practice any more than the file and text features of Pl/Sql or TransactSql are available for use to typical query users and apps. Thus, EXPORT will not solve the standing issues. The industry has settled in a de-facto "standard" in the way query languages, query transport mechanisms, and query browsers relate to the RDBMS.

The typical query session can be more or less modeled with the following steps:

1. The API or query tool ("tool" for short) sends the text of the query to a RDBMS. This text is typically SQL, but could in theory be other query languages.

2. The RDBMS sends a "return packet" back to the tool. The return package will include:

Note that there is no prevision to indicate what the tool is going to do with the result package. It's a basic contract without info on intended usage. The contract says, you supply this, and then this is what comes back.

Within this framework, a Rel RDBMS has only two choices:

[Most of your answer made no sense, except for item 1) just at the end. Yes, if you really must think of it as "SELECT", then do so and be done with it, realizing that this "SELECT" has only the role of shaping the query output, and no role in relational expressions. The statement "would not be typically available" is nonsensical and seems based on some weird misunderstanding of your own; whatever is part of the language, is available. The details of how you think current systems work under the covers is entirely irrelevant, what matters is what's sent and what's returned. There's no conceivable reason whatsoever for a new system to conform to (your notion of) the inner workings of existing systems. (There is, BTW, indeed a provision for knowing when a query in a typical SQL system is being executed to develop output; it's implicit in whether a query is a top-level statement or an embedded query. The distinction is even made in the SQL standard; it's just not highlighted by a distinct keyword.) I've made my attempt at contributing some clarification, but won't participate in the further proliferation of pages to debate endless permutations of completely irrelevant -- and invented -- details.] -- DanMuller

It's not "endless permutations" and it's not "invented"; it's what the industry supports and expects. Ignore reality at your own peril. I've used dozens of products with dozens of drivers in dozens of companies. Many drivers support the minimum necessary to "get the job done" because driver writers are often lazy, and the "minimum necessary" is usually to deliver a tabular data-set after the query text is sent. There is no wide-spread support for any notion of output to CSV or a screen or an associative array in PHP or whatever. That info is usually not available to the query text processor (RDBMS). Thus, the EXPORT command is useless in such environment. It does not know what will be done with the result table.

In fact, it can't know in many cases because the user can choose to do with the result whatever he/she chooses after the query is processed. I can look at the result table on screen in Toad, for example, and choose to save it as CSV or clear it off the screen forever. The query is already processed and cannot know the future, which is my decision. Is this not true? There is no technology that accurately predicts the future.

Further, the specifications for ODBC, JDBC, SQL, etc. often offer a lot more than what actually works in practice. Vendors and driver writers can ignore what they want without penalty, and/or the shop's RDBMS has not been configured for a certain feature. You guys keep saying the spec may allow this and it may allow that, but it's a crapshoot if it actually works in a given shop. True, I haven't done actual tests, but neither have you.

You appear to be HandWaving the fact that reality won't properly support Rel's EXPORT command except in an obscure custom arrangement. (I feel dirty for using the "H" word. Now I have to shower.) SQL has at least been reality-tested for many years. It's not perfect, but at least it satisfactorily addresses some very practical issues.

--top

Top, see above where I wrote, "Rel EXPORT was an example of removing primary keys from "output" of queries in order to allow duplicates, as was the "FOR" example before it." You are making an Everest out of a very small, insignificant molehill. Whatever consumer exists for a relation -- whether EXPORT on the server side or some arbitrary bit of client-side application code -- can trivially produce duplicates if needed. The only place where this becomes difficult is if you're using some limited-capability pre-packaged JDBC/ODBC client that does not support dropping columns from the result set, perhaps because it assumes SQL is being used. Obviously, this would be a rare circumstance.

[I wrote one set of comments on this, and then decided I'd misinterpreted Top's objection. It's hard to understand the objection; most of it seems completely irrelevant. An API between a host language and a DBMS typically lets a program send a command in the DBMS's query language and, optionally, receive a result back. One category of commands is a request for query output. If EXPORT or TRANSMOGRIFY or TO_ARRAY or whatever is the command in the DBMS's language for getting output, then that's what you'd send. You wouldn't send a SELECT statement to a DBMS that doesn't understand SQL, which is the context of this discussion. DBMS-specific drivers written to the API handle other details, as they always do; particularly transport issues and translation of representation. No subsystem knows what a user intends to do with its output, it only knows what it was asked for, so I don't know what the comments about intent relate to. Bringing up flaws in the implementations of existing drivers for other DBMSs, or problems with misconfiguration of those DBMSs, is also mystifying. They're not the DBMSs being discussed.]

Regarding these two flow sequences:

 A. RDBMS --> Driver --> Tool/Lang-Specific-Adapter --> Output
 .
 B. RDBMS --> Driver --> Output
You guys seem to be assuming the "B" configuration where your handy dandy Rel driver can simply adjust the output as it sees fit. A good many, perhaps most, tools fit configuration "A". Your output has to conform the adapter. In some cases you may be able to force "B", in others not because it's controlled the tool vendor. For example, the tool vendor may not allow you to control how CSV is created and saved. A work-around is to let EXPORT act like a SELECT statement and adjust columns as needed, but you've merely re-invented a SELECT statement clone despite complaining about them.

Do you really think MicrosoftAccess is going to let your query commands and/or ODBC driver dictate that the output format is CSV and control what it looks like and where it is saved? It shouldn't. In fact, that's a security breach. Your driver will usually only be allowed to deliver tables and only tables.

Even if you driver has the ability to write text files, existing tools probably will not use that ability since they didn't have to with SQL and often have their own mechanism for making CSV etc. If you include direct file writing it your driver, that function/method will simply be ignored. You are trying to change the query culture, not just the query language.

Now, you could probably make your own driver or API for a language such as C, and make it do anything it wants, including save to files. Web language communities may complain, however, because the driver could potentially be a security breach if it can save to files.

[Your concern still doesn't make sense to me. The output side of a driver (and I'm assuming a client-side "driver" here; these really aren't "drivers" in the classic sense of an operating system driver) has to conform to whatever API it is implementing. E.g. in ADO.NET, results are usually returned as a DataSet object, or are accessed via an object that implements the IDataReader interface. How the data that populates that object is transmitted from the RDBMS to the host program is the driver's concern. In the latter case, the specific concrete type of the object is even left entirely to the driver, specifically to give the driver flexibility in implementation. I really don't see where CSV or files come into the picture.]

This was partially addressed already with the "C" example. Yes, you can make your own driver with whatever features you want. However, it doesn't make sense to reinvent what you call "formatting", such as conversion to CSV. It might be possible, but will result in confusion and duplication. For example, there may be many existing API's or methods to convert a DataSet object into CSV etc. If you make your own, you are unnecessarily reinventing the wheel and circumventing conventions. But, I was talking more about non-programmable or semi-programmable tools such as query browsers.

[And as I've already stated, if you want to think of an EXPORT-like statement as being a SELECT, fine, as long as you stop ignoring the distinction that it produces output, not a relational result. It takes a relational result as input, produces something that need not be specified for transmission to the host, and the DBMS-specific driver transforms that something into whatever the host program's DBMS API requires. When considering the output data flow, this is no different than how things operate today with SQL-based systems. Your alternative A is always the case; at the very least there's a bit of code to write a data structure to a screen. But it doesn't matter, as long as that code is written to an API that is faithfully implemented by the driver.]

Which usually requires a table-shaped structure of some sort. And I'm not "ignoring the distinction", rather you are trying to force a distinction to conform to your idealistic world view. Your EXPORT will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it and others will treat it just like a SELECT statement.

I've been getting feedback from developers and database students using Rel since 2004, and have never heard anything -- well, until you came along -- to suggest confusion between output formatting and relational operators & expressions. Indeed, that seems to be quite intuitive.

They are probably using the native client.

Why would that make a difference? The language -- TutorialDee with extensions -- is the same no matter what client is used.

[Rather than concede a point, he descends into completely absurd nonsense. Tell me, Top: In SQL, do you find the distinction between the uses and purposes of the keywords INNER JOIN and FROM to be confusing? If so, do you ignore one of them?]

I honestly don't know what the hell you are talking about. Your analogy makes no sense to me.

[Quoting from your text: "Your EXPORT will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it ..." This is analogous to saying that "Your FROM clause will confuse developers and will probably cause inconsistent handling by query tool makers as some will ignore it...". EXPORT (or whatever output statements Rel supplies) are part of the query language. It's no more or less confusing than any other part of the language. (Notwithstanding the fact that it seems to confuse you.) Really, neither of the two people still conversing with you on this subject can understand what you're on about. Don't you think that's an indication that you're misunderstanding something? The only guess I can come up with is that you have made some erroneous assumption about how the host language <-> RDBMS communications pipeline works, but I can't figure out what that assumption might be.]

The two people in question don't have enough practical experience in a "typical" environment in my assessment. They sound like well-educated but field-naive newbies to be frank. I don't trust their judgment on such matters.

I'll try to explain this another way with some pseudo-code. Let's say you are making a "typical" query browser tool (related: ListOfQueryBrowsers). The internal code for the browser may resemble something like this:

 // code in Query Browser Foo 3.0
 row = new odbc.row;
 securityContext = new odbc.security(.....);
 d = new odbc.driver(dropDownList.odbcVendorKey);
 resultSet = new d.executeQuery(queryTextBox.text, securityContext);
 while (row = resultSet.getNextRow()) {
   displayToScreenGrid(row);
 }
 ....
 /* event Save-As */
 ....
 if (event.menu = "save_as_csv" && resultSet.hasData) {
   f = new file(filePrompt(...));
   resultSet.resetCursor();  // cursor back to 1st row
   while (row = resultSet.getNextRow()) {
     li = makeDelemitedString(row);
     f.writeLine(li);
   }
 }
 ...
 function displayToScreenGrid(odbc.row r) {
   string colValue, colType;
   for (i=0; i < r.columnCount; i++) {  // for each column
     colValue = r.getColumnValue(i);
     colType  = r.getColumnType(i)
     /* display value on screen grid */
     ....
   }
 }
 string function makeDelimitedString(odbc.row r) {
   string colValue, colType, lineString;
   lineString = "";
   for (i=0; i < r.columnCount; i++) {  // for each column
     colValue = r.getColumnValue(i);
     colType  = r.getColumnType(i)
     /* convert to delimited */
     if (! isOneOf(colType,"numeric,float,double,integer,etc")) {
        colValue = "'" . colValue . "'";  // surround with quotes
     }
     if (i != 0) {
       lineString = ',' . lineString;  // add delimiter
     }
     lineString = lineString . colValue;
   }
   return(lineString);
 }
Now regardless of the specific ODBC driver, it's still going to call getColumnValue etc. Essentially the ODBC API assumes a tabular grid and processes tabular grids. The ODBC driver writer cannot change the above code, for it's part of the query browser, NOT the ODBC driver. And, it already has features to save and convert the result set buffer to CSV files and decides what context to do it in.

Even if the Rel ODBC driver author puts methods/features into their driver to get EXPORT results, the query browser will not use those methods (without a special rewrite). The only way to pass query results to the Foo 3.0 query browser is thru a tabular-shaped result set, which has rows and columns. And if EXPORT wants to get it's results through, it has to produce a tabular result set. In other words, act like a SELECT statement, which is what a Rel query withOUT the EXPORT key words also generally acts like. -t

[I think that the objector, although perhaps quite experienced with high-level DBMS tools, hasn't got the first clue about how the tools that he uses are implemented. I can think of no reason that the above would not work with an alternative RDBMS like REL, provided that queryTextBox.text returns a correct REL output statement of some sort. If you're talking to SQL database, that would be a query statement, e.g. a SELECT. When talking to REL, it would be EXPORT or some such. None of this code cares in the least what the query syntax is, and there is absolutely no reason that the driver would not return a resultSet that worked just like one from any other DBMS. If you dispute this, then I'm stymied.]

There appears to be a misunderstanding somewhere. Let's revisit the following line:

  resultSet = new d.executeQuery(queryTextBox.text, securityContext);
This runs on a client; let's say query author's desktop. "queryTextBox.text" is just the content of a GUI text-box widget, similar to an HMTL "textarea" form tag. The client generally does not parse that string (or is at least not obligated too). It simply sends the string to a RDBMS server. More explicitly, it sends it to the odbc driver, which then forwards it to the RDBMS server. (I added "securityContext" which may just be username, password, etc.)

It's true the driver potentially has a lot of options at this point, and itself could potentially parse the query language (Rel) and/or let the DB server do it. HOWEVER, it does NOT have a lot of options in terms of what it can hand back to the query browser (client), because the query browser uses a fixed interface to the results. With polymorphism, you can change the implementation, but you cannot change the interface without rewriting the software that uses that interface. I hope this clears things up. -t

Re: "and there is absolutely no reason that the driver would not return a resultSet that worked just like one from any other DBMS."

Please clarify. If it's the same, then what is the purpose of EXPORT? And what would 'EXPORT...FORMAT CSV TO "myfile.csv"' mean to the client? (Suggested Rel syntax from the pro-Rel side). -t

[There certainly does appear to be one or more misunderstandings, but their nature hasn't been identified yet.]

[Forget EXPORT for a minute, OK? You're fixated on something wrong. EXPORT doesn't appear to be defined in Rel anyway, according to the version of the grammar currently on its Web site. (http://dbappbuilder.sourceforge.net/Rel.php#) I think it was mentioned hypothetically. In any case, I don't really know Rel, except insofar as I have some familiarity with Tutorial D.]

[There are many ways that Rel could be designed to produce output for a host program. Let's assume e.g. that if you send it an expression (as opposed to a statement), it sends back a representation of the result. That's a handy characteristic of many REPLs. We don't know or care much about the detailed form of that result, except that it should be a complex enough representation that it completely describes the result. Now since this is a strict relational language, if you send it a relational expression, you'll get a relation back. Well, a relation has no defined order for its rows, and it doesn't allow duplicates, so although that's useful, it's not really adequate. But -- and this is important -- the driver for a given API should have no trouble translating the representation to conform to the API's representation of a query result.]

[OK, so how do we get something a bit more like what you can get from a SQL DBMS? Easy. The result of a SQL query is more like an array of rows. Arrays define an order of their elements, and unlike sets, can have duplicate elements. So let's hypothesize a syntax for converting relations to arrays: <relation-expr> AS ARRAY {<attribute-list>} ORDER (<order-item comma list>). You with me so far? (The current Rel grammar doesn't appear to have anything like this, although it has a less flexible way to load a relation into an array variable. This is all hypothetical, just like the rest of the discussion has been.) The brace-delimited list specifies what columns to include in the resulting array. (I'd make this list optional if I were actually defining this language addition). You can imagine what the <order-item comma list> might look like. (Again, I'd make the whole ORDER clause optional. Its syntax is based on existing Rel grammar, see earlier link.)]

[So now, if you want a more array-like result, you send something like e.g. "tableA AS ARRAY {colA, colB} ORDER [DESC colA]" as your query. The system sends back a representation (which we don't have to specify to have this discussion) of an array of rows. Again, the RDBMS-specific driver should have no trouble exposing that through your typical host API as a query result. And as far as a higher cost of execution goes compared to a SELECT statement that does something similar: Forget it, there is no higher cost. Conceptually higher cost? Not really. Once you've wrapped your head around relational algebra (which would be a prerequisite for using a language like Rel with any success at all), then the distinction between the characteristics of a relation and an array are completely natural, and you wouldn't have to think for more than an eyeblink to decide which you want in a given situation.]

I was responding to the hypothetical EXPORT examples. I agree there are many ways to "format" the results to give one the power of SELECT, but why go for a kludgy, two-layered ugly copy? SELECT has proven convenient and practical. Keep what works unless you find a nicer replacement, not a round-about replacement. If you want to "protect" result sets from getting "into" the RDBMS, then simply forbid duplicate-containing result sets from being fed to other tables and queries within the RDBMS, and ignore internally nested "ORDER BY" clauses. (I'm not saying actually have a SELECT keyword, but rather stop adding extra keywords like ARRAY and EXPORT to get the same thing. Don't gum up the language with mostly-superfluous clauses. They are solutions looking for a problem.)

[Coming back now to your 'EXPORT...FORMAT CSV TO "myfile.csv"': It seems fairly obvious that a hypothetical statement like this tells the RDBMS to send a query result to a file, in a specific format. You wouldn't send this statement to the system if you expected a query result back, just as you wouldn't send a command like "BACKUP db1 TO FILE 'mybackup.bak'" and expect a query result back. I can't fathom why you would even entertain the idea that you would send this and expect a query result back.]

Somebody else suggested that EXPORT was a solution to get "output" duplicates (and possibly sorting) as query results, not me. Your criticism of the idea sounds a lot like mine, actually. Maybe you can explain it to the other guy(s); I didn't have much luck.

Here's a compromise: have an optional "ALLOW DUPLICATES" clause in Rel for roughly 5% of the time where you want them in the "outer" results.

But anyhow, it will probably still come down to expressiveness and convenience versus what I call "purity" of set-centric relational. We'll probably never see eye-to-eye on that philosophical issue. -t

{You have not demonstrated "expressiveness and convenience". Your BagNeedScenarios are weak, only marginally better than your justification for donkey-eared integers. Your argument isn't respectable as a 'philosophical issue' because it fails to first be respectable as an 'argument'.}

[This is pretty much a throw-arms-in-the-air-and-walk-away moment in the discussion. There's nothing about the proposals that sacrifices expressiveness or convenience. They're merely different from SQL, and much, much more like general-purpose programming languages, both new and old, in making a clear distinction between different data types and the operations that you can apply to them.]

[Please, don't anyone take that as an invitation to start yet another long, meandering, unenlightening thread about type systems.]

Q: What does relational and you have in common?

A: Projection.

Top, such ad hominem attacks -- funny as they may be -- only degrade your credibility. Please consider deleting them. Anyway, for the third time, let me ask this: I could easily add an "EXCLUDE" keyword to ORDER -- e.g., 'r ORDER(DESC Name, ASC Date, EXCLUDE CustomerID, EXCLUDE CustomerPhone)' or something similar. Top, would that meet your requirements, at least in concept?

You called my replies "unenlightening" and "meandering". Is that not a personal and rude attack"? The "H" word comes to mind.

Observe the use of italics and other indicators of authorship. I have not insulted you at all, in any way.

And, why would EXCLUDE go with ORDER? ORDER selects which fields to sort on, not which to emit. I like the idea of EXCLUDE, but it's still just another way to make a bag. -t

ORDER() generates an ARRAY from a relation. Relations have no notion of order. ARRAYs do, and they can include duplicates, so it makes sense to provide a way to generate duplicates from a relation by excluding the key.

So you can have bags. Welcome to the bag club! Please kick back, relax, and unpack your bags. (Related: reply "A" above under PageAnchor David_Letterman.

Your sarcasm is ignored. An ARRAY is primarily intended to fullfil the same role as a SQL CURSOR -- it allows the programmer to iterate the tuples of a relation. "FOR" is used to iterate the tuples of an ARRAY, which is why it was used as an example of handling duplicate tuples, waaaay back when these discussions started. A relation has no notion of order, so it must be converted to an ARRAY so that tuples can be iterated in a specified (or arbitrary but fixed) order. ARRAY is also available as an output format -- i.e., an ARRAY can be the result of an expression evaluation -- so that clients of the DBMS can obtain tuples in a specified order. If you need duplicate tuples too, it makes sense to provide a means to exclude key attributes. There is, however, no ARRAY algebra or any other operations on an ARRAY, as there's never a need to maintain duplicate tuples inside a DBMS.

Their intended use doesn't change anything: you still have bags; it's just that you take an "is-a" approach to bags while most RDBMS takes a "has-a" approach (uniqueness is a feature, a constraint attached to a table, not a type). HAS-A is better reuse of concepts and implementation of a query system because it means that RDBMS bags are not crippled; that is, given a decent package of operators.

I believe your reasoning is that if your bags are crippled, then people will use them less. It's a discouragement technique. However, the likely result is that they'll use Rel less because of the artificial and harsh line it draws between two otherwise very similar concepts.

Further, your approach is anti-polymorphism (if one does go the IS-A approach). The context and operations available to each are unnecessarily different. -t

Sorry, you've lost me. "Anti-polymorphism"? I don't need RelationalAlgebra-like operators on an ARRAY any more than I need to find the square root of a phone number.

I've pointed out how a pure relational system can accept input from bags (via import/link mechanisms discussed on other pages) and output to bags, via EXPORT, ARRAY, and FOR. I've explained that bags are pointless inside a true relational DBMS because a DatabaseIsRepresenterOfFacts about a domain. Furthermore, I've explained that bags inside a database may lead to inadvertent or erroneous duplicates. You have not shown why bags are needed inside a database, as BagNeedScenarios all deal with bags at the boundaries -- the input and output -- of a database system. Those, as I've shown, are trivial to handle. Nor have you shown that dealing effectively with bags at the boundaries of the system and eliminating them within the system presents an undue, unacceptable, or insurmountable burden. You haven't even shown that these conditions are awkward or annoying. Indeed, the clear distinctions between concepts and types in TutorialDee/Rel is one familiar to programmers, and will likely find greater favour with them than the indiscriminate hodge-podge of operations in SQL's 'SELECT'.

Thus, it is clear that true relational systems are inherently superior to BagAtational systems. Q.E.D.

I'd like to thank you for creating and participating in these discussions, Top. By making me think about these topics, it's confirmed to me that true relational systems are the way forward, and that BagAtational systems -- whether using SQL or some other language -- should be discouraged, deprecated, and avoided. I shall return to working on the RelProject with renewed vigour, purpose, and vision.

Whatever floats your boat.

Re: "I don't need relational-like operators on an ARRAY any more than I need to find the square root of a phone number."

Each relational operator has a very similar corresponding "bag" equivalent (including other column-based or column-projected structures); similar to how integers and real numbers can both be added. Your analogy is odd. (However, it may be part of encryption.)

Re: "I've pointed out how a pure relational system can accept input from bags"

Not disputed, but made harder.

Re: "I've explained that bags are pointless inside a true relational DBMS because a DatabaseIsRepresenterOfFacts about a domain."

That's argument-by-definition, not argument by utility. Plus, "fact" was never clearly defined. Facts can also be known ambiguity and probabilistic knowledge. Are you going to argue that one shouldn't store fuzzy photographs in a database because there is ambiguous or imperfect info in it?

Re: "Furthermore, I've explained that bags inside a database may lead to inadvertent or erroneous duplicates."

True, but any tool can be misused/ abused. Making bags more difficult will not necessarily stop such problems, and may create new and different problems.

Re: "You have not shown why bags are needed inside a database, as BagNeedScenarios all deal with bags at the boundaries -- the input and output -- of a database system."

As already explained, "inside" is relative, and the query processor often cannot often know what is being done with the "output". Further, it may be possible to put more restrictions on "internal" queries than non-internal queries. You don't have to cripple one to stop the other.

Re: "Those, as I've shown, are trivial to handle."

Your approaches are awkward, extra code, and potentially confusing.

Re: "Nor have you shown that dealing effectively with bags at the boundaries of the system and eliminating them within the system presents an undue, unacceptable, or insurmountable burden."

And you haven't shown that allowing them creates and "an undue, unacceptable, or insurmountable burden". Sets are not the default. You seem to think that sets are the default unless insurmountable evidence says otherwise. Actually, bags are the default because most RDBMS already accept them.

Re: "clear distinctions between concepts and types in TutorialDee/Rel is one familiar to programmers, and will likely find greater favour with them than the indiscriminate hodge-podge of operations in SQL's 'SELECT'."

There is no "clear distinction". You are forcing a distinction. HAS-A is a legitimate contender to IS-A, and is often more flexible and better concept reuse. And programmer preference varies widely. I won't disagree that purity-obsessed personalities may gravitate toward Rel. I just hope I don't have to work with such annoying, fixated people. The WorseIsBetter IBM's and Oracle's already won, and will probably win again against your MIT-style contraption.

Didn't X11 originate from MIT and succeeded? Not that X11 is based on sound theory or anything.. just saying.

Further, if the anti-bag movement by chance caught on, then SQL can be modified to limit them on internal queries, and Rel will then very little extra to offer. You don't understand economics, only abstraction purity.

The difference between us is that I view abstractions as TOOLS, as UsefulLies. However, you view them as MANDATES. -t

You haven't raised a single meaningful objection and offer only repetition of past points, nonsense, and thinly-veiled ad hominems. I particularly liked your quip about facts, as if bags will deal better with "known ambiguity and probabilistic knowledge" than sets. If you ever come up with a sound and logical argument, be sure to post it here.

Projection, and likewise. You are the Joe McCarthy? of bags.

Uh huh. Yet you still can't give any reason why bags might be desirable inside a database -- not at the boundaries, or some hypothetically "relative" ambiguous region or contrived situation, but simply an ordinary table within an ordinary database.

Please clarify that last clause of that sentence. I'm not sure what it modifies. Note that a technique enforcing uniqueness on a table or all tables will not necessarily affect the query language itself. Scenario 4 had an example where tossing the uniqueness enforcement mechanism can save space on a given table without switching DB engines. -t

By "ordinary table within an ordinary database", I mean the sort of thing -- inventory, CRM, ERP, general business recordkeeping, etc. -- for which typical DBMSs -- MySQL, PostgreSQL, Firebird, HyperSQL HSQLDB, DB2, Oracle Database, Microsoft SQL Server, Sybase ASE, Ingres Database, etc. -- are used. I.e., a typical database scenario, unconstrained by unusual space requirements like your "Scenario 4". In a typical database context, is there any reason why a bag might be desirable?

Okay, I tend to agree. But I disagree that situations that fall outside of that are "contrived", and protest your use of that word. -t


NovemberTen


CategoryDatabase CategoryThreadMess?


EditText of this page (last edited August 12, 2014) or FindPage with title or text search