Are Tables General Purpose Structures

Are Tables A General-Purpose Data Structure?

See MinimalTable for definitions of "table".

Claims to investigate:

(Please place rebuttals below, not in the list.) (Request is rejected. Put a summary of rebuttals here or you force a biased discussion.)

This focuses on the concept of tables rather than BigIronDatabases in particular. For the purpose of discussion, exclude speed issues arising from a hardcoded implementation. Notably, given appropriate constraints on the table (which is part of relational theory), one can have table implementations that are exactly as efficient as any given data structure for the operations that that structure would support natively, while still allowing the flexibility one would expect from a table.


[original, causing confusion] This focuses on the concept of tables rather than BigIronDatabases in particular. For the purpose of discussion, exclude speed issues.

Why? Speed is frequently important - even if we limit ourselves to BigOh issues; and tables have speed issues even if a NimbleDatabase (or none at all, just raw tables) is used.

Only in very limited circumstances. If you can identify a problem situation WRT speed, I would be glad to take a look. Maybe we can do a wild-west speed shootout.

The 3 claims above are mostly true. See the discussion below and at other places. It is *exactly* performance that can make one prefer a non-table structure, notably time complexity. The time complexity of accessing a table element by key is O(log n); a hash table element, O(1); the first element of a linked list, by reference, negligible; an array element by index, negligible. However, in non-critical applications, O(log n) is fine, and therefore there is no reason to use non-table structures (other than programming habits). -- MariusAmadoAlves

O(1)? Not sure about that. Why don't RDBMS vendors build keys with hashes then? -- AnonymousDonor

Many do, though the default indexes are usually trees rather than hashes (hence the O(log N)). It's certainly possible to index with a HashTable. Most uses for an RDBMS involve large data sets, though, where the storage overhead of a hashtable is prohibitive. This is really a case of a big-dataset tool being adapted for a small-dataset problem, not an intrinsic problem of tables. A lightweight in-memory database (NimbleDatabase) could easily use hashtables for indices, and then would have the same speed as a hashtable.

There are cases where tables have intrinsically worse performance than dedicated structures, though. Consider inserting an element at the head of a list. This is O(1) for any reasonable LinkedList implementation. It will generally be O(n) for a table, because it requires adjusting the sequence numbers for all following records.

{{It requires logically adjusting the sequence numbers, but it does not require actually iterating through all the sequence numbers and changing each one. The arguments in favour of tables are based on using a uniform table interface, not a uniform implementation.}}

These sequence numbers are part of the visible interface, so they can't be optimized away.

I am not sure what tablized implementation of a linked list you are envisioning.

You may be able to get amortized O(1) updates if the actual sequence numbers are not visible, only ordered.

Leave a gap between sequence numbers, and then double the gap with each renumbering, like how vectors resize themselves. The constant of proportion will be significantly worse than with a normal LinkedList implementation. And you lose random-access indexing of elements by position. But that's the same trade-off you face between arrays (O(N) insert at beginning, O(1) random access) and linked lists (O(1) insert at beginning, O(N) random access).
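A minimal Python sketch of that gap-doubling idea; the GappedList class, the GAP constant, and the dictionary-as-table representation are invented for illustration, not taken from any product:

  GAP = 16
  class GappedList:
      """Rows keyed by a sequence-number column, with gaps left for inserts."""
      def __init__(self, values=()):
          self._renumber(list(values))
      def _renumber(self, ordered, gap=GAP):
          # O(n) rebuild; the insert path only calls this when a gap is exhausted
          self.rows = {(i + 1) * gap: v for i, v in enumerate(ordered)}
      def as_list(self):
          return [self.rows[s] for s in sorted(self.rows)]
      def insert(self, position, value):
          seqs = sorted(self.rows)
          lo = seqs[position - 1] if position > 0 else 0
          hi = seqs[position] if position < len(seqs) else lo + 2 * GAP
          if hi - lo < 2:                          # no room left between neighbours
              ordered = self.as_list()
              ordered.insert(position, value)
              self._renumber(ordered, gap=2 * GAP)  # double the gap, amortizing the cost
          else:
              self.rows[(lo + hi) // 2] = value

Repeated inserts at the head consume the gap until a renumbering fires, so head-inserts are amortized rather than worst-case O(1) - the constant-factor trade-off mentioned above.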

(EditHint: The list discussion is already mentioned below and should probably be consolidated together. But I've already refactored this page once and it just ended up getting ThreadMessed again.) -- JonathanTang

I can't see why a table couldn't implement a linked list in the same complexity as a 'real' linked list. Why wouldn't you just maintain a 'head' element which contains no real data, only a link to the first element? You could then do any addition anywhere in the list by appending a new element to the table and updating the two elements (if it's two way) that refer to it. O(1) if you already have the node you want to work with. Requiring that we can replace the head is a needlessly complicating assumption. -- WilliamUnderwood

I forgot about that possibility (that's what I get for commenting first and then reading the rest of the page). Yes, it's absolutely possible for a table to mimic the list exactly. You're likely facing higher constant factors, but that's an implementation detail, not something inherent to the data structure. The analysis I gave above is really for an array, and an overly complicated one at that. Although implementing an ArrayList this way at least achieves the NoPointersIdeal mentioned below. -- JonathanTang
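For concreteness, here is a rough Python sketch of the dummy-head approach WilliamUnderwood describes, with a dictionary standing in for the table and invented column names (value, next); in a real RDBMS the same thing would be a small table keyed by an ID column:

  table = {0: {"value": None, "next": None}}   # row 0 is the head row; it holds no data
  next_id = 1
  def insert_after(node_id, value):
      # O(1): append one new row and relink two key references
      global next_id
      new_id, next_id = next_id, next_id + 1
      table[new_id] = {"value": value, "next": table[node_id]["next"]}
      table[node_id]["next"] = new_id
      return new_id
  def to_list():
      out, cur = [], table[0]["next"]
      while cur is not None:
          out.append(table[cur]["value"])
          cur = table[cur]["next"]
      return out
  a = insert_after(0, "first")
  insert_after(a, "second")
  insert_after(0, "new front")      # "replacing the head" never touches row 0 itself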

Following a "pointer" in a table means doing a lookup on some key and is O(log n), not O(1). In general, a table directly implements a map or a set efficiently. Representing any other data structure is only possible by abusing keys as pointers and consequently incurs an additional logarithmic cost.

Hardly. Consider a 2 column table with the constraint that the first column is an integer value smaller than 4 billion. This can trivially be implemented as an array. Any relational operation that maps to the operations one generally does with arrays is done in exactly the same complexity as an array. Stating it again to make sure there's no confusion: the algorithmic complexity is a function of the constraints and the implementation of a table. The fact that we have a set with tuples in it has no bearing on it. -- cwillu

Well, no. To be trivially implemented, the integers have to be densely allocated and some sort of garbage collection is needed. This is not even transparent to the user. The result is an array, and therefore should not be called a "table" any more.

No, only the implementation is an array. The app writer may never know which implementation is used under the hood, and it may change dynamically by the underlying engine if say the size or usage pattern changes. RDBMS often use B-trees, for example, but that does not mean that one is directly "using" B-trees. One could also swap an array implementation with a linked list and the app writer may never know the difference (other than maybe analyzing performance profiles). "Table" does not define/dictate an underlying implementation.

Good point. Tables are an interface, not an "implementation". For example, the MS-SQL product has something called a "clustered table", which uses an internal arrangement that speeds up primary key-related operations at the expense of other keys or searches. Such a table does not behave any differently outside of performance.
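A toy Python sketch of "tables are an interface, not an implementation"; the Table, DenseArrayStore, and HashStore names are made up for the example, and a real engine would choose or switch storage strategies itself:

  class DenseArrayStore:
      """For tables constrained to a small, dense integer key: array-like O(1) access."""
      def __init__(self, size):
          self.cells = [None] * size
      def get(self, key):
          return self.cells[key]
      def put(self, key, row):
          self.cells[key] = row
  class HashStore:
      """For arbitrary keys: hash-like expected O(1) access."""
      def __init__(self):
          self.cells = {}
      def get(self, key):
          return self.cells[key]
      def put(self, key, row):
          self.cells[key] = row
  class Table:
      """Callers see only get/put by key; which store backs it is an engine decision."""
      def __init__(self, store):
          self.store = store
      def get(self, key):
          return self.store.get(key)
      def put(self, key, row):
          self.store.put(key, row)
  t = Table(DenseArrayStore(1000))   # swap in HashStore() (or a B-tree) without changing callers
  t.put(42, {"name": "john"})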

{Some of the above can perhaps be moved to AreRdbmsSlow.}

I will agree that linked lists are probably one of the "least natural" structures to emulate with tables. Emulation is doable, but not very pleasant. But not easily handling the entire set of structures is not a reason to abandon the idea. -- top


Claim 1: Tables can represent anything that bags, stacks, hashes, or other DataStructures can.

Con:


While this is true, it is likely true for many of the containers mentioned. LispLanguage shows you can model everything with lists (with pairs, actually). Tables are hardly special in this regard.

True, but I find Lisp structures harder to view and lack inter-developer consistency, and don't scale to "large-scale data banks" very gracefully.

[I suspect that the 'representation' of trees/graphs/stacks/lists/etc. using tables would prove equally delinquent of inter-developer consistency and as bad or worse for scaling.]

Exactly - Lisp cons cells are universal, but they're rather low level. Also, they don't support random access structures well (most Lisp dialects do provide random access arrays, but in addition to cons cells, not defined in terms of them).

But, they are a similar kind of contender to tables in that you don't have a bunch of different "kinds" of things, only lists with the ability to nest.


Claim 2: Tables are more consistent and easier to learn than dedicated data structures.

Pro:

Con:


Claim 3: Tables scale more easily.

Pro:

Con: Regarding performance: As computers get cheaper and smaller, the idea of a NimbleDatabase in a wrist-watch will be less and less far-fetched. After all, some want to put Java engines in wrist-watches. Table engines are hardly more bloated. Also, you can't play both sides of the road. If it is a small structure, then the overhead of the table engine is probably not going to matter most of the time anyhow. Efficiency generally does not matter for small sizes. And if it grows big or needs concurrency, an RDBMS scales more easily there than linked lists do.

Nobody is objecting to NimbleDatabase, not even in the wrist-watch. It's the more extreme "we should use tables for everything!" idea that raises objections.


Tasks where dedicated structures are harder to use than tables:

These are things that equivalent dedicated structures seem to have the most DiscontinuitySpike problems with in my experience, as they require more coding than a table would to add them:



Discussion & some other thoughts

Surely all data structures are (mathematical) relations between data items. You can represent relations in many ways, tables, structs-and-pointers, s-expressions, etc. Which representation is best is an engineering trade-off, not an ideological absolute.

The flip side is saying that nuts and bolts and standardized 5-volt circuit board parts are "not an ideological absolute". Yet, we find standardization and consistency useful in nuts, bolts, circuit board standards, etc. Tables make a very good standardized "structure" component in a similar way. True, for special purposes you will always need custom-made thingies. But, I think we are at the point in software engineering where we can farm many things off to nearly off-the-shelf standards. Perhaps some might find that boring, but sometimes progress does that. I think smaller dedicated structures made sense when hardware was expensive, but now the slightly higher overhead of table engines is small compared to labor costs. (Although special niches will always be hardware-bound.) -- top

[It should be noted that 5-volt digital logic is rather quickly being replaced with lower-voltage logic; 3.3 and 1.8 volts are now very common. This trend has been going on for several years now]

[At any rate, the expense of having multiple voltage levels (5,3.3,1.8) on a single board, along with the need for logic which works with multiple voltages and/or converts between them, is far greater than having different data structures (relational tables, lists, trees, arrays, etc.) in a software system.]

Perhaps, but why mix if you don't have to? BTW, I think a simple non-nested dictionary array is the upper limit. Beyond that, tables are preferred in my view.

Different containers serve different requirements. HorsesForCourses and all that. Having tables and linked-lists in the same application doesn't hurt anything; the two need not come in conflict. I see no compelling reason to use tables for everything. (When there is a good reason to use a table, then I use one. When a linked list is more appropriate, I use that. This ain't rocket science...)

I have not seen enough "table failures" to agree with that. If you have the table engine in there anyhow, then the additional cost of another table is trivial. Plus it scales in complexity better. Better future-proofing.

[delete troll]


Much of this page seems to be predicated on the idea that tables == database; and table-behavior == SQL. This misconception is apparent on both sides of the debate. Tables don't do concurrency, sorting, persistence, etc. RDBMS's do these things. It's true that these things can easily be added, but that just extends the basic error of mixing abstraction with implementation.

But upgrading from mere tables to a RDBMS is minor compared to the alternatives. It is a few tweaks here and there, not a complete switch to a different kind of thing or different kind of API. It is more incremental in nature compared to the alternatives.

Stack Example

Someone brought up the fact that tables aren't stacks. True, but if you're thinking in an abstract domain then you don't need stacks. What you want is to get the most recent data added to a collection, a concept that can be implemented in many ways. The simplest would be a column that represents the order in which the rows were added into the table (e.g. an auto-incrementing field). The big weakness of tables is that they force you to think about these fields early; the great strength of tables is that they force you to think about these fields early.

Most programmers like to assume that things like references and collections-classes "just work". This is good for programming, but can lead to premature fixing of the underlying metaphors of the design. If you think "I need a stack here" then you miss the fact that you are simply defining an ordering-relation for extracting data from a container. Ignoring details like this can be good, but hiding them may also prevent you from exploiting it.

Take the example of an audit trail. If you want an audit trail, then you don't want to lose data from the table. You could do this by having an audit-object separate from your stack object (with reference-semantics this is not hard); or you could think in terms of tables, and redefine the ordering-relation for stack-usage to find the most-recent "active" row; and remove by marking it as inactive. A "stack" view thinks inactive rows are deleted, whereas the "audit" view sees active vs. inactive rows.

Interesting analogy, but I am not sure how the last paragraph points out "premature fixing of metaphors" described in the second paragraph, at least not "bad" fixing. It is simply a decision whether to use relational as the underlying philosophy or the "old style" approach.

The table is more flexible. With the stack, you either have to add a secondary new structure to get the audit trail, or abandon a pure stack altogether. With the table, you just make some adjustments, such as an added "is_active" flag column, but are still using the same table. It is back to the advantage of has-a over is-a.
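A small sketch of the stack-plus-audit-trail idea in Python, using the standard sqlite3 module as a stand-in NimbleDatabase; the table and column names (stack, item, is_active) are invented for the example:

  import sqlite3
  db = sqlite3.connect(":memory:")           # stand-in NimbleDatabase
  db.execute("CREATE TABLE stack (id INTEGER PRIMARY KEY AUTOINCREMENT,"
             " item TEXT, is_active INTEGER DEFAULT 1)")
  def push(item):
      db.execute("INSERT INTO stack (item) VALUES (?)", (item,))
  def pop():
      # "stack view": the most recent active row; popping only marks it inactive
      row = db.execute("SELECT id, item FROM stack WHERE is_active = 1"
                       " ORDER BY id DESC LIMIT 1").fetchone()
      if row is None:
          return None
      db.execute("UPDATE stack SET is_active = 0 WHERE id = ?", (row[0],))
      return row[1]
  def audit_trail():
      # "audit view": every row ever pushed, active or not
      return db.execute("SELECT id, item, is_active FROM stack ORDER BY id").fetchall()
  push("a"); push("b")
  assert pop() == "b"                        # LIFO behaviour, with history retained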


The definition of "General Purpose" does not mean "Universal". It is an antonym for "Special Purpose". For example, an X86 processor is general purpose, and an NVIDIA or ATI graphics GPU is special purpose (specialized for graphics). A Pentium, being general purpose, can do the computations that the GPU does, but cannot do them with anything like the same performance. The same is true of a turing-complete formalism based on tables. Tables excel at computations that are "Single Instruction, Multiple Data" in nature (the form of parallelism where you want to apply a computation to a large set of data, where the computations on individual items do not interact), permitting powerful computations to be expressed succinctly. They are also very good for expressing tedious control systems: a natural representation of a state machine is a structure known as a "state transition *table*". The only reason that it's easy to lose sight of the power of tables is that many people's introduction is via the brain-dead language known as SQL. -- DaveWhipp

Hmmm. That is an interesting way to look at relational tables. However, things like Join operations and aggregation require inter-row interaction. Are there any candidate "general purpose" non-SIMD structures/paradigms?

You might want to read some of the old (early '90s) books on the ShlaerMellorMethod. This method was popular in telecoms and control applications. You modeled your application using tables and relations that have associated state machines to describe (not proscribe) their behavior over time (in response to events). By the mid nineties it was losing in the methodology wars, but had developed the formalism to include an action language rooted in SetTheory. Much of the work has been carried over to define executable profiles for UML modeling (ExecutableUml).


Part of what bothers me about dedicated structures is their IS-A nature when a HAS-A nature would be more flexible interface-wise it seems to me. Why is an "ordered list" not a list with sorting set to "on"? Or why do we have to go through gyrations to get a 3-column structure out of a map (with two columns)? The number of columns and the number of sortable and/or unique columns should be features (attributes) that are switched on or off as needed. Dedicated structures seem to be a sloppy attempt to take combinations of all these orthogonal or semi-orthogonal features and turn them into a taxonomy out of historical habit.

Perhaps if the interfaces were not dependent on the "type" of dedicated structure, it would not be much of an issue. However, it is in most APIs. You cannot change a list into a map without having to overhaul existing interfaces that used the list. Tables appear better at this because adding a new column, a new key, and a new sorting column generally do not require changing existing code. If one sets out to make the interface handle such changes incrementally, I believe the result will resemble tables. [Note: I may have repeated this issue elsewhere; it sounds familiar, but I forgot where.]

The answer to your question is "efficiency and simplicity". We don't need one super mega list with every possible feature embedded in it waiting to be switched on when the data is just a list. Sometimes a list is a list. When it is, just use a list. If that changes and it needs to be an ordered list, then use an ordered list.

By your logic, all we need is one big program with every feature in it waiting to be switched on. (Which is one way of thinking about a TuringMachine. Our job is to figure out how to set the switches to get the desired behavior.)

No, you are exaggerating. However, that is an interesting way to view things. Note that we don't have to necessarily include features not actually used. Anyhow, think of it as a thought experiment and focus on the interface, not on bytes consumed.

If all you mean is that ordered list and list use the same interface, we already do that. We also don't include features not actually used.

Well, some of the typical set of dedicated structures indeed do or can share some parts of their interfaces, however, each cannot be replaced by the other without changing existing code in many cases, especially when using combinations. Besides, why have a different name for each combination? Many are just names given to each iteration in a loop that would resemble something like:

  for columns = 1 to 2
    for unique keys = 0 to 1
      for sortKeys = 0 to 1

It is illogical to me to create a taxonomy of combinations in place of describing the 3 or so factors that define it. It's a form of AttributesInNameSmell. Plus, tables are not limited to just 1 or 2 columns, keys, sorts, etc. The sky is the limit. The small numbers in the traditional data structures are arbitrary maximums. I see nothing magic in them. I don't get it. You keep talking about taking up less RAM and stuff, but have no interface-focused justification that I can see. On a small local scale they are possibly faster and take up less RAM; I won't challenge that claim much (yet). But is that it? I don't see many change-friendly properties from a software management standpoint.


Why is an "ordered list" not a list with sorting set to "on"? '

How does one maintain order in the list? The problem with an ordered list in a database table is that to reorder the list, one must update a sequence number in each table row. Give me a linked list any day, where I can reorder items by only modifying the neighbors.

Isn't that called a "linked list"? One can implement such in tables by having previous-ID and next-ID foreign keys. I have not found a need to use such very often, mostly because it is tough to hook/define into a UI. Thus, they perhaps are best used for internal structures. Another approach is to use a sequence number (float or double) that takes the average when you insert something between two nodes. When the decimals come close to the limit of floats or doubles, then resequence the whole list. Triggers can assist with this.
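A quick Python sketch of the averaging trick, assuming rows is a dictionary mapping a float sequence column to a value; a real engine would do the resequencing in a trigger, as noted above:

  def insert_between(rows, lo_seq, hi_seq, value):
      new_seq = (lo_seq + hi_seq) / 2.0      # average of the two neighbours
      if new_seq in (lo_seq, hi_seq):        # float precision exhausted
          resequence(rows)                   # renumber 1.0, 2.0, 3.0, ... then retry
          raise RuntimeError("resequenced; look up fresh neighbours and retry")
      rows[new_seq] = value
  def resequence(rows):
      ordered = [rows[s] for s in sorted(rows)]
      rows.clear()
      rows.update({float(i + 1): v for i, v in enumerate(ordered)})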

Please note that ordering is only significant to a user; one certainly would not bother about reordering lists unless the user wanted to be able to specify the order of items in a list. There are times when the user will need to specify order based on some criteria not contained in the data, so the order must be user-specified. ''In what way? What if you want to perform a non-commutative operation on the items in the ordered list, in order?'' -- AdamBerger


A bunch of the above seems predicated on the desire for UniformityUberAlles. The line of argument goes like this:

This argument is fallacious in a couple of ways.

Regarding your linked list argument, if you implement a linked list directly, you end up creating a structure resembling:

  struct linkedListNode {
    previous: ^linkedListNode  // pointer to another node
    next: ^linkedListNode
    dataPortion...
  }
A direct table version is pretty much the same thing. If you want dedicated functions/stored-procedures/triggers such as car, cdr, etc., then go ahead and make them.

[So what you're saying is that if one implements a linked list "directly", one must define a node structure, and write whatever functions (car, cdr, etc.) one wants to use to access the list. While if one implements a linked list using tables, one must define a node table, and then implement "functions/stored-procedures/triggers" to access the list. And the difference is what exactly?]

[EditHint: move discussion related to linked lists to one section.]


To me, the greatest pro of TOP is one actually not discussed here yet: the promise to fulfill the NoPointersIdeal. I'm currently researching this (however as a sideline to my main research on adaptive hypertext). Eventually I'll have more information.

-- MariusAmadoAlves

In what way does TOP get rid of pointers? Even topmind himself said, above, that to implement things like linked lists, stacks, and queues using tables, you need a column containing sequence numbers. In what way do those numbers differ from pointers?

In that they are *keys*. [And so the difference is? Whether you call them pointers, references, keys, indexes, or Spot the Wonder Dog, it doesn't change the fact that you have to do the same operations on them when you insert or delete items in the collection as you would if you were doing the same operation using pointers in C. Tables provide no additional abstraction.]

Tables *do* provide additional abstraction: arbitrary views. The set of possible keys is open. The set of pointers is closed. Consider the dataset of pairs (letter, number)

{(a, 1), (a, 2), (b, 1), (b, 2)}
Put this in an array or linked list, and you cannot declaratively express the subset of pairs with letter 'a'. Put it in a table, and that comes naturally.

[You can't?

  filter (\x -> (head x) == 'a') input
-- JonathanTang]
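For comparison, the table-side version of the same subset, sketched in Python with the standard sqlite3 module standing in for a table engine (table and column names invented):

  import sqlite3
  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE pairs (letter TEXT, number INTEGER)")
  db.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 1), ("b", 2)])
  subset = db.execute("SELECT letter, number FROM pairs WHERE letter = 'a'").fetchall()
  # subset == [('a', 1), ('a', 2)]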

I would note that your example does not use a column of sequence numbers. It is not a list, stack, or queue. How does this relate to my questions about pointers?

[But now you've changed the problem. We were discussing using tables to emulate stacks, queues, and lists. Those things have standard interfaces. If you allow arbitrary views, you are no longer modeling stacks, queues, or lists. The fact that those structures don't provide arbitrary views is irrelevant; that's not what they are designed for.]

I don't think the concept of pointers can be completely wrung out of software. Even a US Zip-code is a pointer in some ways. But we can strive to reduce them and reduce the problems caused by them. Also note that a table that emulates a linked list cannot be hacked to gain direct memory access in the way live RAM addresses can, because it is not referencing actual memory. And, it can be used by multiple languages at the same time. By the way, why is a sequence number used to create a stack a "pointer"? -- top

Pointers != "live RAM addresses".

But are all sequence numbers "pointers"?

If they are used to reference elements, they are pointers. Many languages use pointers that aren't physical addresses.

[I prefer the usage where "pointer" is used only for RAM/core addresses, and "reference" is used for everything else, because it's handy to distinguish the two things. E.g. a symbolic link in the filesystem is a reference, but I dislike calling it a pointer. That terminology distinction certainly isn't universal, but I think it's very common.]

The idea of relational is that "references" are there if you want to use them, but they are not the only means to produce views of things. They are simply attributes that describe nouns like any other attribute.

No one disputed that.


My idea is that the references should be made in terms of the abstraction. C pointers are in line with the general C abstraction, because that is the von Neumann machine. That is, the semantic universe of C is the machine model. This is the greatness of C, and what made it snowball.

But we want higher-level models now. That's the whole force for OOP, for example.

By "in terms with the abstraction" I mean, for example, that a vector element should only be referenced by its index, a tree node by its path, a table element by its key. Never by a pointer, which is breaking the abstraction, spiking through it right down to the machine model.

Here by "abstraction" I'm thinking mostly *containers*, but other concepts fit as well.

"pointer"."reference"
.
.
.position
  addressaccess.key
(C)(Ada).iterator(TOP)
  ........................................
  machine. abstraction
  model.model
-- MariusAmadoAlves

Exactly; so "pointers" are more concrete while "references" are more abstract.


[Hey, nothing like an explanation of relational theory from a guy that knows one data structure. C'mon top, tell us some more about how we're all wrong and you're right.]

The arguments are given above (consistency, scaling, viewability, etc.) If you disagree with them, then I don't know what else to say. They seem to be strong points to me. Maybe it is all subjective preferences. Perhaps if I did not "grow up" on nimble table-based systems such as Xbase (even though the language itself needed work) I would not have this view. Let's hope we all get a chance to use our pet techniques and not be kept from that goal by The-One-Way zealots. Let's hope we can live in a world where FP fans can use FP, where Java zealots cannot keep Smalltalk from Smalltalk fans, etc. I don't suggest that TOP be forced down everybody's throat, anymore than OO should be forced down everybody's throat. -- top [Maybe move this to the TopMind Q&A section in a few months.]

I don't think any of us (at least not myself) want to force OO down anyone's throat. I, of course, cannot take responsibility for tool vendors' hype, but that's marketing and not science.


Something about this discussion has been bothering me for some time now, and I think I see what it is. I think that there is a MetaLevel conflict in TOP's comparison between trees, lists, collections, etc., on the one hand, and tables on the other. While it is hard to say that one type of data structure is 'higher level' than another, there is surely a difference between a structure which is defined in terms of its implementation, such as a linked list or an array, and one which is defined in terms of its purpose, such as a collection. There is often some confusion even with a given data structure type; for example, a stack can refer to any last-in-first-out ordered collection, but it most often refers to the specific implementation via an array and a stack pointer, especially when implemented in hardware.

Whether "data structures" are implementations or interfaces is unsettled debate.

Relational tables are a very high level abstraction, with a multi-level structure and complex intrinsic behavior. They can be implemented in a variety of manners, and in fact a given table may in its lifetime be 'transferred' from one implementation form to another (from a disk-based b-tree to a multidimensional array in memory, to give one common example) while retaining its essential 'table-ness'. It seems to me that comparing a database to a tree or list is like comparing an apple to a carbon atom; the apple may consist in part of many carbon atoms (among others), but the properties of them are radically different.

Even the most MinimalTable is vastly more complex than a LinkedList containing the same data, even if the actual in-memory structure of the data is the same, because the table also has indices, relational links, and so forth. Substituting a table for a lighter-weight structure is not only an AbstractionInversion, but also overkill.

Why is it overkill? See PageAnchor: "virtual_tool" under AbstractionInversion. It is only "overkill" if the drawbacks outweigh the benefits. Are the drawbacks only performance? Maybe we are being "wasteful" on the low-end, but it does not matter. At that scale CPU power is cheap. Perhaps this is kind of like the depression-era (1930's) generation versus those born in the 50's and 60's. The depression-era person may take worn shoes to a shoe-smith to be repaired, while the 60's person will just throw them out and buy another. When it comes to CPU usage, I am of the latter generation. More on this below.

Even in terms of uniformity it fails, because you are treating the table as if it were a list, or a tree, or what have you, rather than using the structures and features inherent in the table itself - in effect, it would be turning TOP on its head, using tables to implement an ad-hoc AbstractDataType hierarchy.

I don't see why it fails "uniformity". Please clarify. Using a table as a stack is not going to have a lot of variation from developer-to-developer.

Conversely, it is entirely possible to implement a MinimalTable abstraction using, say, linked lists, and then using those to construct a NimbleDatabase - in effect, creating a DBMS library. Indeed, this is precisely what would be necessary to implement a TOP system, at least underneath.

I suspect my perspective on this is somewhat different from that of many programmers, because of my background in Scheme; in the LispFamily languages, it is not at all unusual to create a table structure from lists, and then abstract it so that it is effectively treated as a unitary database from the perspective of the ClientProgrammer. While this is done in more conventional languages as well, it is not as common; more importantly, tables constructed this way are easily displayed as tables without having to write a formatted print function, and can be stored in the same EssExpression syntax as the program. A simple example of this can be found in Section 3.3.3 of StructureAndInterpretationOfComputerPrograms. -- JayOsako

So it comes down to EssExpressions versus TOP as the EverythingIsa finalists, eh? My biggest problem with EssExpressions is that they don't narrow down the possible variations enough (lack standards/consistency) and don't scale. They suffer from the typical complaints of "navigational" structures (NavigationalDatabase). For example, look at the different ways to implement many-to-many relationships using EssExpressions. Tables can do the low-end, medium-end, and the high-end with little or no changes. EssExpressions can do the low-end great, the medium okay, but suck at the high-end. I would graph it as follows:

 EssExpressions:
 Low: ***************
 Med: ***************
 Hi_: *****

 Tables:
 Low: *******
 Med: *****************
 Hi_: *****************
The average for TOP is higher than es-exp's in my opinion, so they are the better general-purpose tool, especially if complexity of a project grows over time. I rarely see project complexity shrink over time, thus it seems more logical to emphasize the higher end when making a judgment rather than the lower end. If you see dark clouds on the horizon, you should pick the bigger umbrella even though the current rain may be light. -- top

That is just the opposite of what I meant, really. The last example was just that - an example of one way to implement a table in a given language. The points I'm making is that a table is not a simple data structure, but a high level abstraction which can be implemented in several different ways, and that turning around and implementing lower-level structures using tables is counter-productive. -- JayOsako

Why is it counter-productive? I agree with using lists and maps when appropriate for small local usage if the language readily provides them, but nesting/combining them is past the threshold in my opinion. For example, I sometimes use maps in place of named parameters when languages do not support named parameters. -- top

By the time you are scaling to the point where it would make sense to use a table in lieu of a list, array, or collection, you would be better off redesigning the data structure as a full-blown table (or some equally sophisticated data management method), rather than using the table to emulate the simpler data structure. Mind you, I suspect that most people would put that point a lot farther up the scale than you would.

In the end, I am not really disagreeing with your basic arguments - that tables can be used as general-purpose data structures (at least in principle), that too much emphasis is placed on OOP in areas where it is poorly suited, and that relational tools (among others) are underutilized in general - but I think that you are undercutting your own argument with your stridency. You'd do a lot better if you demonstrated your point rather than arguing it, especially considering how outnumbered you are. -- JayOsako

Okay, examples:


I would like to say that

How is the "table engine" going to know if a given EssExpression fits enough table rules to be usable? I can envision a table engine that internally uses es-exp's or one that exports tables to them, but something that can be altered alternately between the table engine and direct Lisp commands may turn into a headache. Does the table engine validate it each time before using it as a table? Or just raise an error along the way as it encounters them? That is going to put more burden on the table engine as far as complexity and speed. And, it is possible it may result in major corruption if the misinterpretation results in scrambled eggs. It may be unrealistic to expect the table engine to detect all possible corruptions of es-exp's (corruptions being non-table-shaped or table-interpretable structures). For example, there may be something that is table-able, but not in the developer expected. The table engine may just "go with it", and create a growing problem. -- top


How is the "table engine" going to know if a given EssExpression fits enough table rules to be usable?

Because it was constrained to be so. How does the "table engine" store a row in current DB implementations? An array of bytes? How does the "table engine" make sure an array of bytes fits enough table rules to be usable? It's the same thing. Data is, after all, data. It cannot validate its own consistency. Can your table detect if the data in each column is of the correct type? No, it cannot. What detects the inconsistency is your DB engine. The same would go if we had a nested-list DB engine.

but something that can be altered alternately between the table engine and direct Lisp commands may turn into a headache. It may be unrealistic to expect the table engine to detect all possible corruptions of es-exp's

I'm not talking about something tightly integrated with Lisp here; in DynamicStringVersusFunctional? you gave a sample putting TCL code in a DB. How is that not "unrealistic to expect the table engine to detect all possible corruptions of tcl-exp's"?

I am not talking about LISP at all; I'm talking about how it's pointless to hold up any one data structure as general purpose, since every data structure can simulate any other. Be it a nested list simulating a table or a table simulating a nested list.

What's the difference between

 VALUES(1, "john", "20040101")
and
 '(1 "john" "20040101")
And please don't assume that since it's possible to use nested lists, everything must be nested and messy. Is it not possible to create a messy database schema?

There seems to be a major miscommunication here, but I have not figured it out yet. Maybe I'll rest it a while and then try again some other day.

The major miscommunication seems to be that your definition of a table includes the DB engine in it. A DB is a table WITH a required set of operations. Given any data structure and that same set of required operations, anything will be as useful as a table. That's why I don't see any point in pushing tables as the general-purpose structure, because anything is.

I am confused by "as useful as table". Do you mean representationally equivalent, or as convenient?

As convenient.

I am still not following your argument. Perhaps we should consider it with both the operations included in "table" and without the operations.

Let's consider the difference between a table and a nested list, assuming that BOTH have the same set of operations.

 -- SQL --- CREATE TABLE employees (id integer, name string);
 -- SEXP -- (setf employees (create-table '((id integer) (name string))))  ; internally implemented using nested lists
 -- SQL --- INSERT INTO employees VALUES (1, 'john')
 -- SEXP -- (insert-into employees '(1 "john"))
 -- SQL --- DELETE FROM employees as emp WHERE emp.id = 1
 -- SEXP -- (delete-from employees :where ((emp) (= (id emp) 1)))
 -- SQL --- SELECT * FROM employees as emp WHERE emp.id = 1
 -- SEXP -- (select-from employees :result :all :where ((emp) (= (id emp) 1)))
 -- SQL --- UPDATE employees SET (...whatever SQL allows...) WHERE ...
 -- SEXP -- (update-into employees ((emp) (...whatever lisp can do...)) :where ...)
I don't see a big difference or convenience advantage between the two.

Reply above.


Why are people in this forum mentally crippling the concept of a table by comparing it to our primitive state of computer science? Modern DBMSes/SQL/OOP are hardly the pinnacle of technology, each with severe limitations, verbosity and redundancies.

This cannot be fully implemented without a revolutionary new paradigm, one that merges a sophisticated DB, a replacement for SQL, an unbloated language (100% of modern languages are redundant), and an IDE. A RelationalLanguage (better than SQL) should be part of the language. Tables should be part of the language.

Tables are conceptual, whether they are 'persisted' in a DB or in memory as a structure or cached, that is simply a 'persistence' attribute of a table. This could be optimized by the compiler or runtime engine. An array/collection/list/tree/dataset/dictionary/stack/hash/map/set/XML/etc. is just a table with differing properties. All should share the same set-like manipulations (add, remove, sort, save, load, copy, find, filter, query, iterate over, read, write, etc.). It is RIDICULOUS that YourFavoriteLanguage has redundant separate structures for every one of these and redundant operations for each of these often with different names.

Tables are the GrandUnificationTheory? of Computer Science. The problem is our sorry state of tools (like SQL and OOP and .NET/Java and IDEs). I find it amusing when some .NET idiot shows me his classes that read a table (from DB) into a table (class), translates it to a table (DataSet) then serializes it (save table) into an XML file (a persisted table) with a million identical methods at each stage. Hell the modern languages spend 30-40% translating/formatting/manipulating identical structures and identical methods.

-- AnonymousDonor

You might be right. Please join the <blatant self-promotion>RelProject and maybe help solve this problem.</blatant self-promotion> -- DaveVoorhis


The PetStore animal menu has a possible example of this. It seems natural to put the description and link info into a map (dictionary) array, especially if we are going to use them for more than one loop or purpose. However, maps don't contain inherent order. If one started out with a table, they could add a sequencing column as needed without overhauling what already exists. -- top
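A rough Python illustration of the point; the animal/link/seq column names are invented, not taken from the PetStore code:

  # Map version: two hard-wired "columns" (key -> value) and no inherent order.
  animal_links = {"Dogs": "dogs.html", "Cats": "cats.html"}
  # Table-ish version (list of dicts): an ordering column is just one more attribute.
  animal_menu = [
      {"animal": "Dogs", "link": "dogs.html", "seq": 2},
      {"animal": "Cats", "link": "cats.html", "seq": 1},
  ]
  for row in sorted(animal_menu, key=lambda r: r["seq"]):
      print(row["animal"], row["link"])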


Real-world example of Collections vs Tables

I wrote a python program, and after reading about TOP, noticed that several of my classes had the sole purpose of emulating either a record or a complete table, including multiple keys and ad-hoc (tabular flat-file) persistence which you get for free with RDBMS tables.

There are two issues. Firstly, procedural/relational approach to access the table (e.g. via SQL or an HDF library) would maybe save 50% of the lines of code over each partially-table-emulating class, which only takes 100-150 lines of python.

Second, is that my program does millions of small operations on data before returning to the user. With RDBMS tables, an SQL query to retrieve a single item is at least several times slower than a native dictionary lookup, and would dominate the running time.

For True Speed, I could have used C++ STL collections, at the expense of a lot of programming time when the algorithms are 'dynamic' in that they do lots of creating and passing around of objects, making it hell to do C memory management or C++ Resource-Acquisition-Is-Initialization.

-- Jettlogic

Generally if you are processing a lot, then a DB engine will perform well, and if you are processing only a little bit, then it does not really matter in many cases because it is finished up either way soon. Thus, I am wondering what kind of environment you have such that you need to shave speed or resources. Part of the bulk is the SQL language and parser. A simpler or more regular query language may be warranted if you want to shave time.

---

There is this visual interactive programming language whose name I just cannot recall that recently introduced tables as a means to visualize conditions. Can somebody help and add the link? I think that would be a strong answer for the question AreTablesGeneralPurposeStructures.


Re: "Tables are more general-purpose than mixed structures (bags, stacks, hashes, etc.). Tables can represent anything that bags, stacks, hashes, or other data structures can."

I'd challenge this claim with regards to variants (be they values or objects). Tables are, as traditionally defined, limited to tuples of a single structure. A bag, stack, hash, set, etc. can contain variants because their nature does not impose upon the structure of the values or objects they contain.

It depends on whether you see the "nodes" as being columns or rows. And theoretically, there is DynamicRelational, with flexible columns. Current implementations of RDBMS are often optimized for speed, not structure flexibility. But it does not have to be this way.

Considering a column as a 'node'? I've never done that before. That would make a row into a set of values associated with the given node which, unfortunately, would not be distinguishable from one another (unless there is some other row identifier, like an ordinal position, or at least one column noted as 'header'). Even so, I have trouble seeing how this would handle the issue with variants, especially as it extends into states of virtual objects and such. (...) After thinking about it a bit, I'm quite sure you'd gain nothing by considering columns as nodes... you'll certainly need some sort of header column, at which point you're no better off than you were with a header 'row' at the top of each column in the more traditional column definition... not in any computational sense, at least. Language-wise you'll have an easier time adding features to nodes and essentially calculating header-names if there is more than one header column, but you'll likely have a much more difficult time adding entities and making one entity reference another.

That aside, consider a traditional variant value type: 'Tree A B = Node:(A, Tree A B, Tree A B) | Leaf:(B)'. The variant is Node vs. Leaf; A and B are immaterial, though they might be primitives, tuples or records, coinductives, or variants themselves. Now, it is very easy to create lists of trees, bags of trees, stacks of trees, etc. You could have a map of trees, but a tree has no inherent name, so you'd need to use an extrinsic one (chosen by the agent working with the tree). To make a table of trees, though, I think you'd need at least two tables - one table representing the collection of trees with identifiers into a table carrying information about each tree (which may also handle the recursion). You've already gone from one collection to two, so one table was unable (via this approach) to store a collection of trees. That means one table was not very general purpose, IMO.

DynamicRelational is remarkably more flexible in practice, but not computationally so (since you can have a single table with all the columns that your domain for DynamicRelational might ever use). You really need tables where elements can be tables (fixpoint) to have true general purpose structure.

Also, traditional structures like maps hardwire two columns (attributes) into its design/definition. When you want 3 or more, you face a DiscontinuitySpike that you don't face with tables.

With this I agree. My statement only regarded variant types; it was not a general objection to tables.
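A small Python illustration of that DiscontinuitySpike, with made-up names:

  # Two attributes fit a map directly...
  price = {"apple": 1.25, "banana": 0.50}
  # ...but a third attribute forces every caller to change shape:
  price_and_stock = {"apple": (1.25, 40), "banana": (0.50, 12)}   # now the value is a tuple
  # A table-style list of rows just gains a column; existing column access is untouched:
  produce = [
      {"name": "apple",  "price": 1.25, "stock": 40},
      {"name": "banana", "price": 0.50, "stock": 12},
  ]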

How about we explore something specific, a realistic UseCase. The generalizations here are not dissectable as-is.

Use case: You wish to represent an unordered collection of tree-values (that are conceptually immutable, so as to simplify the problem), rejecting duplicates. An individual tree value can represent a computation, a search, a piece of knowledge, a grammatical rule, an unlambda function... quite a few things, really. For a number of these, you'll wish the ability to find particular values by features they possess (aka structural patterns). A tree value is of type: Tree N L = node:(N,Tree N L,Tree N L)|leaf:(L). Trees may be arbitrarily complex within its typing constraint. Assume N and L are, for simplicity, strings. They do not name the nodes or leaves.

Use case: A bit easier (due to less component variation): a collection of digraphs. Every digraph is described as (N,E), where N and E are, respectively, a set of nodes and a set of edges. Each node in the digraph has features N', and each edge directionally connects one node to another and has features E'. Assume N' and E' are strings, but are not names. Given a node, you must be able to rapidly find all connected edges. Given an edge, you must be able to find its nodes. Nodes probably will need to be named within each graph so the edges can be described, but the names should not be considered intrinsic to the nodes or the graph... and should not conflict between graphs (so if nodes must be held in a separate table, they'll need to be named by both graph and node name). A 'set' of digraphs would be considerably more difficult due to the NP problem of identifying isomorphic graphs, and so is left out here.

Use case: You wish to represent a collection of beliefs. Each belief is a datum - a proposition held to have a certain truth value in a set of worlds (or, alternatively, a proposition held to have a certain function of world->truth value), where time is one descriptor of world (so the world of the distant past is different than the world of the present or future). In addition to being a datum, a belief is held without absolute confidence; it is relatively easy to represent confidence in a database as a simple realnum ranging from 0.0 to 1.0; it doesn't even require much precision... 8 bits is plenty since quantifying confidence is such a fuzzy and imprecise thing anyway. For now, we'll also simplify worlds and simply give each one a string name (can come back to this if you pass the first challenge). Propositions, however, will be more difficult. A proposition is an abstract logic function over a world... e.g. P(x) is true or false depending on what 'x' happens to be in the world. A proposition can be a predicate (... make it a simple string for now ...) over some (possibly empty) ordered set of identifiers (which are world-dependent variables, and may also be strings for now), which for the moment we'll simplify to at-most-two identifiers - that is: Truth:(PredicateName?,Identifier1,Identifier2) (with some null ID). And:(Proposition,Proposition), Or:(Proposition,Proposition), Not:(Proposition), ExOr?:(Proposition,Proposition) (to allow OnceAndOnlyOnce), Impossible:(Proposition) ('I believe X is impossible', which tends towards the inference 'I believe Not:X'), Possible:(Proposition) (I believe X is possible; does not infer 'I believe X'), Probable:(Proposition,Probability) (I believe that X has Y% chance of being true), Contingent:(Proposition,Proposition) (I believe that if X is true, then Y is true). You have inference rules, too (e.g. 'Not:(Possible:(Not:(X))) => X, with same confidence'), but those needn't be represented in this collection, so don't worry about them overly much (they could be, though... it's perfectly reasonable to have one believe in a certain set of inference rules. However, that adds pattern-matching stuff and meta-level worlds that make the problem remarkably more difficult). However, to use those inference rules efficiently, you need to be able to quickly find propositions that share common substructures in common worlds in order to find: contradictions, beliefs applying to the same targets (identifiers), beliefs applying in the same worlds, beliefs using the same predicates, replications (where two beliefs support one another), abstraction points, and more. My expectations that a typical table's indexing can provide more than a sputtering bit of help on the searches, or that even several would work to well represent the collection (even as well as a straightforward 'set of value objects' or 'list of value objects'), are extremely dismal. The structure needed here needs an enormous variety of indices if it is to meet time-constraint requirements of a modern inference engine.

Is your argument related to "variants", or performance?

Use case: Interactive scene-graph - a collection of object-descriptors to be drawn and animated over a sequence of frames with constraints on ordering and occlusion, plus a feedback requirement requiring that ready access to which objects might be the target of a user's attention when the user points at something on the screen. The collection itself needs only to provide the indexing to make this possible.

I would like to see some sample data. Remember, anything that can be represented as a graph of object pointers can also be represented relationally.

Although I am no Cyc expert, initially, it seems one may be able to represent the above with just one table: (Technically, Cyc isn't involved at all. I hate it when people talking about belief databases start thinking "It's Cyc! It's Cyc!" or start using the word 'Cyc' just to make themselves look a bit more knowledgeable on the subject. This is just a collection of beliefs and an inference engine. Cyc is a -particular- collection of beliefs and inference engine.)

 table: beliefs
 ---------
 beliefID
 operator   // ex: and, or, probable, etc
 weighting  // 0.0 to 1.0
 param1
 param2
 param3
 param4
And I have no problem using generic strings to store parameters which may be numbers in some cases. I prefer using dynamic or type-free languages anyhow such that it does not matter to my app code. If we want to validate numericness for some combinations, we can put in triggers like this:

  TRIGGER ON CHANGE TO TABLE BELIEFS(newrec, oldrec)
     IF newrec.operator = 'FOO' AND NOT IS_NUMERIC_PARSE(newrec.param2)
        RAISE ERROR '2nd parameter must be numeric for op FOO'
     END IF
  END TRIGGER
Where does beliefID come from?

Let's assume it is an auto-number.

and, more importantly, how do we know which propositions are in our collection of beliefs,

Don't the operators determine that? I am hesitant to make a hard-split between them in order to serve future flexibility.

     (ID,Op,Wt,Param1,Param2,Param3,Param4)
     { ...
       (12345,"Truth",100%,"General","Tables",-,-),
       (12346,"Not",100%,"12345",-,-,-)
       ...
     }
and which are simply propositions that are part of other beliefs?

Add a many-to-many table

 table: belief_xref
 --------------
 beliefRef1
 beliefRef2
 relationStrength
And why do propositions that are not directly associated with beliefs have confidence values? Propositions don't have confidence values... only beliefs do.

If they are not used, then ignore them.

Do plain old propositions have 'beliefIDs'?

Perhaps I should have called them "suppositions" instead of beliefs. I don't yet see a reason to make two different tables, but am not hard against it.

What would be the algorithmic cost of querying something simple like, say, all the beliefs with predicates referencing a particular identifier? (I think it will be higher than O(N^2) due to reducing from sub-propositions back up to the belief). Oh, and you lost the world (that one is an easy fix). You've a ways to go yet, and I think you'll quickly violate several "philosophies of relational" that come from Codd (in particular, normalization forms). Your use of triggers to maintain consistency as part of the database is admirable, but I don't recall it being a general property of tables or even of relational databases. I imagine, though, you could say it in a declarative manner.

I will not comment on performance issues, other that saying that if you can roll-your-own Cyc engine then one could also roll their own relational Cyc engine and tune it to need. (That depends on what you mean by "relational cyc engine". I think you'll need to actually explore the approach for such a query if you're to understand the difficulties involved in it.)

As far as triggers, perhaps constraints can do the same thing. The difference between them is mostly a language implementation issue. And I don't see it violating common normalization because there is no duplication from not using columns. (Maybe in fixed-record implementations there is such, but I am not assuming we're using such.)


I was tempted, but won't include: use-case: text file - ordered collection of characters over which you perform pattern-searches over subranges. It isn't relevant to the point on variant types, but does represent another area where tables aren't at all useful (you can represent any list or array as a table, but try putting a subrange pattern-search into any modern query language and your head will be dating your desk in no time).

Relational does not define implementation. (With this I'd argue (speaking as a language designer): Language properties ALWAYS constrain implementation. If a language is to provide relational operators, that property constrains your implementation. Constraint is a valid approach to definition (anything that meets this constraint is relational...), though it isn't specification. So it is only accurate to say that Relational doesn't specify implementation.) A DB engine can be tuned or even custom-built for a particular usage pattern without losing its relational nature (although maybe making some sticky or ugly performance trade-offs). You can argue that Oracle or MySql would be too slow for such, but that does not mean it cannot be done in the absolute. Perhaps what is needed is a toolkit to build such contraptions. That way one can get dedicated performance without losing most of the flexibility of relational, merely accepting slower performance for areas outside of the target usage patterns.

In debates like this, one needs to be very, very specific. Vague notions usually don't translate well across brains. For one, tree operations can be added to relational operators so that more can be done declaratively; Oracle includes some. Second, trees are overused in my opinion. General-purpose does not necessarily mean 100% everywhere, everything, anyhow. A Swiss army knife may not have a flashlight while Bob's knife does, but that does not disqualify it. And you have not shown an objective performance problem. You provide no clear structures and no clear sample data or queries that are failing them. And for that matter, matching every possible navigational structure in performance is not necessarily required to be called "general purpose". A Swiss army knife may not be the optimum wine-bottle opener, but that does not disqualify it. You still lack precision and specificity in your complaints. Further, if tables do degenerate into navigational-like messes in worst cases (not shown yet), that does not disqualify them by itself. You are beating up imaginary strawmen: not just strawmen, but imaginary ones. Your problems with relational are not established, and even if they were, the reasons why they would disqualify "general purpose" have not been established either. You have to pass through two gates, not just one. -- top

I only claim that tables aren't so great with variants (especially the fixpoint-recursive sort). Unfortunately, variants are everywhere, like it or not - almost every domain has them.

I am not sure what you are calling a "variant". Do you have a formal definition we can use? Tree operations might help; I'm not familiar with them, and would need to know what conceptual and computational access they provide. You lie when you say I provided "no clear structures": several use-cases (tree, digraph, belief) - (Now you are going into your rude mode again. I honestly find your descriptions vague and poorly structured, and I swear to every known deity and supernatural force that this is how I perceive them. I am not making that up. I think you need a lesson in technical documentation, but this is not the place. Above you admitted you were not familiar with Oracle's tree operations, yet claim that your tree example is slam-dunk evidence. This appears to be a contradiction. (If I were rude like you, I might be tempted to call it a "lie".) See also: RelationalAndTrees. (I'll check it out.)) - ...provide structures as clear as you'd see listed as Datatypes in Haskell! I can't respect your statement that these aren't "clear" enough. Admittedly, scene-graphs aren't so clear, but they aren't in practice either... all you really ever know is that there are rather arbitrary geometric shapes (which is why I made a challenge of it, since lists or arrays of virtual objects, plus a bp-index, work fairly well). As far as sample data and queries that fail: failures due to variants are failures of algorithmic complexity. It's difficult to show with just a little sample data; rather, I need to ask you to pretend that there are a few million records, that some of the variants are thousands deep (recursively), and then demand you find certain properties. You and I probably think on different levels anyway... sample data is how I perform tests, but not how I think about or understand structure. If you wish to run the use-cases, create your own sample data.
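For concreteness, here is one common (and much-debated) way a small fixpoint-recursive variant gets flattened into a single table. The names and the encoding are purely illustrative; roughly, this corresponds to Haskell's data Expr = Lit Int | Not Expr | And Expr Expr:

 -- Illustrative encoding of a recursive variant as one table with a tag column.
 CREATE TABLE expr (
   exprID   INTEGER PRIMARY KEY,
   tag      VARCHAR(10) NOT NULL CHECK (tag IN ('Lit','Not','And')),
   litValue INTEGER,                           -- used only when tag = 'Lit'
   child1   INTEGER REFERENCES expr(exprID),   -- used by 'Not' and 'And'
   child2   INTEGER REFERENCES expr(exprID)    -- used only by 'And'
 );

The nullable columns and the per-tag bookkeeping are roughly what the complaint about tables and variants is about.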

I cannot evaluate the practicality of your test scenarios if I am not familiar with the domain. I could not say usage pattern X of your belief gizmo is more likely than pattern Y, etc. Of course, you could probably find scenarios where a navigational structure is faster. That is not in dispute. "General-purpose" does NOT mean "always best for all operations".

I'm sure you can construct a generator of trees or beliefs or even some simple scenes (boxes and spheres and cones, perhaps). If you feel utterly confident in your ivory tower, then don't bother testing your precious tables against any of the gauntlets I provided. My understanding is from experience, use-cases provided from sandboxes I've played in before. It's been a LONG time (years) since I actually did all that, but I still trust the experience and my recollection of the representations I used, queries I built, and calculations I performed at the time. As far as 'objective performance problems', I can only offer you algorithmic ones under optimal conditions based on any particular choice of representation. After all, even if I did generate sample data and run benchmarks, you'd stand like a righteous bastard and shout out: "Lo, and is that particular relational engine optimized and tweaked and twiddled to your domain? For if not, it is invalid!"

Again, how come only YOU are allowed to tweak for a domain? And again, failing to always match navigational performance does not disqualify something. General-purpose ANYTHING is rarely optimum for everything. - Hmmm? What gave you the impression that only I am allowed to tweak for the domain? I have the feeling you've written this based on a false impression. I say only that if you -specialize- an implementation to the domain, you no longer have -general-purpose-, period. Enabling new features that apply across many domains is quite legal... which would include such things as indexing on "less-than" and "greater-than" as well as "equal-to".

"And Lo, is that sample-data realistic? For if not, it is invalid!"

Is this not an important issue? Like I said, of course there will be situations that favor one approach over the other. - It IS an important issue. However, my overall point is that you'll just be looking for any excuse to entirely dismiss my efforts. Indeed, the fact that you just said: "Is this not an important issue" implies that you were inclined to do so.

To me, this is proven.

If it's only proven in your head, it is not usable to those on the outside. Maybe you are just not a good relational designer. It is hard to tell without specifics. (In my head, eh? Other people have experienced this and reported it as part of ObjectRelationalImpedanceMismatch. It seems Oracle users experienced it to the point that Oracle added special operators to deal with the problem. However, I don't know how to better prove it to you except by letting you experience the same problems I encountered.)

I'd love to see a solid disproof... something I could use in the future when I encounter these problems again. But I've never seen it. Tables are good for many things (e.g. anything one could put in a CSV), but horrible for collections of recursive variants. Databases can do collections of recursive variants, but even they can't do it well. To you, my statements are simply a challenge... or, as you call it, "a bunch of imaginary strawmen" defending a castle in the sky. Bah. For better or worse, you're biased, mister TableOrientedProgramming. With a name like that, who wouldn't be? Come if you wish, build yourself a gauntlet in my sandbox and test whether your faith in your beloved tables is well placed. I'll gladly help if you're confused, where what is clear-as-day to me (who knows the listed domains) seems fuzzy and obtuse to you (if only because you lack experience in them).

I will agree there may be classes of problems where tables don't shine so well, and perhaps your Cyc-like scenarios are examples of such. But that does not necessarily make tables "non-general-purpose". (Just means they aren't general to domains using fixpoint variants, does it not?) Like I said above, there may be times where a knife with a flashlight built-in is better than a Swiss army knife without a flashlight (they usually don't have one). Yet, most would still call it a "general-purpose" tool. As far as the classes of problems where tables and relational may not do so well, perhaps. Maybe ways can be devised such that extra relational operators can be added to deal with them. But maybe nobody does it because it is not a common enough niche. We may be able to tweak/customize relational rather than toss it altogether for some navigational contraption for such problems. You make it sound all-or-nothing, like "I tried Oracle and it didn't do well, so I am gonna invent the Adaptive-Diaper-Doo-Dad and ignore relational 100%."

Eh? My Adaptive-Diaper-Doo-Dad will have relational built right into it. Relational is still a valuable tool, and it's good for at least half (primitives, semantic-typed, product-typed) of the whole set of commonly useful basic value-type categories (primitive, semantic-typed, product-typed, coproduct (or variant), structure-abstracted collections and sets). It needs to be paired with something to make the whole good for the rest... just trying to figure out what that something is. I think the three big things tables don't handle well are variants (especially the fixpoint-recursive sort), cross-component ordinal subrange pattern matching (e.g. if you were using a whole table to represent a list or string or tree, and needed to find a pattern that is spread across many related components), and cross-component group pattern-matching on the whole collection (e.g. matching against a set).
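To illustrate the second of those with a deliberately simple, hypothetical encoding: store a string one character per row and try to find a three-character pattern; each extra character of the pattern costs another self-join.

 -- Hypothetical encoding: a string as one row per character position.
 CREATE TABLE chars (pos INTEGER PRIMARY KEY, ch CHAR(1) NOT NULL);
 -- Every starting position of the pattern 'cat':
 SELECT a.pos
 FROM   chars a
 JOIN   chars b ON b.pos = a.pos + 1
 JOIN   chars c ON c.pos = a.pos + 2
 WHERE  a.ch = 'c' AND b.ch = 'a' AND c.ch = 't';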

....

Hmmm... RelationalAndTrees seems to attempt to handle one special case of tree structures that exists solely within one table. It solves only a small part of the problem: the fight with relational to get access to the whole tree within a collection. It will handle some variant types, but it is rather easy to simply build a slightly more complex variant type just to trip it up. Fortunately, most variant types used in practice -are- simple enough to be handled by this. Unfortunately, all it can do is the easiest part: locating one tree for examination. It is telling that you fight with relational just to accomplish this part of the problem that is easiest when working with a more traditional list of trees.

I was rather hoping for something more impressive.

Perhaps it was not clear from the parameter description, but joins can be done first to bring in multiple tables.

One can be 'evil' and make variants that reference four alternative tables based on a column, each of which has variants that reference one of four tables... in one huge recursive mess. That would make transitive closure rather difficult, I'd imagine. Which joins do you do first (of four possible tables... do you do all four)? And would it help?
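For instance, a deliberately nasty (and purely hypothetical) layout: if a node's children may live in any of several tables, the child reference becomes a (table-name, key) pair rather than a single foreign key, and a plain recursive join or CONNECT BY over one table no longer covers it.

 -- Hypothetical 'evil' layout: children scattered across several tables.
 CREATE TABLE node_child (
   parentID   INTEGER NOT NULL,
   childTable VARCHAR(20) NOT NULL,   -- e.g. 'exprA', 'exprB', 'exprC', or 'exprD'
   childID    INTEGER NOT NULL        -- key within that table; no single foreign key possible
 );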


Partial Agreement Regarding Pointer-Hopping

Perhaps we don't really disagree that much after all. I will agree that there may be certain classes of problems where, to use tables, one has to do a large amount of procedural iteration and "pointer hopping" from node to node (record to record), and where declarative queries are of less use. In such cases, table-centricity may indeed suffer from practical and theoretical (big-O) performance problems compared to more direct address-pointer solutions. This we can both agree on for now; we don't need to argue it any more.

Now, it may be possible to custom-tune a relational engine such that the common operations of the domain can be done declaratively yet still be "close to the metal". But this perhaps disqualifies it from being "general-purpose". Then again, the alternatives need a lot of custom fiddling also. If you are building custom pointer handlers, then you are essentially building a kind of database from scratch. Thus, neither qualifies as "general-purpose" under this view. (Your "adaptive" thingy only exists on paper, so it is hard to say how much setup work it requires. And perhaps a pointer-centric relational kit can also be built.)

But I claim the need for such pointer-hopping is either not that common, or only a fraction of the total data-usage pattern for most applications. Even if some of the app requires lots of pointer hopping, that does not mean that all of it does. Declarative queries may still be needed and useful for other parts of the app, perhaps the majority. And one may sacrifice pointer-hopping performance in order to get the other RDBMS benefits (ad-hoc declarative querying, out-of-the-box concurrency, integrity management, import-export utilities, off-the-shelf report writers, etc.).

Indeed there may be niches where the pointer-hopping-centric performance is the critical path (bottleneck) and the other things mentioned above are not important enough to give up these critical path performance needs. But like the flashlight-knife versus Swiss-army-knife analogy shows, having everything plus the kitchen sink is not necessary to qualify as "general-purpose".

Perhaps you just work more on the pointer-centric problems, and thus judge "general-purpose" from that perspective. You need a flashlight-knife and thus the Swiss-relational-knife lacks a key item for those kinds of projects. A career military professional is going to view "general-purpose pickup truck" differently than a commercial construction manager. Our key difference is perhaps in the perceived frequency of such problems (or the importance of the pointer-hopping portion of the app). Without formal surveys and studies, it is hard to say whose experiences are more representative of the typical or most frequent apps. If we agree that this is the only real point of our differences, then we can AgreeToDisagree on this issue and move on.

In the long run, I would be inclined to suggest the industry try to create relational tools or kits that address the needs of pointer-hopping-centric apps, both by tuning performance for such activities and by adding more graph- and tree-traversal-friendly functions/operations so that larger chunks can be done declaratively. I've yet to see any evidence that such is not possible in the absolute sense. The lack of a push for such is, to me, evidence of the small size of that niche.

--top

I'd think it's more that there are already other specialized tools and data-management for these (rather broad) 'niches'. I.e. rendering engines and inference engines and functional programming languages, Cyc, OWL, RDF, Haskell, etc. Consider that RDBMS is viewed, at large, as a storage utility (as opposed to a volatile data store or runtime processor) for business domains. It's awesome at that (again, rather broad) 'niche'. People who want tools for other 'niches', however, rarely go first to the RDBMS.... and thus the appearance of demand for such facilities is minimal.

But are there any such tools that you'd consider "general-purpose"? Or do they mostly roll their own? Perhaps they avoid RDBMS because they don't have much exposure to them in school or work, not because an RDBMS won't do the job. Many degree programs skip RDBMS unless you are an IS major instead of a CS major. CADD may be an exception. Performance tests have shown commercial RDBMS to be noticeably slower than custom-built CADD thingies. But then again, custom-built will probably always be faster for known usage patterns. It still may be possible to put a relational interface on top of such a contraption.

Trees as represented in an RDBMS are useful when representing real structure of real entities as facts within a database. DV mentions (well, more like alludes to) this above in his description of the real purpose for transitive closure. Where trees within tables become utterly evil is when they are used to represent the structure of values that are, themselves, just part of a fact (and are not facts in and of themselves). E.g. for a table of propositions, the table doesn't stand for "this IS a proposition" (because a proposition, being a value, is a proposition whether or not it is represented in a table); it instead stands for "here's where I store the proposition that is referenced from elsewhere". I hope you can see where I'm coming from on this. Violations of DatabaseIsRepresenterOfFacts might be okay when merely leveraging a table as a structure to store data... but not when moving from table to RDBMS.


Re: "Tables are easier to scale, both in size and complexity, without having to overhaul access code, because they are close to a one-size-fits-all structure."

And they are, proportionally, more difficult to design... especially when it comes time to provide, maintain, and utilize indices (which are the only means by which tables can compete with the more traditional heaps, arrays, maps, hashes, etc.). If they were easy to add, all general-purpose languages would provide implementations.

I found tables so easy to use and change in ExBase that one was tempted to use them for everything. It shows that if you design a language around tables instead of the other way around, they can be quite nimble. (And even ExBase had areas it could improve in.) Tools like MicrosoftAccess also can be nimble, but I find it makes it tougher to port existing work from drag-and-click to scripting when you want finer control. It has a mouse-to-scripting DiscontinuitySpike that traditional ExBase didn't.

I'd love to have good tables sitting in my STL toolkit and made 'nimble' via some sort of template-driven query and manipulation language. Hasn't happened yet. Getting tables INTO the language is the first DiscontinuitySpike you need to deal with... which is, essentially, what the above italicized statements say (or were intended to say... I suppose you can interpret it as saying that tables are difficult to design for the domain, whereas the intent was to say that tables are difficult to design into the language and compiler).

Things like maps and garbage-collection were once expensive and exotic, but now are commonplace. The implementation age for nimble tables may yet come.

I'm hoping we'll skip straight to AdaptiveCollection, which generalizes the advantages of tables.

AdaptiveCollection is a goofy mess.

By which metric?


I consider EssExpressions the only comparable competitor. I'm a "conceptual" admirer of LISP, but not a practical admirer. See also: RelationalLispWeenie. -top


See also: GeneralPurposeProgrammingLanguage, MinimalTable


JulyZeroSeven

CategoryDiscussion CategoryCodingIssues CategoryDataStructure

