Minimal Table

What are the minimal features of a "table"?

These are general "ranked" feature-levels for suggested light-duty table implementations. The tighter the implementation constraints, the less deep one should go down the list. Ideally, one should be able to replace a light-duty table with a RDBMS table/query/view if more powerful features are eventually needed with little or no code change.

A DictionaryDataStructure (map) with a single key but multiple value columns. Queries return Returns zero or one row. Essentially, a map of vectors (not a map of maps).
- Example access API: rowMap = getRow(table, key)
  - Or in proper syntax: rowMap = table.getRow (key). ;)
A map-like structure with multiple potential key columns and multiple value columns. All keys must be supplied. Returns zero or one rows. Essentially a map of maps/records.
- Example access API: rowMap = getRow(table, key1,...keyN)
The above, supporting access by Boolean expression (predicate), similar to an SQL Where clause. May return zero or all rows.
Access by Boolean expression and sort ability. Similar to having a Where and Order By clause.
- Of course, pedants would argue that if you want to sort, that's outside the scope of the RelationalModel. But sorting is so damned convenient to have in result sets that it is hard to ignore. (Anyways, you can always convert the resultant set/map to an ordered data structure, like a vector or list, then use standard sorting techniques on that).
The above plus multi-table joins.

Add the following, and you start to venture into full DBMS territory.

Multi-user/process access to tables
ACID-compliant
Projections and a whole host of other useful operations in the RelationalAlgebra
Full RDBMS (run in it's own process)
- Not just it's own process; it's own server. And a big beefy one at that. :)
- Do you mean physical server or logical server? What is the difference between a logical server and a process?

Other features which don't seem to fit an easy ranking:

Internal indexes besides key for speed
Typing of columns (Personally, I wouldn't miss column types even in full RDBMS, but will leave it as a personal preference to be hashed out in some static-typing HolyWar somewhere else.)
Persistence

The biggest implementation DiscontinuitySpike is probably the introduction of Boolean expressions. That requires implementing expression parsing.

Horsepucky. In a language with higher-order functions, just pass in a predicate function. No need for writing your own parser - or even eval. You can even do this in C++ - a "table" data structure might be a useful addition to the StandardTemplateLibrary.
Yes, but that perhaps makes the "database engine" tied to one language. I would rather avoid that if possible.
This is sort of like the Java thing with making native OS calls, if you want to do anything powerful, you have to pick a language and go with it. That language will either be SQL, or some API in your native language. You're not going to create something that's both powerful, and generic, without inventing a new middle man language, which is what SQL already is.
We can have the full power of relational without SQL. SQL is the COBOL of relational languages: large syntax. See RelationalLanguage for other ideas. However, SQL is so entrenched that it is hard to ignore at this point. Perhaps some kind of compromise can be made.
I agree, my point was, there's only two ways to fix it, create a new language, or bind it to an existing language as a library. Either way, you'll need to implement predicates, or it'll be pretty much worthless as a query api.
My pet TopsQueryLanguage is something that is designed to be readily API-able in most languages by sticking with simple syntax. Whether each intermediate table is created as an object/handle of the language, or kept internal to the query engine is a design decision left open.

A far more generic definition of a table can be given thusly:

A table (or better, a relation) is simply a set of tuples, such that each of the tuples have the same header/ signature/type (in other words, the same attributes).
Tables have to have a unique key of some kind. I suppose we can define the default key as all the attributes appended together, as mentioned below, but that is sticky to implement in a small footprint and marginally useful.

That pretty much sums up a table. We can embellish tables by adding the concept of CandidateKeys? (all tables, of course, have at least one CandidateKey consisting of all the attributes in the header - otherwise it would be a bag and not a set). We might embellish it in other ways, such as table invariants.

But nowhere in the definition does it mention a "database". A RelationalDatabase obviously is a database composed of relations; but relations can exist outside of a database.

The RelationalAlgebra (doing joins, projects, selects, unions, intersections, etc.) applies to tables; a database isn't required for that.

Things like transactions, persistence, ACID semantics, security, concurrency, etc., are database concerns, on the other hand - but are orthogonal to the RelationalModel. You can have a NavigationalDatabase that provides ACID and security, etc. You probably wouldn't want one for your business data as that's an established AntiPattern; but chances are the software that provides directory services for a large-scale network has a NavigationalDatabase (or something resembling it) under the hood.

Minimal Query Language

I have been considering a way to produce basic filtering queries without having to parse a complex query and expression language. This is to keep the query system small and easy to implement in constrained devices. One approach is to have a query table with the follow columns:

operator
column
operand

A query is simply multiple sets (rows) of these AND'ed together and would represent the form:

  op1(a,b) and op2(c,d) and op3(e,f) and op4(g,h) ....

To create 'OR' conditions, one can use operator "isIn". Here is a set of possible operators:

Equals(column, value)
isIn(column, list) - If column is one of the items in the list
startsWith(column, subString) - similar to LIKE 'foo%'
contains(column, subString) - similar to LIKE '%foo%'
endsWith(column, substring) - similar to LIKE 'foo%'
lessThan(column, value)
And so on with regard to greater-than, less-than-or-equals, etc.

Generally the "list" in the isIn operator would be considered a single string operator of the format: "'aa','bb','cc','etc.'". A bit of parsing may be required to separate the elements internally. Or perhaps the number of operands can be considered open-ended to accommodate. That is an implementation detail left to the implementer. I would suggest making it case-insensitive to avoid the need for case conversion operators.

Of course this is not as powerful as full-blown expressions, but the point here is too keep the internals simple. It is a way to get 80 percent of the power of a WHERE clause with only 10 percent of the implementation size.

You make it sound like boolean expressions are difficult or something, it's not nearly as difficult as you make it sound, there's really no reason to not include full blown expressions.

One may have to do some additional filtering in their processing loop if the query sub-language cannot do everything. But, this is usually the case with queries anyhow. The following expression is shown in SQL-like syntax, prefix, its table form, and API form:

  SQL-based:

  state in ('CA','NY','TX') and wages < 3000 and name LIKE '%rog%'

  Prefix (except for AND):

  isIn(state, 'CA','NY','TX') and lessThan(wages, 3000) and contains(name,'rog')

  Proper prefix EssExpression style

  (and (isIn 'state' 'CA' 'NY' 'TX')
         (lessThan 'wages' 3000)
         (contains 'name' 'rog'))

  Table:

  operator   column   operand
  --------   -------  --------------
  isIn       state    'CA','NY','TX'
  lessThan   wages    3000
  contains   name     'rog'

  OO-API:

  result = Query.new()
                .isIn('state','CA','NY','TX')
                .lessThan('wages', 3000)
                .contains('name', 'rog')
                .execute();

Which form is chosen is up to the implementer. Since it is a sub-set of fuller query languages, such as SQL, existing queries would still work if a larger database system was later used, assuming drivers or translators were created. Also note that some QueryByExample processors use a similar arrangement.

The above syntaxes kinda gloss over queries that involve more than one column... i.e, 'where columnA = columnB'. Not that it isn't possible, in fact it opens some new possibilities.
It does not explicitly exclude it that I see. The operand can be a column name if the quotes are not supplied. However, requiring quotes around constants stored as attributes in data structures or tables could get messy. Maybe it could use a dollar sign prefix or something to mark column names. Then again, it is not that common of a query need in my experience, and we are not trying to cover every possibility; so perhaps it is not something worth implementing. That is something perhaps best left to "full expression" implementations, which is beyond the goal here. Good question, though.
One ramification: the table and OO examples are ambiguous: are we joining the subqueries as 'and' or 'or'? A similar issue with all of the examples is the use of 'isIn("CA","NY","TX")'; again, is it "is in CA and NY and TX" or "is in CA or NY or TX"? Both of these can be handled, I leave it as an exercise to the <insert role here> to fill in the above to cover it (which may well be myself when I've slept for a bit).
It described it as an "or" I believe near the introduction. "And" would be redundant because it already has that ability without the "isIn" operation.

: -- WilliamUnderwood

ColdFusionLanguage has a built-in table type (it calls it a "query" object) which can use a limited set of SQL. However, it is buggy in that it appears the owning company adapted a typed database engine, probably from another company, into the more or less untyped Cold-Fusion language. This has odd side-effects that make them unusable in some circumstances. Maybe they will fix it in future editions.

They went the other way and added column types to the CF-native Query object in CF 7. A big pain, because an application I wrote made extensive use of hand-crafted query objects, sometimes with "magic values" to represent queries that either needed to be saved to the DB or tht hadn't yet been loaded from there. New typing system caused all sorts of conflicts when I merged the hand-crafted and real queries. -- ChrisMellon?

Too bad they didn't take a cue from SqLite and make typing optional but available. But it sounds like the change is better than what they had before: "content-based guessed types". It was nearly useless for internally-defined tables. For example, if a string "kind of" looked like a date, it would start treating the entire column as dates out of the blue. It changed its spots during run-time more often than a female in a furniture store.

I'm not sure the "optional but available" is the right wording. I think SqLite's typing is a joke, or a lie. It's not really optional, it is more like a pretend typing that doesn't really do anything, but it lies/pretends that it is there. How would you like it if the equality sign didn't really do equality checks, it was just optional, but available, for pretend sake. What the F**k.

Specific examples would be helpful.

It doesn't enforce integrity, and hence it is a fake joke bogus type system. It's like having stop signs all around town at the intersection, but no one actually pays attention to them, and no one cares if you go through them. It is a lie, a scam. The reason SqLite offers "optional" typing as people call it (which I reject) is to remain compatible with SQL syntax. If they removed the fake bogus typing from SqLite completely, since it doesn't do anything for people, then the SQL would be incompatible. It's a bogus backward compatible thing. You can send SqLite some SQL, but it just ignores some of your SQL and pretends it doesn't matter. It would be like declaring an int in CeeLanguage and the compiler just ignores it and lets you turn that int into a string which is not an int. CeeLite? let's call it. Would that be an improvement over CeeLanguage, to have a Cee that ignores int? If so, why hasn't someone created such a wonderful language like CeeLight?. Typeless CeeLight?. There is ForthLanguage I guess.

Having optional types may indeed be added for compatibility purposes. It doesn't have to be there for "integrity". I'm not a fan of SqLite's style of dynamic typing, but probably not for the same reasons as you. Dynamic and "lite" typing is generally a nice fit for "lite" tools in my opinion, based on what they are typically used for. There's a time and place for both heavy typing and light typing. -t

Maybe one can use the native expression evaluator of the language rather than reinvent the wheel, if it has one. Just stick with a standard record/object/array naming convention.

  for aRow = eachRow(tableX) {
    matches = myQueryRow(aRow, "a.foo > a.bar + 3");
    saveMatches(matches)
  }
  ...
  function myQueryRow(a: array, expression: string) {
    if (evaluate(expression)) {
      return(a);
    } else {
      return(empty);
    }
  }

There was a recent proposal to add SQL-like constructs (but of course generalized so) to HaskellLanguage list comprehensions:

http://research.microsoft.com/Users/simonpj/papers/list-comp/index.htm

This moves functional languages one more step into the directions of the relational paradigm (or rather makes clear, that relational can be fully subsumed by the functional paradigm).

Noe: Please move is post to a more appropriate place if you know of one (I'm not that well versed in all these TOPical pages).

Is a struct in CeeLanguage a minimal table or minimal row? I find in programming languages people reinvent minimal tables by using structs in Cee, and associative arrays in PHP, and lists along with plain arrays in other languages. With an array you iterate to access data, which is very error prone since array can go out of bounds - no such thing happens with relational queries. With associative arrays you have a nice data structure that you can access via a key, but still not as nice as a table that could act as a minitature temporary database inside your program.

CategoryLowEnd