Multi Paradigm Database

I have been kicking around ideas for multi-paradigm database system organization from a logical perspective. Although I am a RelationalWeenie and think that OODBMS are ModernDinosaurs, I realize that too many people like OO to outright ignore it, and it may be better to find a way to get along rather than fight them in a never-ending HolyWar. Thus, I have created a topic where we can kick around ideas for something that works in (at least) both paradigms. Plus, we may get some "dynamic" benefits in the process by moving away from the "static relational" patterns of the current generation, and may even satisfy the NoSql crowd. Some candidate features are sketched below.

I am not sure such is practical without large compromises, for the differences between the two paradigms are significant in my opinion (TablesAndObjectsAreTooDifferent), but it does not hurt to try.

One approach is one big (conceptual) table with at least the following attributes:

  Table: objects
  ---------------
  objectID   // auto-gen integer
  entity     // optional table name
  parentID   // to implement object inheritance (see note)
  ...
See below for a critique of this approach.

"Entity" would be the relational table name of a given record (object). Additional requirements may be put upon those records with entities, such as having at least the same required columns of their "master" schema. But since these are things normally expected in the relational world, such as a valid unique key of some sort, I will not address those here much. Those with Entity defined (and fallow the relational rules) can be queried as regular relational data.

The "objects" table would have (assume) open-ended attributes (columns). They would not have to correspond to a pre-designated table schema. This allows objects to have whatever attributes they need and not be bound by relational's traditional "table shape" rules. If you request a given attribute that has not been assigned to a given object/record, then a blank, null, or zero is assumed in its place. (Again, those with "entity" defined might be made more picky to correspond to relational's requirements.) Note that we have to keep our rules for "object" rather open and dynamic so that a wide variety of OOP languages can use such a database. Proponents of static typing or ExpensiveAdministrators to enforce consistency rules may not like this.

This approach also allows one to query objects using relational query languages. (Perhaps some tree-friendly operations could be added, similar to Oracle's tree extensions.) A typical object query in SQL may resemble:

  select a,b,c from objects where foo="bar" and zarg="blog"
Perhaps this would also be possible:

  select * from objects where foo="bar" and zarg="blog"
But, your language or API would have to support some kind of open-ended structure (a list of variable-celled maps) to accept what may be an unknown number of result columns. The ODBC standard may not work well for this.

Note that multiple inheritance is not addressed here. Another table or "side structure" may be needed to support such.

Records that have an "entity" defined can be queried in the usual relational fashion. Those that don't would just have to be queried as a single big table of objects if we want to do relational queries on non-entitied objects. You could perhaps narrow it down by parent, however. The ratio of entitied to non-entitied records would vary per shop, depending on how OO-minded or relational-minded their tech culture is or what they had in the past that they are moving away from. Note that a given record can be part of both relational and object queries.


The above could perhaps be implemented with an existing RDBMS, although I could not vouch for performance, and queries would be more complicated in many cases. We add another table:

  Table: attributes
  ----------------------
  objectRef    // foreign key to object table
  attribName
  attribValue  // type: open-ended string or "blob"
This is essentially an AttributeTable. This approach may make it harder to convert an object to a relational record than the original approach, though.
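
To illustrate why queries become more complicated, here is a hedged sketch of how the earlier query (foo="bar" and zarg="blog") might look against this attribute-table layout, using ordinary SQL self-joins; the table and column names follow the definitions above, and exact syntax would vary by RDBMS:

  select o.objectID
  from objects o
    join attributes f on f.objectRef  = o.objectID
                     and f.attribName = "foo"  and f.attribValue = "bar"
    join attributes z on z.objectRef  = o.objectID
                     and z.attribName = "zarg" and z.attribValue = "blog"

Fetching the full (unknown) set of attributes for each matching object would then take a second query or a pivot step, which is part of why reconstituting a relational-looking record is harder here than in the single-wide-table approach.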


Example constraint/type dialog:

  COLUMN CONSTRAINT
  Column Name: _________
  Pre-Defined Constraints: [Integer|v] [create new]
  (the first item is a drop-down list)
  Column can be empty:  [yes/no]
  Applies only to Entity: _______  (blank for all)

Newly-created constraints would be a function or stored-procedure-like thing:

  constraint myConstraint(columnName, colValue, row) {
     var result = "";   // empty string means "no violation"
     if (columnName == "foo" && getArrayItem(row, "entity", "") == "bar") {
        if (contains(colValue, "$")) {
           result = "Sorry, no dollar signs allowed in bar rows";
        }
     }
     return(result);
  }


Classes tend to have 'shape' that corresponds, approximately, to the idea of a table.

"Approximate" is often not good enough when it has to be implemented.

I would approach the problem by embedding the idea of a view into the database as a first-class element, rather than a derived element. This may be sufficient to cover variant types, particularly if the view is 'writable' i.e. has a one-to-one relationship between a view-tuple and an underlying object. I'm not sure much else would need to change.

I have done this sort of thing, abstracting the type using views, but one cannot treat a view as a table, so the foreign-key problem remains. -- RichardHenderson

But OO philosophy tends to divide things differently. If you bend a paradigm enough to fit another paradigm, you might as well go with that other one. If you want a vehicle that is 50 percent SUV and 50 percent road car, but there is no such thing, then you are faced with getting an SUV, a car, or both. But if you wanted something that was 90 percent SUV and 10 percent car, then you might as well get the SUV rather than buy both just to get that 10 percent. IOW, why use objects if you only or mostly use them in a relational "shape"? I suppose we should work with an example. How about CampusExample?

I'm not sure OO is that much different beyond inheritance, once the functional stuff is stripped away. I'll wait for you to complete the example so we can argue over something solid. -- RichardHenderson

The campus example is all relational at this point. How would you OO-ify it?

In many dynamic OOP languages, you can add attributes or even methods willy-nilly. There is no "master class" to match to a table schema. I realize that your development style might not use such dynamism, but can we just dismiss dynamic OO developers? It would probably degenerate into a dynamic-vs-static typing HolyWar. Dynamic can cover both better than static.

Perhaps basic typing could be added by allowing a "class" attribute that references a kind of OO DataDictionary describing the class's attribute types. But things get complicated if you allow user-defined types. It risks runaway tree or DAG type substitution checks. Going all out risks turning this thing into a programming language, almost like a SmallTalk engine.


Relational tables and object "shape" are almost identical, with the exception of inheritance and forward-referencing "keys", also known as the evil pointer.

What I find amazing is that everyone runs around and argues about how "low-level" and "evil" pointers are compared to "keys". But, what's a key? Isn't a key, composite or otherwise, a pointer? What it ultimately comes down to is that some folks here are saying pointers have no intrinsic value. As Costin frequently says, 0xaaabbbccc is less descriptive than "Panici, Jeff" or whatever you might be using. True, it is ... so what? Not all keys must store meaningful data in and of themselves. It's enough to know that it's a "key" that opens a door somewhere.

Anyway, back to my original train of thought. Say we have this class shape:

 public class DriversLicense {
    string Number;
    string FirstName;
    string LastName;
    string Address;
    string City;
    string State;
    string PostalCode;
 }

This class "shape" describes a table schema. Now, what does an RDBMS do, that the majority of object-oriented programming languages don't currently do? It creates an extent that matches the shape defined by this schema. An extent is a collection of the same type. Unless we're using an OODBMS bound language, we don't get this kind of behaviour out of an object-oriented heap. Each instance of DriversLicense? is allocated where there's a free block of memory and a global object-pointer (evil) table is updated. This table is only used for reference management and eventual garbage collection. But, what if we had an Extent object instance that transparently "collected" all of the instances of DriversLicense?? We'd have a "table". Now, if we have enough of these Extent objects in memory, we can perform ... relational algebra on them and get derived sets (tables). This exact functionality exists in Poet FastObjects?. Now, the utility of such arbitrary sets is questionable in an object-oriented language. Usually, we want to find some root objects and we're not, per se, interested in the results of the intersection itself. Thus, the query syntax should be (and is) slightly different from that of standard SQL; hence, Poet FastObjects? supports ODMG's OQL. In addition, we have a concept of a "root object" extent that acts like a dictionary. This allows object-oriented programmers to create "shortcuts" to the parts of an object graph they're most interested in.

One can treat objects-in-extents as relational tables. In fact, Poet FastObjects provides an SQL ODBC driver that does exactly this. GemStone/S doesn't have any high-level query capabilities built in (it does have extended block syntax, but that's still at the Smalltalk level), but it wouldn't be impossible to write such wrappers on top of it.

Now, here's the rub to this whole argument: Is it appropriate to use an OODBMS as a "shared databank"? What I've tried to articulate in the past is that the answer is probably no. For some small projects, it might be desirable to use only an OODBMS to store data. Then it takes on the role of both a persistent object store and a "shared databank". However, I'm of the mind that both products serve an architectural role. The OODBMS is useful as a transactional object persistence engine. Almost all OODBMS products support ACID transactions but there are variations in the level of query support. Poet FastObjects is an example of a product that has some nice, high-level query capabilities - if you want them. GemStone/S is geared more toward a centralized object instance repository. An RDBMS would be used as the "shared databank": reports would be generated from here, analysis would be done from this data, etc. Most of the major OODBMS products have data transfer systems that will populate a relational schema.

While I personally believe more time and energy should be put into advancing OODBMS products, I'm in the minority. Oh well, I know I'll get harpooned for this one: There already is a MultiParadigmDatabase - it's called an OODBMS. -- JeffPanici

The only time I've ever seen GemStone act as a MultiParadigmDatabase was when many man-years of effort was applied to building a large nearly-relational framework on top of the core engine. Otherwise, it's really a DBMS toolkit. A wonderful toolkit, but not for everyone. -- StuCharlton

The problem with keys is that in an RDB foreign keys can only reference a single table (type). My abstraction of views fixes this. Keys are abstracted from the physical address, much as IP addresses are abstracted from MAC addresses, therefore a reference will be valid for as long as the referenced object exists, wherever it exists. I'll have a look at Poet, thanks :). -- RH This issue perhaps belongs in ForeignKeysCanOnlyReferenceOneTable.

The given example is a struct, not an object. The issue gets more interesting with a real OO design:

 class DriversLicense {
    string Number;
    Driver driver;
    Date expires;
    MimeItem picture;
    DriversLicense priorLicense;
 }

 class Driver {
    string FirstName;
    string LastName;
    Date dateOfBirth;
    Address address;
    DriversLicense currentLicense;
    Vehicles vehiclesRegistered;   // persistent collection of Vehicle
 }

 class Drivers { ... }   // persistent collection of Driver

 class Address {
    string numStreet;
    City city;
    State state;
    PostalCode postalCode;
    Drivers driversAtAddress;
    Vehicles vehiclesAtAddress;
 }
 etc...
-- MarcThibault


Re: "There already is a MultiParadigmDatabase - it's called an OODBMS"

Being an OODBMS alone does not make it automatically relational, so how could this be? Even if some are relational, some suggest that they are tuned for OO and not relational as far as performance. I agree that each one could emulate the other, but it is hard to optimize performance for both. The biggest difference between RDBMS and OODBMS is that OO does not have to follow certain rules that relational does. If data is created when the relational rules are switched off, there may not be anything usable as relational when the rules are switched back on for a relational viewpoint. Further, in this system we have to bust pure OO-style encapsulation. It is an attempt to find the best compromise. Every multi-paradigm solution is probably going to have compromises.


Language-Space

What I am looking for here is a "database" that can (at least) share data with multiple languages and/or applications and allow for relational queries. For the sake of argument, let's assume that objects and class instances do match the "shape" of tables (I may not agree, but will leave that issue to another section). Some of the tools mentioned above appear to be an attempt to manage the data inside "application space" or "language space". For example, they may be trying to use Java's objects to directly manage and query the data.

For most applications I deal with, sharing data with other languages or apps is important, even if not an immediate requirement. It seems to me that a language-space solution would hinder this, no? Are you assuming that every app in a shop will be written in the same language to increase sharing and querying across apps? If so, that assumption makes me itchy.

It also seems that the very spirit of "multi-paradigm" is to divorce data from particular application languages. Perhaps this very idea is counter to the OO philosophy (in some ideologies) of encapsulating data behind behavior. (SeparationOfDataAndCode)

Further, if the schema is in the multi-language database, isn't echoing it in the language a form of OnceAndOnlyOnce violation?


I think a shop's philosophy is going to be geared toward either relational or the NavigationalDatabase "web of dictionaries". It makes little sense to have the same data be both an object and a relational record. Perhaps some types of data are best as objects and others are best as table rows, but there is little agreement about how to assign what to what. The philosophical underpinnings to solve this division are still lacking.


Such a feature could be added to existing RDBMS by having an optional "allow dynamic columns" setting for a given table. It may not make sense to add it to all tables since dynamic columns are probably slower than static columns in a similar way that statically-typed languages run quicker than dynamic languages for the most part. My suggestion is sort of the "dynamic vision of relational". IOW, the SmallTalk of relational, in a rough sense. -- top
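
A hedged sketch of what such a per-table setting might look like as DDL; the "with dynamic columns" clause is purely hypothetical syntax invented for illustration, not a feature of any existing RDBMS:

  create table customers (
     custID   integer primary key,
     custName varchar(60) not null
  ) with dynamic columns   -- hypothetical: undeclared attributes allowed, read back as blank/null
  create table invoices (
     invoiceID integer primary key,
     custRef   integer not null
  )                        -- ordinary static table: unknown columns are an error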

http://www.geocities.com/tablizer/dynrelat.htm


Support for ObjectCapabilityModel Methodology

One approach is one big (conceptual) table with at least the following attributes:

  Table: objects
  ---------------
  objectID   // auto-gen integer
  entity     // optional table name
  parentID   // to implement object inheritance (see note)
  ...

A false start IMHO. This is a relational model of a possible implementation of an OO language; whereas what we want is a way of integrating support for relational and object-based modelling in the same language. Note that well-designed OO languages generally prohibit queries on "all objects", requiring that an object can only be accessed if a reference to it is held (certainly this is true of ObjectCapabilityLanguages).

Well, this might be a fundamental philosophical difference between OO thinking and database-centric thinking.

I don't think it's a fundamental difference. At least, it is not a difference between ObjectFunctional thinking and relational (as perhaps opposed to database-centric) thinking. The pure RelationalModel is after all a mathematical model that ensures ReferentialTransparency, which makes it almost automatically capability-secure. To the extent that impure versions of relational programming introduce insecurities, it is because they diverge from the RelationalModel.

There might be a way to bolt or wrap such restrictions onto a system like this. The approach described here focuses more on the ability to represent "things" from diverse paradigms and philosophies. Thus, it has to be "wide" instead of restrictive. Different paradigms might need different restriction rules.

In a multi-language or multi-paradigm shop, perhaps capability security techniques are not the right tool.

I couldn't disagree more. Consider languages like OzLanguage, which is almost capability-secure, and demonstrates that there's no significant conflict between multi-paradigm and capabilities. It may not be a priori obvious that different paradigms actually need the same restriction techniques, but in fact this seems to work almost unreasonably well.

Tying a database to the internal pointers of a specific language has always been a tricky matter.

"Tying a database to the internal pointers of a specific language" has tended to be done for existing languages that were designed without support for relational programming as a goal. We can hope for better in new languages (or language variants) where relational programming was taken into account when designing the DataModel/InfoSet and libraries.

A "generic" or "multiparadigm" database cannot assume use with only fancy/new languages by definition.

That's a strange definition; the term MultiParadigmDatabase doesn't imply any assumptions about whether it can or cannot be used from existing languages, AFAICS. It's very plausible that a MultiParadigmDatabase would be significantly easier to use from a MultiParadigmProgrammingLanguage, and that does not include most "popular" languages.

In any case, the objectID-as-integer approach doesn't solve any of the trickiness involved with "tying a database to the internal pointers of a specific language". What advantage would be gained by having object IDs be integers rather than opaque values? Something like the SemanticBinaryModel seems to be a more promising approach.

Reason: To help coordinate information outside the system with information in the system.


If we remove "entities" as a requirement, then perhaps this can no longer be called "relational". Maybe call it a "predicate database" (PredicateDispatching) if there is a terminology dispute.

Relational databases (and the relational model) are NOT (contrary to apparently common misconception) named after EntityRelationshipModelling, or because they "store" "entities" and "relationships", or even because a JOIN operation "relates" two tables together. Relational refers to the fact that all the data is (logically) represented as _relations_ (see RdbRelation). For that matter, there is no requirement that all the facts about a conceptual entity be gathered into the same table/relation - that's just a useful pattern to keep related stuff together (often, though certainly not always, we deal with a whole bunch of facts about the same entity together - getting the user to input them, reporting on them, etc.). Having said that, perhaps the term "predicate database" would help remind us that DatabaseIsRepresenterOfFacts, but there is no reason to ditch Codd's preferred term just because we ditch E/R modelling, say.

Somebody suggested that something that is a single big table is hardly about "tables" anymore. I wanted to avoid the issue by suggesting a different name to avoid a battle over what DrCodd "really" meant. Perhaps I wimped out too early. Plus, "Farfegnugen" was already taken. The goal is flexibility, not necessarily conforming to "relational".


Indexing

I imagine that indexing of a full MultiParadigmDatabase would be quite interesting. Obviously one could index on every column, if desired, but when there are N columns in use, there are 2^N possible queries on identity alone, and considerably more when working with less-than and greater-than relationships, and more still when dealing with queries on arbitrary patterns. I imagine that, in practice, the super-large MultiParadigmDatabase is out of reach until it can build its own set of indexes based on both profiling of queries and manipulations to it, and upon initial estimates of this profile performed by the programmers. (See also AdaptiveCollection).

Indexing every column? If one really wants to do that, then perhaps a column-wise implementation should be done. Of course, this would slow row-centric queries.

I think you've misunderstood my intent. When I say that one indexes 'on' a column or 'by' a column, I mean that the rows are the things indexed, but they are indexed such that you can find them -by- the attributes specified in the columns they're indexed on. E.g. if you index on the name, then you can find all rows that have the same 'name' field very rapidly.

I didn't mean to imply indexing the columns within the rows. That isn't nearly so useful... or at least I can't imagine as much use from it. If the set of columns is wide enough, I'd certainly expect values in each row to possess some sort of indexing or sorted-order to avoid a linear search, but I doubt that will ever be much an issue. The greater savings would come simply from using some sort of fast identifier for column-name (like a small number identifying an interned string) and performing simple compares on that.

{By default, it should probably be cluster-indexed on rowID, with the conventional assumption that most non-aggregate cross-references will be on rowID. There are at least two approaches to other indexes: no attribute name (column) is indexed unless explicitly requested (other than rowID); or every attribute name is indexed unless explicitly excluded. These choices would be a DBA decision. -t}

{Whether it's actually stored row-wise or column-wise under the hood should be a configuration choice that shouldn't affect the interface (queries used). The idea is that one could change it without having to change existing queries. The NoSql crowd is making the mistake of tying the query language to the implementation choice, which is poor future-proofing, even if it does bring short-term benefits by closely tailoring the query language to the implementation. But MultiParadigmDatabase places flexibility over out-of-the-box speed/performance in terms of priorities.}
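
Staying with SQL-flavored pseudocode, here is one hedged sketch of how those two indexing policies might be expressed; the "auto index all except" clause is assumed syntax for illustration only:

  -- Default: cluster on the auto-generated rowID (objectID).
  create clustered index ix_objects_rowid on objects (objectID)
  -- Policy 1 (opt-in): nothing else is indexed unless explicitly requested.
  create index ix_objects_entity on objects (entity)
  create index ix_objects_name   on objects (name)
  -- Policy 2 (opt-out, hypothetical syntax): index every attribute name except exclusions.
  alter table objects set auto index all except (notes, blobData)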


(Moved from FearOfAddingTables)

For ultimate flexibility, may I suggest the MultiParadigmDatabase, which is essentially one big dynamic-row GodTable. (If you want entities, you add an Entity attribute to a "row".)

There's something to be said for dumping all the data into one, massive DataSpace upon which arbitrary manipulations and self-joins can be performed. Any Relational database can be represented this way by designating one 'column' the 'table-name' column, and simply dropping every row of every table into the single GodTable. Any Logic-programming predicate-database can be represented this way, too, including all memoizations. Any set of objects can be represented this way, complete with descriptions of their behaviors. The MultiParadigmDatabase certainly makes representation very easy.

However, there are costs... especially regarding the semantics of these representations. It becomes quite difficult to understand: 'what means this row'? There is ambiguity in representation, largely because the set of columns can overlap BOTH intentionally AND unintentionally (because a single column-name can possess many different semantics based upon which 'row' it is participating in). (If one could guarantee the uses of the columns are disjoint, one sort of loses the 'advantages' of having a MultiParadigmDatabase in the first place... one could simply break it back up into a regular Relational database.) The semantic cost becomes reflected in query results and (therefore) consistency checks; that 'C' property is quite difficult to maintain if there is no explicit semantics for DataSpace. A row might represent a single predicate truth, or a whole bunch of predicate truths. There can be two different rows, one saying: {State: California, Anthem: "I love you, California"} and another saying: {State: California, Anthem: "I love you, California", Gemstone: Benitoite}. What mean these rows? One will need to include the semantics, somehow, directly in the row ({Predicate: PlaceCreated, State: California, Anthem: "I love you, California"}, {Predicate: FactsAboutState, State: California, Anthem: "I love you, California", Gemstone: Benitoite}). At this point, one might ask: 'what means "Predicate"?'. It invites a sort of arbitrary regression on these questions, at least until some sort of artificial constraints are enforced upon the table structure. It doesn't seem right that it need be artificial. Relational, for example, has non-artificial places to attach semantics: table-name and column-name.

Besides, in addition to being a fan of flexibility, I'm a huge fan of correctness(-proofs). I want my databases (and my queries, and their results) to all be strongly, statically typed. That 'C' property is quite important to me. While I'm quite willing to consider a well optimized MultiParadigmDatabase DBMS for representation purposes (including storage, management, concurrency, ACID transactions, etc.), I'd probably reject it as the overlying database 'concept'. I.e. I'd use it with an overlying 'wrapper'.

(Perhaps move the above to MultiParadigmDatabase and leave a reference?)

Strong typing can be optionally applied to columns, as described above.

Don't just slap on options without first exploring their consequences. For example, you can't have strongly typed columns AND have an open or unbounded number of columns (open-ended attributes (columns)). You'll need to close it - to have exactly one arbiter who determines who gets which column-name - and, thus, who gets to add new entities (since they'll not often be able to share strongly-typed columns). You seem so excited by all the hand-waving happening in the above discussion that you're doing it yourself.

I don't see a problem with it. Before accusing me of hand-waving and pants wetting, please demonstrate at least one failure scenario. Once a type/constraint is put on a given column, it must follow that constraint. It does not mean that the other columns also have to have that constraint. And the constraint may optionally still allow empty/null/non-existence if need be. The constraint may say "if there is a value for column X, then it must conform to POSITIVE-INTEGER". This would still allow an empty row.
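
For instance, the column X rule just described might be written as a check-style constraint; the syntax is only a sketch and isInteger() is a hypothetical helper, but the point is that the rule binds a single column and still admits rows that omit it:

  -- Column X, when present, must hold a positive integer; absent/empty X still passes.
  alter table objects add constraint x_positive_int
    check ( x is null or x = "" or (isInteger(x) and cast(x as integer) > 0) )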

Take some time to understand how each and every possible 'option' interacts with each and every other purported feature, for you'll very often be paying for a new feature at a cost to the others.

There indeed may be trade-offs, but the idea is to allow such trade-offs. This is why it is flexible. The trade-offs are probably inherent to any data management system.

The use of constraints goes part of the way towards strong typing, at least if the constraints language is sufficiently powerful. One then needs well-typed operators if these types are to have any real meaning.

The query language operators would only need to address 3 basic types: text, number, and date/time to fit usual query conventions. (Date/time perhaps can be omitted as a formal type). I am not a fan of operator overloading, I would note. If you are thinking of user-defined types, that is not the goal of MPD, and is not available in most existing RDBMS anyhow. It is not a feature I'd miss and conflicts with the goal of being app language neutral.

I would like to explore specific "difficulty" scenarios. If you are looking for a complex type management system with DAG-based lookup and substitution, and that is required before you will qualify it as "multiparadigm", then I suppose it may fail. Nothing will make everybody happy. The above suggestion leans toward a "lite" typing viewpoint, which perhaps could be seen as a bias. But my experience is that type-lite tends to allow more inter-language usage. Text atoms are more sharable than binary atoms, and this is part of the reason why HTML and other web standards have taken off. Type-heavy tends to bind/limit one to specific languages, or becomes a language itself; and reinventing an app language is generally beyond the goal of a database.

I'm a firm believer in the idea that DBMS features ought to just be pulled into generic programming language runtimes: transparent optional persistence, multi-cell transactions, ACI(D?) transaction semantics, translucent distribution and replication, etc. If one wishes an explicit data-service used by a great many processes, it can be provided as a service written in the language atop these built-in features. If other languages wish to interface with your DBMS, they can do so much as they do today: via communications ports in a common text-language through ForeignFunctionInterfaces with the OperatingSystem, or perhaps via ForeignFunctionInterfaces in a library. You cannot truly escape such language issues. Your avoidance of specific languages in the implementation of DBMS and DBs is probably wise in the short-term, but long-term the world would be better off with DBMS support simply being standard, allowing for databases of true complex data and objects.

Anyhow, when you described, as an example type, "a pattern representing a positive integer", my first thought was: "okay, so this person means that any text-pattern is a valid type. This means I could write an XML-schema as a text-pattern for column X, and write a C++ grammar for column Y, and (etc.)". If you were thinking this, it'd at least be halfway compatible with true semantic types (because you could guarantee that your decoder/parser will always run without errors - the 'only' (rather large) problem from there would be writing any sort of query that leverages the value-data within these fields ("find me all rows that have column Y specify function just_do_it(int,int,double)")). However, obviously you weren't thinking this.

As for difficult scenarios: back to my favorite. Knowledge Representation in a Learning system, with an ontology (vocabulary, set of words or meanings or 'entity-classes') that increases during runtime.

(replies below)


Re: I discussed several of its nice, flexible features, but added that I prefer strongly typed systems. To this, you essentially replied: "strong typing can be optionally applied". Sure. Maybe so. But if I do so, what happened to my flexibility? what happens to the features that led me to even somewhat consider MultiParadigmDatabase in the first place?... One should NOT answer or even imply that I can simply add strong types via constraints and somehow keep the properties that make MultiParadigmDatabase tempting in the first place

"Types" are generally orthogonal to paradigms. In my opinion "types" and flexibility are at odds anyhow. Set theory is a more powerful classification and better fits real changes than DAG's in my observation. A flexible classification system would thus use sets, not types. "Flexible types" is like a "strait jacket that provides freedom of movement". (Some define "types" via sets, in which case we may just be talking about different ways to say or present the same thing.)

Now, as to whether I want the impossible? of course I do.

I am selling vacuum cleaners, not genies.

As to whether it is "hard to beat incremental and optional tightening of conditions/types/constraints": I disagree. Incremental loosening or tightening of constraints is a bad idea to start with due to problems of coupling. Supposing you already have a user-base, any tightening of constraints will break applications... and you'll be fighting the data already extant within the database. Similarly, any loosening of constraints can allow addition of data that violates assumptions made by those applications that access your database, which can also break applications.

As a project gets further along in testing and construction, one can crank up the constraints. For example, when the project passes the first test, add the constraints and then deal with any constraint issues that come up. Another possible feature is a constraint monitor. This would simply log problems (deviations from what is expected) rather than trigger formal errors. One could then investigate the problems without shutting down a running system, before making the constraints formal.

but long-term the world would be better off with DBMS support simply being standard, allowing for databases of true complex data and objects.

"Complex data" meaning DAG-based classification systems? I'll pass. Shag the DAG.

As for difficult scenarios: back to my favorite. Knowledge Representation in a Learning system, with an ontology (vocabulary, set of words or meanings or 'entity-classes') that increases during runtime.

You mean BeliefDatabaseExample-like gizmos? I'm sure there's a set-oriented way to do it if there is a decent DAG way. Sets are a superset of DAGs.

And TuringEquivalency seems to be your version of 'Abracadabra'... I can almost see you waving your hands again as you type the words; sure, one can make do with some form of TuringTarpitDataManipulationAndQueryLanguage if one really absolutely needed to do so, but that doesn't mean the domain itself doesn't "call for" something higher level. One ought to be able to directly express what is desired from a query, and "making it work" also includes "make it correct" and "make it fast" - things that become remarkably difficult to do while wallowing in a TuringTarpit. Keep in mind that one TuringTarpit can drown a lot of ModernDinosaurs.

You have not identified any inherent tarpit. If you can make your own custom knowledge-base system, then one could in fairness make a custom relational version to compete. You are comparing roll-your-own to off-the-shelf, which is not a fair comparison. If you roll, the other side can also roll.

I don't need to identify an inherent tarpit. I need only tell you to shut up about 'TuringEquivalency' because it doesn't prove anything at all - not when you can create a DataManipulationAndQueryLanguage based on something like BrainfuckLanguage and still have your silly excuse for 'TuringEquivalency'. This is a truth: Turing equivalent or not, a good query for a domain language must both be expressive over the entities in that domain and allow for optimizations (preferably at a high level). And I don't mind if both sides 'roll'; I wasn't aware that there were 'sides' in this thing at all. I'm only saying that, to 'roll', you will need types to support operations over complex values. Why? because the domain requires representation of complex values that are not, by themselves, meaningful 'data'.

Bullshit! It only "requires" it because you are used to thinking about it in that way. You are mistaking your personal mental model for reality. When you make your "types" flexible, query-able, and meta-able, you will have invented a database without even realizing it. Interpreters and compilers are databases of sorts; it's just that language-centric thinking tends to hide them away, forcing the reader to think in syntax instead of data structures. "Complex types" are turned into nothing but look-up tables, trees/DAGs, and ID numbers. I would rather approach such "database building" knowing that we are building a database instead of backing into it accidentally while thinking in terms of syntax. When you know where you are going, it's easier to plan. (Another advantage of a DB is that the presentation is controllable. I'm not stuck with your ugly Erlang conventions or what-not because I can sift, sort, and present the info myself any damned way I please. Now that is high abstraction in my book. I can Frank my Sinatra.)

Values are not data unless they represent a proposition. You don't have information simply by possessing a value, therefore you cannot "sift, sort, or present the info" to anyone. You cannot, no matter how you work with it, gain truths from a collection of values that never represented propositional truth in the first place. This seems to be a fundamental gap in your own understanding of computation. Go fix it.

Any digital informational construct you can envision can be (re) represented as a data structure. Thus, you are technically not correct. Any differences will be about machine efficiency and/or human relatability.

Your logic is flawed - non sequitur from statement to conclusion. It is true that digital information constructs are represented as data-structures. However, that does not imply that every data-structure is a digital information construct. The information constraint is the relevant one, here, not efficiency and human comprehension.

You lost me. Please restate. Constraints can be put on data structures also.

Not all values are information... 'data' in the true sense of the word. Example: Just having the value [7,23,42,108] doesn't tell you anything about anyone, anyplace, anywhere, anywhen. It is not a datum, and it is not information. Not, at least, by itself. To be data, this value needs to represent a proposition... e.g. "the numbers on the bus routes that drive by my apartment" or "auspicious numbers in various cultural media". This is fundamental. Which proposition the value represents might be implicit in the context in which you find the value (e.g. you ask which bus routes stop near my apartment, I give [7,23,42,108] as answer, implicitly representing "bus-routes 7, 23, 42, and 108 have stops near my apartment"). Now, DatabaseIsRepresenterOfFacts - they are intended to represent propositions (sentences that are true). You can force a database to represent complex values by breaking it down: parts of value X are Y and Z, parts of value Y are A and B, etc.. However, X is not information. Y and Z are not information. A and B are not information. None of them have any meaning... absolutely none at all. Thus, no matter how you use the Database queries on a value, no matter how you sift or sort or join or decompose, you'll never, never, never learn anything new. No information was ever there to begin with.

That there is context that gives something meaning is understood, I thought. In MPDB it naturally has a column name and the record that it is part of as the minimum context, and of course one adds domain-specific contexts/relationships via references (keys).

Values, by themselves, don't inherently possess any extrinsic context... which is why they don't provide information. In a database, values are given meaning only by their placement within a proposition. One can represent complex values in a database, but this never serves any purpose beyond mere representation (since values by themselves do not have meaning), and doing so comes at a cost of artificial complexity, a reduction in expressiveness, and inefficiency. You know you've gone wrong (or just ran out of options due to a sucky DBMS) when you have any table that essentially is representing 'is a value', or if you have tuples whose whole existence is dedicated to: 'is part of this other value'. Going for support of complex types is truly the 'simple as possible but no simpler' approach. Pay once up front or pay forever out the back.

Imagine the horrors you'd deal with if a database could represent only integers, and representing a string of 50 characters required that you add 50 tuples to a table, and getting a table of names required deep recursive queries of arbitrary depth, and simply querying an entity by string identifier required deep query traversals first on the strings-table. That's essentially the hell you send people to every time you damn complex value types. Stop doing it!

One could likely put "view wrappers" around such a highly "atomized" DB, if it did exist, such that one is not usually dealing with stuff at such a low level. Going to the sub-value level like that is an extreme case. As far as performance, you are right that a fully atomized DB would probably be unpleasant. In practice there is usually a compromise between fully atomized and fully hard-grouped. MPDB is more atomized than an RDBMS, but one could get even more atomized, such as a "single-value graph" (a single value with multiple potential pointers), and still be at the value level. In general, though, the more flexibility or varied requirements (less "structure") you need, the lower you go on the atomic scale.

In an ideal database, you'd never have a tuple dedicated to describing a sub-value. Those are pointless on a fundamental, philosophical level. Beyond that, the reasons include: (1) you can't learn much at all from a table that essentially carries as data of the sort 'this-is-part-of-a-value-used-elsewhere-in-database', (2) what little you can learn is the sort of security-violating data-mining that should be avoided anyway.

However, I understand the concern that complex values allow for storage of complex data within the value... essentially creating a navigational database. Of course, who are you to be arguing that people shouldn't have the option because they might abuse it? That runs contrary to the philosophies you've described at earlier points in this discussion. What you might wish to do, however, is encourage some extreme normalization forms - essentially: exactly one fact per row per table. That is, you'd reject a user-entity table that had user-id AND name AND ssn AND birthday; instead, for this same information, you'd have an autonum 'IS-USER', with separate tables for 'USER-NAME', 'USER-SSN', 'USER-BDAY' - a whole four tables for the same information. Then you'd let the DBMS optimize the schema. E.g. the DBMS is perfectly capable of making the decision to transparently 'join' the IS-USER, USER-SSN, USER-NAME, and USER-BDAY tables for space and time-efficiency purposes. With this philosophy, you'd be able to more easily spot places that can be further normalized... e.g. if someone used a collection-value to indicate a relationship to each thing in that collection, you could say: "hell, no!" and point them at the extreme-normalization form and remind them that the DBMS can optimize.
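
A hedged sketch of that four-table decomposition in SQL-flavored pseudocode (IS-USER and friends underscored to make legal identifiers; the column types are assumptions), plus the kind of view a DBMS or DBA could supply so applications still see one wide "row" per user:

  create table IS_USER   (userID integer primary key)
  create table USER_NAME (userID integer primary key references IS_USER, name varchar(60))
  create table USER_SSN  (userID integer primary key references IS_USER, ssn  char(11))
  create table USER_BDAY (userID integer primary key references IS_USER, bday date)
  -- One fact per row per table; the wide "row" is reconstructed on demand:
  create view USER_WIDE as
    select u.userID, n.name, s.ssn, b.bday
    from IS_USER u
      left join USER_NAME n on n.userID = u.userID
      left join USER_SSN  s on s.userID = u.userID
      left join USER_BDAY b on b.userID = u.userID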

Of course, you'd also need to make sure that the DBMS can optimize. You could even offer it 'suggestions' on how to do so if you don't feel comfortable with fully automated optimizations.

Most DBs tend to force a decision between row-wise versus column-wise grouping of items. I wonder about DBs where it can be both or neither. Rows (and classes) are a human-friendly construct, but maybe not necessary for the problem domain. After all, the human brain doesn't use rows. Then again, it is highly fallible. Maybe there is an inherent trade-off between "clean" and "flexible", and the row-vs-column division is a reflection of this trade-off. It may just be naturally hard to do "theorem proving" on an organic system, because organics don't care much if they commit a violation of things like OnceAndOnlyOnce, cleaning it up incrementally and gradually, if at all.

The human brain represents data within computational structure directly; it doesn't at all separate 'data' from 'process'. Further, human-brains are very much not human-friendly; brain surgeons have difficulty looking at that grey matter and understanding what's in there, and even we (the users of the brain) cannot recall or process (or otherwise remember) data stored within it without a trigger. The information in our brains is almost impossible to port to other processes (i.e. to other brains), and completely fails when it comes to concurrent usage or ACID semantics in an enterprise environment. I wouldn't want to store my banking information solely in a brain... not even my own. I suggest we avoid "the brain" as a model for data storage, or just reference it in terms of "what not to do" if we wish for portable, durable, and human-friendly access to data.

Nature, after all, had to make do with what it could construct by chance alone, and took millions of years (and trillions of failures) to get something more complex than a flatworm. :)

Organic flexibility comes from emergent properties of millions of tiny things, each of which can be super-complex (e.g. possessing 3.2 GB worth of DNA and often being specialized to use just part of that). We can do the same thing, but we seriously need to find a more efficient approach.

Anyhow, back to your point: The extreme normalization form I mentioned is extremely flexible... allowing you to always tag more information to any particular entity (bigger 'row'). Further, it is extremely clean logically, it corresponds exactly to predicates as you'd find in (say) Prolog relations, missing only the infinite sets that can be specified in form of predicates. Instead of trading off between "clean" and "flexible", this normalization form buys both and pays in "efficiency". Fortunately, we know of ways to buy efficiency back - a process called "optimization". Optimization costs energy and time, but that's also what you gain back after the optimization... making it a fine investment.

As stated above, I am skeptical without seeing actual examples. But it should be noted that a MPDBMS does not prevent such a construct. That's why it may make a good experimental tool.

I didn't say MPDBMS prevents a construction of this normalization form... though I shall suggest that it DOES make considerably more difficult the optimization (performing automatic joins and such) due to the difficulty in describing relations. If you want the best place for examples regarding the 'clean-ness' and 'flexibility' (and weaker 'speed') of the purist approach, I'd suggest playing with logic languages like prolog for a bit - you can skip the computed predicates and stick with table-based predicates if you wish a direct correspondence with this extreme form of normalized relational. I'm not sure it's something you could appreciate given a static view of the normalization form (you'd be focusing on all sorts of optimizations by joining tables, which I suggest ought be performed by the DBMS... possibly with suggestions from an expert).

One thing I do like about extreme normalization: never a need for a NULL.... only thing that comes close is need to represent an 'empty' value (the fact that nothing is there).

[Am I correct in inferring that the "extreme normalization" you're referring to is the sixth normal form described in (for example) http://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf ?]

I vaguely recall reading that, or something like it, a long, long time ago. Anyhow, that seems similar to the extreme normalization, though it seems to be a bit of an earlier paper on the subject. To fully match extreme normalization, one would need to acknowledge that even 'CALLED:(ID,NAME)' might someday have NULLS in it, and break off the user entirely (to 'IS_EMPLOYEE:(ID)', 'CALLED:(ID,NAME)'). Horizontal decomposition might also be rejected in favor of variant-typed data (e.g. changing from JOBID: String to JOBID: 'Maybe String', with Haskell-style 'Just String | Nothing' (where 'Nothing' represents 'no job). However, that could go either way; the horizontal decomposition isn't really part of the extreme normalization I was describing, but isn't contradictory to it. It was used by Chris Date in the first place to solve the problem of dealing with multiple types in a single column... perhaps he was trying to solve too much at once with 6th Normal Form.

Not having a row for one of those is not much different than not having a given item in the row-map of MPDBMS. And it avoids repeating the ID over and over in the database. Per stated rules, these are equivalent:

  <row id=1234 name="Bob" jobID=[null]>
And

  <row id=1234 name="Bob">
I understand the use of NULL in MPDBMS well enough; it was well described near the very top of this page. However, I am failing to identify whatever point it is you're attempting to drive home by repeating it down here. Are you trying to say that there are no NULLs in MPDBMS? That can't be right - it looks to me like MPDBMS has [null] for an unbounded or infinite number of columns. Are you attempting to say that MPDBMS can represent entities in their un-normalized form? Why would you think that is insightful? Even relational can do that! Indeed, even Date started with an example that basically uses one big row-map - the very same thing MPDBMS is using.

I just wanted to make sure it was understood to readers that the linked approach to avoiding explicit nulls is not the only approach.

One can always go about adding more columns to a big row-map. But I can't help but feel that doing this for the purpose of 'avoiding repeating the ID in the database' is very much a premature optimization (AND is one that could easily be performed 'under the hood' by a good DBMS). Anyhow, for the lesser normalization forms, the reason to normalize is to prevent duplication of facts, which is relevant for data maintenance issues. The more extreme forms are advocated for other reasons: to avoid NULLs, or for purity in adhering to the idea that 'DatabaseIsRepresenterOfFacts'. Failing to reach these extreme forms detracts only from the purity of the data representation and introduces many meaningless NULLs. This isn't a computation problem - you can use the one-big-row-map representation to derive all the same facts as you could from the extreme normalization or Date's 6th normal form. It just isn't as 'clean'.

Anyhow, I think the context for this discussion on extreme normalization has been lost. If you recall, I suggested the use of an extreme normalization form to help discourage the use of complex-value-types to represent multiple facts with a single value. It mostly helps keep the schema creators focused: 'one fact per row' 'one fact per row' 'one fact per row' - that mantra makes it easy to justify splitting up a [collection,of,values] across multiple rows if it was representing multiple facts (as opposed to exactly one fact that needed a value collection). Further, it helps do so despite the temptation to embrace the exact sort of PrematureOptimization you very recently advocated: why duplicate the ID a bunch? why not just use a [collection,of,values]? This is a very human issue, one of socially engineering the acceptance of purity above that of premature optimization. Let the DBMS optimize is the paired mantra, which must be supported in truth (via good optimizing DBMSs, to which you can make all the same suggestions you were tempted to inflict by hand upon the 'pure' schema). I suggest you'd need to teach both mantras on the same principle we currently teach normalization forms.

It is not premature optimization if there is no identifiable reason to denormalize into slim tables. It may simplify some queries at the expense of others. For example, queries that use multiple attributes of a single (normal) row are longer and probably slower under the slim-table approach. Ex: WHERE hireDate > 01-apr-2005 and salary < 50000. The slim-table approach probably requires a join or "in".

    QUERY(ID,NAME,JOB,SALARY) :- IS_EMPLOYEE(ID),ENTITY_NAME(ID,NAME),EMPLOYEE_JOB(ID,JOB),EMPLOYEE_SALARY(ID,SALARY)
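
For comparison, a hedged SQL rendering of the hireDate/salary example under both schemas; the table and column names are assumptions chosen to match the discussion (EMPLOYEE_HIREDATE is added to the slim-table set), and the slim form does indeed need a join per attribute where the wide form needs none:

  -- Wide-row (conventional / GodTable) form:
  select ID, name, job, salary
  from employees
  where hireDate > "2005-04-01" and salary < 50000
  -- Slim-table (one fact per row) form:
  select e.ID, n.name, j.job, s.salary
  from IS_EMPLOYEE e
    join ENTITY_NAME       n on n.ID = e.ID
    join EMPLOYEE_JOB      j on j.ID = e.ID
    join EMPLOYEE_SALARY   s on s.ID = e.ID
    join EMPLOYEE_HIREDATE h on h.ID = e.ID
  where h.hireDate > "2005-04-01" and s.salary < 50000
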
As an aside to TopMind: what "not that different" actually means is that "there's a small difference that might or might not be extremely significant". Exempli gratia: A parachute pack with a rip cord is "not that different" from a parachute pack without a rip cord... weight, size, area of fabric, ability to support your fall, etc.: a lot of things are exactly the same. Both are even usable - it's possible to open a parachute without the rip cord. However, that rip cord makes the parachute a damn bit more usable - it reduces the average time it takes to open a parachute from several seconds to under one second, which improves safety a great deal. I bring this up because you recently used this "not that different" as part of your rhetoric, and (to my own vague recollection) you've done so quite often in the past. Doing so implies that you're failing to actually recognize the differences or comprehend their significance (or lack thereof). When you follow it by repeating yourself (rather than asking for clarification), it implies you aren't even interested in understanding or comprehension. Here is a better argument form: "idea <X> and idea <Y> are only different on point <Z>, which is (not?) significant because <... your reason here ...>". Please consider using it. (DeleteWhenRead)

It sounds like you want others to pre-argue your cases for you. That's asking too much. If I identify a "problem" with the difference, I will mention it.

Hahahaha! I've been arguing my cases all along. That would be why my posts are often 4-5 times the size of yours. Now I'm demanding the same of you. That isn't asking too much. If your reasoning is so damn awful that stating it is like others pre-arguing their cases against you, then perhaps you ought to work on your reasoning.

Your posts are longer because you meander and rely too much on English prose when examples or code samples would be more useful. It is hard to tell whether you are answering the question directly or trying to plug "just in case" holes. The relevance of each paragraph to the immediate question is not clear to me. Another technique is to restate how you interpreted the other person's statement and then address that. That way you don't have to cover alternative interpretations that aren't actually needed.

I'll believe what you just said if I stop having to put (NEEDS PROOF) in the middle of half your claims and you're still writing a fifth as much as I do (and still making claims).

You mean like: "There is an objective definition of 'classification' (NEEDS PROOF)".

(moved discussion to MostHolyWarsTiedToPsychology)


Type-Heavy Must Be Given Up

I am skeptical that a "type heavy" database can be made sufficiently language and paradigm neutral to be called a "multi-paradigm database". Type-heaviness probably has to be sacrificed. The success of web protocols has proved the cross-platform value of the PowerOfPlainText. Personally, I like light-typing, so wouldn't miss it, but realize that other people or specialties may prefer heavier typing. But I welcome demonstrations of attempts at a type-heavy MPDB.

To the contrary, such things as XML and HTML and javascript (structured text, not plain text) are what have proven to have cross-platform value, and these things are type-heavy within their small domain - they require strict structuring, possess schema and standards, etc.. A great deal of their value comes from the ability to automatically verify that they are correctly formed and meaningful, and to simply 'know' that they're meaningful to others across domains and platforms due to adherence to a standard structure. Those and the vast majority of web protocols gain success from their strict definition and standardization - features that allow programmers across domains and platforms to support the same protocols and guarantee they are implemented correctly. These things are natural to strict typing. Some protocols wrap other protocols, much like abstract types wrap other types (i.e. other standardized protocols or plain-text) as required - a flexibility that has further expanded the success of web protocols but that is no less 'typed' for doing so. Now you state that "light typing", whatever that means, is to your preference. However, you've offered nothing at all to support your thesis that type-heaviness must be sacrificed. Burden of proof for your thesis is very clearly on you, top. And if you truly "welcome" demonstrations of type-heavy database structures, you should bother doing research and self-education on existing type-heavy database designs.

XML is "typed"? I am not sure I agree with that. Nor do I agree its (limited) success is because of "strict definition and standardization". People study the vendor's actual output and write extractors that fit it for the most part. They are often not concerned with heavy schema validation (which is not necessarily "typing" anyhow). Similar constraint checkers could be put on top of the kind of MPDB mentioned here anyhow, as mentioned earlier. If you wish to call that "type checking" be my guest. I don't want to argue that definition anymore, although the wideness of your usage makes communication difficult because it puts the "type" umbrella over validation and constraint management, which some readers may consider a outside of "types". If you wish to continue to use such a wide definition, it may be helpul to create a taxonomy of "kinds" of types to produce clearer communication.

XML is "typed" the moment you run it through a schema or DTD. And, despite your intention to put on blinders and think otherwise, these forms of constraints and validation do fall under the "type" umbrella. This isn't especially "wide" usage, either; it follows naturally from all other forms of typing. What do you think it means to check whether something is a real number vs integer vs unsigned integer? or is a structure with a particular feature? Type-echecking IS a form of validation of constraints - a subset of that whole field (much like birds ARE animals). If you want "kinds" of types, I invite you to read up on some works on actual TypeTheory rather than coming up with half-baked ideas like "TypesAreSideFlags"; you can learn about dependent types (wherein you produce a type based upon the actual arguments to the function), linear types (wherein you validate that variables are used under certain protocols... e.g. "exactly once"), uniqueness types (wherein you validate that a process uses only one instance of that type), predicate types (wherein you validate that a variable will necessarily meet a particular predicate), protocol types (wherein you validate that communications match particular patterns), constraint types (wherein you validate that two or more variables meet a particular predicate when taken together), and more. Give up on YOUR foolish notions of "types" and learn what is actually out there, THEN we can "produce clearer communication".

I disagree with the "natural" claim (on what grounds?). That being said, the above MPDB can have a constraint system/language that is as complex and fancy as one wants to make it (although it may require interfaces to a language of the user's/shop's choice). One may argue that such is not built-in to encourage or enforce a standard way to provide such. But adding such may be turning the DB into an app language, which is reaching beyond the typical scope of a typical DB.

I offered sufficient explanation after the "natural" claim that just saying you disagree with it (without explanation) is somewhat crass. And I haven't stated that the MPDB can't have a constraint system or language (though I discussed above why it defeats the flexibility of using MPDB). What I've asked is that you explain your thesis that "Type-heaviness probably has to be sacrificed". So, please, tell me why MPDB shouldn't have a constraint system/language as fancy as I (or any fan of strict, heavy typing) would make it.

"Natural" is difficult to objectively measure. I consider hidden type-flags, validation, and constraints as sufficiently different as to not roll them up under "types". Your personal world view may differ because of your fondness for Reynold's work, but you are not the reference being for every other person on the planet (nor am I for that matter, but I suspect you'd find my view of them the more common among IT practicioners).

I'll accept that the word "natural" perhaps entails something different for you and me in this context. To me it means that the operations and computations and descriptions necessary to support typing will necessarily support constraints and validation the moment the type-system reaches a certain level of complexity, and vice versa. This can be objectively (deductively) determined. Anyhow, regarding your "I consider" comments: Only a naive type-theorist would try to roll type-flags up under "types" (as opposed to optional implementation-detail). And only a noob type-theorist won't have already studied predicate and constraint types enough to know that one cannot describe a computable constraint (with an immutable description) that cannot be used as a type in a type-system. Your personal world view may differ because you're a naive noob in the field of type-theory... a condition that is probably common among IT practitioners.

As far as building in a fancy constraint system, such would probably require including a fairly complex TuringComplete programming language. While many DBMS do indeed include one, I feel that the DB and such languages should be separate things. I would like to be able to write Oracle triggers with Python or VB or Java or whatnot, not just Oracle's PL/SQL (although I believe Oracle is becoming more Java-friendly of late). The DB does not need to sanction a One True Language. Although, perhaps it should have a default implementation for speed resulting from tighter integration. But, this is mostly an implementation consideration. It would also be interesting to draft a declarative constraint system and see how far one could take that. But, I bet such would kind of look like the internals of a formal type engine.
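As a rough sketch of what I have in mind - the binding syntax and the 'invoice_rules.validate_row' handler name are purely hypothetical:

  -- hypothetical syntax: the trigger body lives in an external language runtime,
  -- not in a built-in One True Language
  create trigger check_invoice
    before insert on objects
    when (entity = 'invoice')
    call external 'python' handler 'invoice_rules.validate_row';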

I'd assert on principle that any constraint system should be completely independent from "triggers", largely because "triggers" imply communications that can be extremely difficult to reverse (making 'undo', rollback, and transactional semantics tricky). Declarative, rather than reactive, is the way to go for constraints. Now, a subscription to a query (and to changes in its results) would be a pretty cool way of handling "triggers" that communicate as one might expect "triggers" to do.
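A rough sketch of that query-subscription idea, with entirely hypothetical syntax and a made-up notification target:

  -- hypothetical syntax: notify a listener whenever this query's result set changes
  subscribe to (select objectID, amount
                from objects
                where entity = 'invoice' and amount > 1000)
    notify 'queue://billing-alerts';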

Perhaps. But that is kind of another subject.

I am a believer in One True Language for a database. Systems that try to avoid a common-tongue language are most analogous to a broken and shattered TowerOfBabel. I have difficulty finding any logic behind your approach of supporting a Broken Amalgamation Of Languages instead of One True Language.

However, if your primary objection to heavy-typing is that it would require creating (and sanctioning) a One True Language for describing constraints, I can understand your objection - the same reasoning you provide for focusing on strings and integers: support for the lowest common denominator among client languages that might utilize your Database. I just disagree with it. I don't believe that making all languages second-class is better than making all-but-one language second-class. Despite this, feelings one way or another about DB sanctioning of languages are completely insufficient to support your thesis that "Type-Heavy Must Be Given Up." Emotions are pretty low on the EvidenceTotemPole.

Because not giving it up turns the DB into a programming language, or at least shifts the emphasis toward the features of programming languages. Again, if you can propose a DB model that is considered multiple-paradigm AND type-heavy (and not a tangled mess), you can show my allegation to be wrong. However, until your unicorn shows up, my proposal is the most concrete MPDB on C2. I am not making an absolute claim, but rather analysing what has been shown thus far.

Further, if in order to use your MPDB one has to master a programming language that people do not like, it will not proliferate even if it is the greatest MentalMasturbation toy since Lisp.

Ideally you shouldn't have to master a programming language in order to use the database, I'll agree. You master the language in order to master the database; there should be some natural graduation of available power, such that it's easy to create tools that perform easy or common activities. However, put it this way: almost nobody likes SQL, but it proliferates away regardless. An MPDB will succeed if it does the job people demand of it, either in the absence of equivalent competition or while providing a competitive advantage (e.g. one competitive advantage is greater speed and optimization, and another is simpler work with tree-structured and collection-structured values, both of which can be achieved much more easily in the presence of static typing or soft typing).

I'd like to see evidence for the tree and collection claims. The main competitive advantage of MPDB is flexibility and reduced reliance on a DBA, while fitting most of the SQL-influenced idioms people are already familiar with. I haven't seen any design that is more flexible yet doesn't require tossing out most existing RDBMS knowledge. There are DB ideas that are more flexible, yes, but they are too different from established tools. I realize that these optimization goals (flexibility + familiarity) may only matter in some niches, but universality is not necessarily the primary goal. I challenge anybody to find a better fit for flexibility + familiarity. -t
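For the record, tree-style traversal over the parentID links doesn't obviously demand heavy typing either; standard SQL recursion covers the common cases. A sketch, assuming the objects/parentID layout described earlier and an arbitrary starting objectID of 42:

  -- walk the subtree under object 42 with a standard recursive query
  with recursive subtree as (
    select objectID, parentID from objects where objectID = 42
    union all
    select o.objectID, o.parentID
    from objects o
    join subtree s on o.parentID = s.objectID
  )
  select * from subtree;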


PageAnchor Tuple-Spaces:

Between this and auto-gen row ID, it sounds like what you want is a TupleSpace... plus Join queries.

Its structure sounds too "fixed" for the needs described here.

You never did describe any 'needs' here. Regardless, I haven't a clue how you came to believe the structure of 'TupleSpace' is too "fixed". What gives you that impression?

The meaning of the positions doesn't seem to be well-defined or well-tracked in TupleSpace. The approach I'm proposing uses maps and every row (map) carries a label for each value (or at least requires a reference that identifies such). I don't see where TupleSpace guarantees the same thing.
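To make the contrast concrete, here is roughly the difference I mean (the field names are made up for illustration):

  ("fred", 42, "accounting")                    // TupleSpace-style positional tuple; slot meaning is by convention only
  {name: "fred", age: 42, dept: "accounting"}   // the row format proposed here; every value carries its attribute name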


The topic DynamicRelational was created in an attempt to split OO-centric characteristics from dynamic characteristics. Perhaps a refactoring of this topic is in order.


See Also: ObjectsAreDictionaries, TablesCanBeObjects, SqlFlaws, TupleDefinitionDiscussion, MaspBrainstorming, TableQuantityVersusAppSize, GodTable, MultiParadigmDatabaseDiscussion, MultiParadigmDatabaseQuestions, MultiParadigmDatabaseCriticism.

SeptemberZeroSeven JuneThirteen


CategoryMultiPurpose, CategoryDatabase, CategoryMultiparadigm, CategorySpeculative


