Multi Paradigm Database Discussion

Material moved from TupleSpace.

I'm thinking it wouldn't be such a hideous AbstractionInversion to use an RDBMS to implement a TupleSpace.

Instead of having different kinds of tuples, you simply have records in different tables. Since most modern RDBMSs provide in-memory tables, you have the possibility of creating/consuming large numbers of them without hitting the disk (so long as you can keep your tables small enough to fit into memory). The transactional nature of such things would seem to be a good fit for Linda's atomic operations. Additionally, if something happened which caused a transaction to fail, the ability to roll back to a known state and try again would be very helpful. In this respect, I see quite a bit of the development surrounding Linda as re-inventing the RDBMS wheel. Naturally, you'd want to do it in as lightweight a fashion as possible. Eliminating the SQL engine and providing an API to access data/perform operations would be a good start.
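As a rough illustration of the idea (not anyone's actual implementation), here is a minimal sketch using Python's standard sqlite3 module with an in-memory database. The table name "job", the class name SqlTupleSpace, and the method names are all made up for the example. Unlike Linda's blocking "in", take() here returns None when nothing is available, and the sketch keeps the SQL engine rather than eliminating it.

```python
import sqlite3

class SqlTupleSpace:
    """One tuple type = one table; everything lives in memory."""

    def __init__(self):
        # isolation_level=None puts the connection in autocommit mode,
        # so the BEGIN/COMMIT/ROLLBACK statements below are explicit.
        self.conn = sqlite3.connect(":memory:", isolation_level=None)
        self.conn.execute(
            "CREATE TABLE job (id INTEGER PRIMARY KEY, payload TEXT)")

    def out(self, payload):
        """Write one tuple into the space."""
        self.conn.execute("INSERT INTO job (payload) VALUES (?)", (payload,))

    def take(self):
        """Remove and return the oldest tuple inside one transaction,
        or return None if the table is empty (non-blocking)."""
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                "SELECT id, payload FROM job ORDER BY id LIMIT 1").fetchone()
            if row is not None:
                self.conn.execute("DELETE FROM job WHERE id = ?", (row[0],))
            self.conn.execute("COMMIT")
        except Exception:
            # Roll back to the known state so the tuple is not lost.
            self.conn.execute("ROLLBACK")
            raise
        return None if row is None else row[1]

space = SqlTupleSpace()
space.out("first")
space.out("second")
print(space.take())
```

The select-then-delete pair sits inside a single transaction, which is the part that stands in for Linda's atomic consume.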

Many of the Linda tasks you create are based on "consume a tuple of this type, do X calculations with it, emit a tuple of this other type, repeat as necessary." This would require something repeatedly polling the database, looking for an input tuple to consume. If you could use triggers, which spawn or execute that process every time a record is written to the appropriate table, you get an interrupt-driven process instead of a polling-driven process; from there, it becomes a question of the DB server's ability to manage large numbers of spawned tasks. Add in the fact that, with many modern RDBMSs, you can have a cluster of machines, sharding the data from a specific set of tables across multiple physical machines, and you have the ability to scale out for increased performance. Indeed, while you may want some of the records persisted for actual consumption outside of the process, if the vast majority of "intermediate" results, working toward those records, can be done entirely in memory, it may be worth your while to have several physical machines which only shard the in-memory tables and handle the bulk of the triggered processes.
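The trigger idea can be sketched in miniature with SQLite, again via Python's sqlite3 module. This is only an illustration under simplifying assumptions: the table names number_in and square_out and the "square the input" calculation are invented for the example, and the work is done inside the trigger body itself rather than by a spawned process, which a full RDBMS might handle differently.

```python
import sqlite3

def make_pipeline():
    """Build an in-memory database where writing to number_in
    immediately consumes the row and emits a row into square_out."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE number_in  (id INTEGER PRIMARY KEY, n INTEGER);
        CREATE TABLE square_out (id INTEGER PRIMARY KEY, n INTEGER, sq INTEGER);

        -- Fires once per inserted row: no polling loop is needed.
        CREATE TRIGGER consume_number AFTER INSERT ON number_in
        BEGIN
            INSERT INTO square_out (n, sq) VALUES (NEW.n, NEW.n * NEW.n);
            DELETE FROM number_in WHERE id = NEW.id;
        END;
    """)
    return conn

def emit(conn, n):
    """Write one input tuple; the trigger does the rest."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO number_in (n) VALUES (?)", (n,))

conn = make_pipeline()
emit(conn, 3)
emit(conn, 4)
print(conn.execute("SELECT n, sq FROM square_out ORDER BY id").fetchall())
```

The input row and its derived output row are handled in the same transaction as the insert, so a failure rolls back both.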

While such a system would be a bit of overkill, I suspect there are more people who are well-versed in the care and feeding of RDBMSs than there are developing with Linda. This could be a good "jumping off point" for introducing more people, inside and outside of various enterprises, to the concepts in Linda.

Are there papers out there about people who are trying this, or have already tried it? I have to think I'm not the only person thinking in this direction, but I'm coming up short on hits on Google. It's possible I'm just not searching on the right terms yet. -- Meower68

It's not clear to me what RAM has to do with it. That's more or less an implementation detail, not a database "user" interface issue. Often it's best to focus on the UI/language/interface aspects of a "technology" first, and then consider implementation issues. Of course it's not always seamless because machine/performance trade-offs often dictate what's available or feasible.

To clarify, one of the performance advantages which a TupleSpace frequently has over an RDBMS is that the TupleSpace lives only in RAM. I'm merely suggesting that a modern RDBMS, with memory-backed tables, would bring the performance closer to parity.

[Any DBMS or TupleSpace developed in the last few decades makes extensive use of caching, to the point that disk speed generally isn't an issue except (possibly) at startup whilst the cache is being populated, or if the pattern of retrievals results in a significant quantity of cache misses. The most notable conceptual distinction between a TupleSpace and an RDBMS is that in an RDBMS, tuple storage must be organised into a finite quantity of predefined tables. TupleSpaces do not have this structural constraint. Depending on the application, this may be a help or a hindrance. If there is a wide variety of tuple types, the overhead of creating one table per tuple type (or of using some schema that supports arbitrary tuple types, such as EAV) may prohibit using an RDBMS.]
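To make the EAV option concrete, here is one possible sketch, again on SQLite through Python's sqlite3 module. Everything in it is an assumption made for illustration: a single table named "field" holds one row per tuple position, the class and method names are invented, and None in a template is treated as a wildcard. Matching a template takes one query per candidate tuple, which hints at the overhead mentioned above.

```python
import itertools
import sqlite3

class EavSpace:
    """Arbitrary tuple shapes in one table: (tuple_id, pos, value)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE field (tuple_id INTEGER, pos INTEGER, value, "
            "PRIMARY KEY (tuple_id, pos))")
        self._ids = itertools.count(1)

    def out(self, *values):
        """Store a tuple of any arity; returns its generated id."""
        tuple_id = next(self._ids)
        with self.conn:
            self.conn.executemany(
                "INSERT INTO field VALUES (?, ?, ?)",
                [(tuple_id, pos, v) for pos, v in enumerate(values)])
        return tuple_id

    def rd(self, *template):
        """Return the first stored tuple with the template's arity whose
        fields equal every non-None template field, or None."""
        candidates = self.conn.execute(
            "SELECT tuple_id FROM field GROUP BY tuple_id "
            "HAVING COUNT(*) = ? ORDER BY tuple_id",
            (len(template),)).fetchall()
        for (tuple_id,) in candidates:
            values = tuple(row[0] for row in self.conn.execute(
                "SELECT value FROM field WHERE tuple_id = ? ORDER BY pos",
                (tuple_id,)))
            if all(t is None or t == v for t, v in zip(template, values)):
                return values
        return None

space = EavSpace()
space.out("point", 1, 2)
space.out("name", "alice")
print(space.rd("point", None, None))
```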

How is this different from MultiParadigmDatabase (other than dynamic typing issues)?

[For one thing, it exists.]

Lovely, the wonderful LearningWithoutImplementation fight yet again.

[If MultiParadigmDatabase were a clear and rigorous model instead of a bag of vague ideas, LearningWithoutImplementation would be possible using it. As it stands, who knows what is or isn't part of a MultiParadigmDatabase?]

Try it on something. If there's a vague spot, let me know and I'll fix it; plug it with details. Sure, life is easier with an implementation, I won't dispute that, but that's not a show stopper to exploration.

[Without a clear definition of "it", that's rather difficult. It's much easier to reason using a defined language syntax or mathematical notation. Is there an example of a MultiParadigmDatabase with either one?]

What's not clear? Point out specific fuzz and I'll happily shave it. As a guideline for MPD, if it's not specified, then assume traditional-RDBMS conventions. I wouldn't deviate from familiar RDBMS conventions unless it interferes with the primary goals of MPD, such as dynamism.

[What's not clear is what language syntax or notation is used to work with a MultiParadigmDatabase. If the answer is to "assume traditional-RDBMS conventions", then why shouldn't I assume a conventional RDBMS and ignore MultiParadigmDatabase? What, specifically, is different about a MultiParadigmDatabase in terms of the syntax and/or semantics of the database language used to manipulate it?]

Let's see:

{Why don't you describe the syntax you'll be using directly instead of just trying to list the differences? I'm sure you've missed some (differences) since I don't see how you'll be able to do a join at the moment.}

Why should we hard-wire the engine to a particular language? That's old-school. Granted, it can be used to explore certain aspects of an engine, but also risks locking in thinking.

Forget indexing, performance, and multi-processing issues for the moment and think about what's the best way to interface with and use a map of maps where the primary key is an auto-generated number and there are no explicit "type tags" on attribute values. Let's say the test implementation is just one big CSV file. Is that helpful?
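For the sake of having something to point at, the "map of maps" could be sketched in Python roughly as follows. This is a guess at one reading of the description, not a specification: the class name MapOfMaps and its methods are invented, values are stored as plain strings to stand in for "no type tags", and any attribute a row lacks is simply absent.

```python
import itertools

class MapOfMaps:
    """Outer map: auto-generated number -> row. Inner map: attribute -> value."""

    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)

    def insert(self, **attrs):
        """Add a row with whatever attributes are given; returns its key."""
        row_id = next(self._ids)
        # Assumption: values carry no type tag, so keep them as strings.
        self._rows[row_id] = {name: str(value) for name, value in attrs.items()}
        return row_id

    def get(self, row_id):
        """Return a copy of one row."""
        return dict(self._rows[row_id])

    def select(self, predicate):
        """Return the keys of all rows for which predicate(row) is true."""
        return [rid for rid, row in self._rows.items() if predicate(row)]

db = MapOfMaps()
db.insert(entity="employee", name="Ann", salary=5000)
db.insert(entity="dept", title="Sales")

# The query decides how to interpret a value (here: as a number).
well_paid = db.select(
    lambda r: r.get("entity") == "employee" and float(r.get("salary", "0")) > 4000)
print(well_paid)
```

Note that the two rows have different attribute sets and nothing was declared in advance; the comparison type is chosen at query time.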

No. A specific language can be particularly illustrative, especially for comparison purposes, but if you want to avoid any specific language you'll need to clearly specify the operations that can be performed on a MultiParadigmDatabase. See how this TupleSpace page starts by identifying the operations on a TupleSpace? At the very least, you need to provide some equivalent before a MultiParadigmDatabase can be compared to a TupleSpace.

Most of those are relatively generic get-and-put kind of things. The real meat is in the queries. But we can use SQL-style or SmeQl-style as a starting point. If some unanticipated conflict arises between traditional RDBMS and the dynamic nature of MPD, we'll deal with it then.

If you are used to relying on null-ness for your query language or coding style, you may have to rethink your approach. There's likely a way to do the same thing, but it will feel different.

That's a bit vague and open to interpretation. Please document the operations on the MultiParadigmDatabase, or at least clearly describe, in detail, how the operations of a MultiParadigmDatabase differ from SQL and/or the RelationalModel and its RelationalAlgebra.

You can say, "I personally won't experiment with the idea until I have an implementation to play with". That's fine; that's your prerogative. You don't have to. (While we are on IwantaPony, I'd like to see some sample queries with TupleSpace like "Find all employees who make more than their boss", by the way.) But that doesn't mean there's no use in considering the idea on paper. We have static languages and dynamic languages, and it similarly makes sense to have/consider static DB's and dynamic DB's. One can kick around dynamic DB ideas on paper.

No, I personally won't experiment with the idea until I have a specification to examine. The notion of MultiParadigmDatabase appears very nebulous, and therefore potentially subject to debate over what is, or is not, part of MultiParadigmDatabase. When it's clearly and comprehensively specified, then I will consider it.

A key question is how close such a contraption can stay to existing RDBMS conventions and still be dynamic. What must we keep and what must we toss? To "market" the idea, we should probably stick close to RDBMS's if possible. TupleSpace appears to have made very little effort to at least stay in the same neighborhood as RDBMS.

A related question is whether to have type-tags associated with attribute values. Of course I'd want to skip them, but the idea of "explicit" dynamic typing should be considered to see what the design choices would be. For example, would the tag be associated via a column name (all rows), a specific "cell", or an inheritance-like combination?

You're obviously still considering what should, or should not, be in a MultiParadigmDatabase. As long as the notion is so amorphous, it's difficult to consider it an alternative to something as well-defined as the various TupleSpace implementations, or even the TupleSpace concept in general.

If that's true, then show me the TupleSpace query of "Find all employees who make more than their boss".

Typically, high-level queries with joins and the like are outside of the scope of TupleSpace implementations which retrieve tuples based on keys or matching values. High-level queries are implemented by performing further operations on the retrieved tuples. That's why a TupleSpace is suggested as a foundation for (say) implementing tables in a relational database, rather than the other way around.
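A sketch of what "further operations on the retrieved tuples" might look like for the employee/boss question, in plain Python. The tuple layout ("employee", name, salary, boss_name), the ANY wildcard, and the rd_all helper are all assumptions for the example; real TupleSpace implementations differ in their matching rules and APIs.

```python
ANY = object()  # wildcard marker for template fields

def matches(template, tup):
    """Template matching: same arity, each field equal or wildcard."""
    return len(template) == len(tup) and all(
        p is ANY or p == v for p, v in zip(template, tup))

def rd_all(space, template):
    """Non-destructive read of every tuple that matches the template."""
    return [t for t in space if matches(template, t)]

def earns_more_than_boss(space):
    """The 'join' happens in application code, not in the space."""
    employees = rd_all(space, ("employee", ANY, ANY, ANY))
    salary = {name: pay for _, name, pay, _ in employees}
    return sorted(name for _, name, pay, boss in employees
                  if boss in salary and pay > salary[boss])

space = [
    ("employee", "ann", 100, None),   # ann has no boss
    ("employee", "bob", 120, "ann"),
    ("employee", "cy", 80, "ann"),
    ("dept", "sales"),
]
print(earns_more_than_boss(space))
```

The space only does the matching retrieval; the salary comparison is ordinary code run over the results.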

So as-is, TupleSpace is either not "finished", or not competitive with traditional RDBMS. It's closer to a persistence-oriented machine language.

{It's finished, it's not a direct competitor to an RDBMS (you could implement an RDBMS using one, or vice versa), it may or may not be persistent, and it's most certainly not a machine language.}

It's a lower-level language than "typical" query languages; above machine language but below "high-level" languages like, say, SQL or Python. But if it's not intended as a competitor to traditional RDBMS, why are we comparing it to MultiParadigmDatabase?

Someone wrote, "I'm thinking it wouldn't be such a hideous AbstractionInversion to use an RDBMS to implement a TupleSpace." Later, you wrote, "How is this different from MultiParadigmDatabase (other than dynamic typing issues)?" In response, I quipped that at least a TupleSpace exists and we drifted OffTopic. However, a TupleSpace could be used as a building-block in a DBMS and vice versa. The one is not lower-level than another; they're different things.

Comparing an unimplemented Foo to an implemented Bar is kind of odd.

Yes, so I'm not sure why MultiParadigmDatabase was ever suggested as an option.

You have NonImplementationAphobia.

No, I have IncompleteDescriptionAphobia.

You are asked to design a DBMS that follows these goals in this order:

Go!

Those are a reasonable set of requirements, but not much help in terms of a technical description. At least please document the operations on the MultiParadigmDatabase, or clearly describe, in detail, how the operations of a MultiParadigmDatabase differ from SQL and/or the RelationalModel and its RelationalAlgebra.

I too am curious about what general design you'd select given the above criteria. A rough description is fine with me as long as you explain/refine specific spots as requested.

Isn't that OffTopic for this page, not to mention avoiding my (reasonable, I think) request?

Most of this sub-thread is off-topic. It's a bit late in the game to complain. [Since moved out of TupleSpace.] And, I believe my request to be "reasonable". Asking somebody to make an entire running system just to comment on it is NOT realistic.

Your request is quite unreasonable if you're not intending to define it in any more detail than a list of vague requirements and then expect us to design it for you.

I expect a discussion of the kind of trade-offs and features such goals would entail. We have "static" DBMS already, and so asking to kick around designs and explore the implications of dynamic DBMS designs is not unreasonable. Every project should have a brainstorming step.

So, you regard MultiParadigmDatabase to be sufficiently defined that you can ask, "How is [a TupleSpace] different from MultiParadigmDatabase (other than dynamic typing issues)?" and expect (presumably) a reasonable answer, but it's not sufficiently defined for you to identify how the operations of a MultiParadigmDatabase differ from SQL and/or the RelationalModel and its RelationalAlgebra and suggest that we now "should have a brainstorming step"?

I thought it was mutually agreed they cover different concerns. For example, TupleSpace doesn't include a query language beyond simple accessors. The difference between MPD and TRDBMS (traditional) is probably very much smaller than the diff between TRDBMS and TupleSpace.

What does that have to do with what I wrote?

I'm not sure what you are getting at. Note that it's still possible to brainstorm about dynamic DB's independent of MPD. You seem to be suggesting they are mutually exclusive. Feel free to split such off to a different topic if you wish to avoid mix-ups.

I didn't mention "dynamic DB's". It appears you consider MultiParadigmDatabase to be well-defined on one hand -- such that it's reasonable to compare it to a TupleSpace -- but insufficiently defined such that you feel the need for a "brainstorming step". That sounds contradictory.

I don't have a need for it. It sounded like you did, so I suggested it as a potential course of action. I am curious how you'd personally solve the goals/constraints given in order to contrast the result with MPD, but if you don't want to consider such, that's fine with me.

Sorry, not clear here: What "course of action" were you suggesting?

Brainstorming.

To compare a MultiParadigmDatabase with a TupleSpace?

No. To compare one dynamic DB design suggestion to another dynamic DB design suggestion. TupleSpace does not qualify as a DB in my book because of its very weak query capabilities. I am interested in dynamic versions of RDBMS-like tools and welcome design suggestions. If you are not interested, then I invite you to enjoy your exiting of this topic.

No one claimed a TupleSpace is a "DB", a database, or a DBMS and it appears you agree that it isn't. Therefore, I'm curious why you thought it similar enough to a DBMS to ask, "How is [a TupleSpace] different from MultiParadigmDatabase (other than dynamic typing issues)?" Furthermore, I'm curious what you thought was so similar about a TupleSpace and a MultiParadigmDatabase, given that a MultiParadigmDatabase apparently still needs brainstorming to determine what it is.

I'm not the one who brought up TupleSpaces in relation to RDBMS and related DBs.

True, but you continued the discussion in a particular direction.

I was just probing to see if there was a useful connection.


For typical usage, querying the MultiParadigmDatabase wouldn't be that much different from using "traditional" SQL. If the query interpreter translates table names into "entity=" or "table=" WHERE-clause references under the hood, then even the table-ness feel would still be there. The comparison operators may have to be de-overloaded to make the compare type more explicit, and how the equivalent of null-ness is handled would often be different, but these only affect a small portion of most queries. The real difference in "feel" is on the schema-design side of things, not the query side, as schema "design" (or non-design) would be far more ad-hoc, at least out of the box (as restrictions can be custom added). -t
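The table-name translation could look something like the following toy rewriter in Python. It is only a sketch: the function name rewrite, the single underlying store named "rows", and the attribute name "entity" are assumptions, and it handles nothing beyond a simple single-table SELECT with an optional WHERE clause.

```python
import re

# Matches "FROM <table>" with an optional trailing "WHERE <condition>".
_FROM_CLAUSE = re.compile(
    r"\bFROM\s+(\w+)(?:\s+WHERE\s+(.*))?$", re.IGNORECASE | re.DOTALL)

def rewrite(query, store="rows"):
    """Turn a table reference into an entity filter on one big row store."""
    query = query.strip()
    m = _FROM_CLAUSE.search(query)
    if m is None:
        raise ValueError("only simple single-table SELECTs are handled")
    table, condition = m.group(1), m.group(2)
    clause = f"entity = '{table}'"
    if condition:
        # Keep the caller's condition intact by parenthesising it.
        clause += f" AND ({condition})"
    return f"{query[:m.start()]}FROM {store} WHERE {clause}"

print(rewrite("SELECT name FROM employee WHERE salary > 10"))
# prints: SELECT name FROM rows WHERE entity = 'employee' AND (salary > 10)
```

From the query writer's side, "employee" still reads like a table even though it is just an attribute value underneath.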

And I wouldn't use Perl's comparison approach for reasons described in ComparingDynamicVariables.


Last edited February 12, 2014.