Tuple Space Scalability

What are the issues with scaling a TupleSpace on the Internet?

Some ideas are here:

http://patricklogan.blogspot.com/2003_05_04_patricklogan_archive.html#200272403 (BrokenLink)

What do you think?

The latency problem with blocking operations could be solved by using PromisePipelining, as in JouleLanguage and EeLanguage.

I've been thinking about more or less the same problems. The conclusion I've come to is... the semantics of TupleSpace are wrong in a distributed environment. A distributed, persistent, fault tolerant, transactional AssociativeMemory cannot scale linearly.

''I agree; it cannot scale in its current form. However, if you're willing to deal with occasionally replicated work levels of operational success and re-work that are measured statistically rather than with booleans, and spend some time developing efficient search capabilities, and perhaps re-develop the TupleSpace with some notion of locality, perhaps it could be done. I don't think you'd have the TupleSpace anymore, though... at least not in its current form.

The best I could think of is a "soft replication" model. Each tuple has an "owner" - the machine it was initially created on. The owner "knows" about some of the neighbouring machines, and sends them notification when tuples come in or get taken. If one of the neighbouring machines receives a request for that tuple, it should return a "redirect response" to the client - EVEN THOUGH it has a copy of the tuple. The "redirect response" should point the client to the owner; it's the client's responsibility to connect to the owner and re-request the tuple. It's the owner's responsibility then to give the tuple to the client and tell neighbouring machines to erase the replica.

In this architecture, we don't need a centralized dispatcher of any sort - all nodes are created equal. Transactions and persistence are sovereign affairs of each separate node; the "soft replication" need not be transactional at all, for obvious reasons.

Disadvantages:

- If a machine dies, its tuples stay unaccessible until it's resurrected (or until the data is migrated to another machine manually), even though replicas exist. This isn't exactly fault tolerance.

- The topology of node connections must be specified at deployment time, and is hard to get right. In fact, a request for a tuple may fail even though the tuple is there somewhere. This isn't exactly shared memory.

- Clients are required to understand "redirect responses", or even to poll multiple servers. This isn't exactly a simple interface.

But, I repeat, this is the best approach I could think of, and it doesn't involve any stupid "partitioning", "automatic failover", "heuristics", "centralized manager", "tree topology" or somesuch crap.

-- VladimirSlepnev?

See also MozartProgrammingSystem, which has very nice distribution semantics (transparent distribution, fast object migration comparable to FuturePipelining? I believe, migration is not transparent, but can be added easily on top, failure semantics).

Alternately, what about using a TupleSpace basis for programming on Sony (et all)'s upcoming CellProcessor architecture?

There seems to me to be a synergy between the current state of the art in P2P systems, and tuplespaces, namely the acceptance that, for a large enough group of nodes, probabilistic distribution mechanisms (ie. epidemic & gossip models) scale far better than transactional models, and the reliability goes up with the size of the grid.

--DarrenHobbs