Just Three Persistence Patterns

From working on systems in a half dozen domains I have come to the conclusion there are three simple patterns we should lean on for making documents and messages persistent.

TupleSpace (think RupleForums) for in-progress, state machine-like transactions and collaboration.
VersionedDocumentTree? (think SubversionFileSystem) for long-lived, shared, document editing.
StarSchema based storage (think SybaseIq, a simple, low-maintenance, scalable database technology) for read-only transaction history and analysis.

More thoughts at http://patricklogan.blogspot.com/2004_02_08_patricklogan_archive.html#107620973199427475

What do you think?

StarSchema is a bad organization because it is typically denormalized. Most of the times it is an AntiPattern and only on occasion is a useful hack. See ChrisDate's "Reply to Ralph Kimball" http://www.dbdebunk.com/page/page/622790.htm

Note that a Star Schema-based storage is a RelationalDatabase. But in this case I am suggesting that its use be tailored and restricted specifically for a particular kind of schema and access. The kind originally intended in fact. See http://www.ralphkimball.com This is in contrast to the typical OracleDatabase, SqlServerDatabase?, etc. which is based on implementation ideas from the earliest relational technology geared toward early 1980s hardware, or late 1980s hardware at best. Certainly these popular relational technologies are geared towards transactional processing rather than relational analysis, and are expensive to scale, especially for analysis purposes, and expensive to maintain. This thesis chooses relational technology for historical analysis only.

Well, yet another handwaving about relational model. Relational model is not about storage, nor persistence. The implementations of today are not expensive to scale: http://www.tpc.org/tpcc/results/tpcc_price_perf_results.asp.

I am suggesting three general patterns that simplify uses of persistence. The third, StarSchema, is most closely associated with relational databases, and in particular OLAP. The first, TupleSpace, can replace some common uses of relational databases for OLTP. Moreover a TupleSpace can be used in a wider variety of ways, more simply, than can a relational database. For StarSchema I am being critical of typical relational database implementations; and I am suggesting an alternative approach similar to SybaseIq. -- PatrickLogan

Handwaving, handwaving and more handwaving. Is TupleSpace relationally complete, does it have the same expressive power ? Does it support transactions ?

I like the idea of trying to capture all powerful approaches to persistence in a small number of patterns.

However, you have given no rationale for why you think those particular three are the ones that are both necessary and sufficient.

Agreed. That's where this is headed.

It seems clear that you need some kind of comprehensive catalog of applications for persistence which is then generalized over.

Agreed, again. I wrote on my home page this hypothesis is based on experience from several domains. If this were to become a real thesis, I would have to draw on those and bring in the experience of others.

More particularly, why include a warehousing database but exclude transactional databases?

Relational technology is better suited, and originally intended, for OnLineAnalyticalProcessing, i.e. a DataWarehouse using StarSchema. This thesis suggests OnLineTransactionalProcessing is better served by a more flexible, less relational model, using a TupleSpace.

Which of these patterns is intended to replace filesystems?

Most directly the Versioned Document Tree is a filesystem, but above the level of what we typically think of in terms of an OS and a specific file system implementation, e.g inodes, mounts, etc.

Which of these captures state in HTTP transactions that currently people use cookies for?

Good question. Isn't a cookie just a small message that is sent from a client to a server? Consider client C working with a service S. C sends S a cookie along with some other minimal information about the requested service. S gives the cookie to a tuple space as a pattern to retrieve the existing state required for fulfilment of the requested service.

Which of them would you apply to checkpointing and restoring arbitrary process state, and how does that integrate with reverse-time computing? (Useful in debuggers, and implied by a universal persistence mechanism, since one could save state after every instruction and then restore state in time-reversed order.)

Checkpointing could be seen as an underlying mechanism for implementing any of these robustly. But at the application level, any of these can be used to implement checkpointing a service to make that service itself robust. Consider a typical web application that has a small shopping cart per user. The shopping cart is relatively small, ad hoc, and should time out if the client fails to act on it within some given period. This might be implemented using a tuple space. On the other hand a group of architects might be drawing up plans for an upcoming competition. They work around the world, share documents, and plan to collaborate over the next few months. In this scenario a Versioned Document Tree is their checkpoint solution.

Which of them replaces/encapsulates using flash memory to save configuration of an embedded system?

What would you like that system to do? If it is a small number of ad hoc configuration items, then why not have a small tuple space in flash?

Tackling all applications of persistence is a tall order.

Well, this is nothing if it is not intended to be controversial.

Thanks for the feedback.

-- DougMerritt

"Which of these captures state in HTTP transactions that currently people use cookies for?"

Perhaps the BigBallOfMud pattern. The various ways that one can make HTTP, a stateless protocol, support session-oriented client-server applications are all horrible kludges and always has been.

When I worked on go.com front page and home page, I had to implement a number of customization features: users would edit a prototype/template that would change how some pages were displayed to them. The state maintained in cookies allowed me to know which request went with which customization.

If you are saying "just don't do that", that isn't good enough. Being religious about following some design principle interests me far, far less than getting done the job that my users (and management!) want me to do.

It's easy to be a Monday morning quarterback and list the rules that an ideal world should follow. It is a thousand times harder to start with the requirements of a far from ideal world, and figure out a clean way to meet those needs.

I have no argument with that attitude, but trying to extract principles (such as this page does) from obviously flawed starting points is also a pointless exercise.

Nay nay, I didn't insist on the previous mechanism, I asked what to use to replace it, and that is a perfectly fair question when the topic is a universal mechanism. If it doesn't offer a replacement, then it's not universal. Again, you can't just say "stop doing that, I don't care whether you need to or not." There is a need for state to be maintained over HTTP transactions, you can't just handwave that away as undesirable. I gave a scenario. Go ahead and argue what should be done in that situation that will deliver what the users want.
- A replacement would be a stateful protocol. Persistent cookies are used to hold HTTP session information only because HTTP is stateless. The persistence is a kludge - session information does not need to be persisted.
- Ok. Now, persistence equals state, so this replacement stateful protocol should come under the domain of a universal approach to persistence, so back to my question, how would it be handled under this proposed new universe?

The simplest answer is that nothing would have to change since I am not suggesting eliminating HTTP from the new universe.

The more complicated discussion: Persistence does not equal state until you define what you mean by each term. I would define persistence as retaining the integrity and availability of data across the lifetimes of multiple distributed OS processes. I would define state in the context of HTTP as the set of persistent and non-persistent data maintained between a client (typically a browser) and a server. Some of that persistent data is stored on the client, most on the server. Cookies are often persistent, sometimes expire, etc. They are used to facilitate coordination between the state on the client and that on the server. I would not suggest this thesis should change any of that, at this level of definition.

Hmm, to me JustThreePersistencePatterns look more like

HierarchicalStorage? (-> e.g. FileSystem(s))
RelationalStorage? (-> e.g. DataBase)
GraphLikeStorage? (-> e.g. www: http+HyperText)

-- GunnarZarncke

CategoryPersistence