Transparent Persistence

Also known as orthogonal persistence, transparent persistence is a mechanism for bringing data into RAM implicitly, i.e. without expressing input procedures explicitly, and writing data back to its persistent form implicitly, i.e. without expressing output procedures explicitly. Typically this mechanism is in the context of an object-oriented programming language, but this may simply be a coincidence since most popular languages are object-oriented.

A transparent PersistenceMechanism may be fairly simple with one operating system process associated with one PersistenceStore, the store having one PersistenceRoot?, and no GarbageCollection other than that provided by the ProgrammingLanguageRuntime?.

A transparent PersistenceMechanism may be fairly complex with many distributed processes associated with one or more instances of a PersistenceStore, each store having more than one PersistenceRoot?, and a PersistenceAware? distributed GarbageCollection mechanism beyond that provided by the ProgrammingLanguageRuntime?.

Transparent persistence can be implemented in many ways, using an ObjectOrientedDatabase, SerializedData?, RelationalDatabase, or other form.

See InfiniteAmountOfTransactionalMemory

Transparent persistence imposes a ProgrammingModel?

Apparently the PythonLanguage ZopeApplicationServer has some form of transparent persistence.

GemStone/SmallTalk and GemStonej are two implementations of TransparentPersistence. Both are fairly complex mechanisms, but not identical. Their differences are due mainly to the relative simplicity of the SmalltalkLanguage and the relative complexity of the JavaLanguage, respectively.

The complexity of Java in this case has to do with the assumptions built into the language regarding the CompileTime features of the language vs. the RunTime features of the language as well as how static aspects of the language are supposed to be implemented. (StaticMeansClassInJava?)

The ProductivityEnvironmentForJava (PE:J) provides an implementation of TransparentPersistence. It generates a PersistenceLayer based on the JavaDataObjects (JDO) specification.

ErosOs implements the idea of TransparentPersistence. Operating system and application state are saved (checkpointed) at regular intervals, and restored when the system is started. Programmers allocate memory (or create objects) and put data there, and it automatically is saved to disk and is thus implicitly (and transparently), as opposed to explicitly, made persistent.

I've never seen a transparent PersistenceMechanism that performs adequately - unless that is, it was hand coded to the domain it was serving. The problem with transparent persistence is that you usually lose all the cool things about relational calculus that makes data queries fast - out left joins, searching, and such. The worst thing about transparent persistence is the traffic between the data engine and the application model. For example, you either end up retrieving objects one row at a time or retrieving the entire set of objects - when you are dealing with millions of records, that is a drag. -- RobertDiFalco

Hmm... if you had a turing complete query language, wouldn't one's typical rdbs count as TransparentPersistence?

It's not that simple. You can make SQL turing complete by wrapping the right SQL statements in an infinite loop. E.g. if the SQL is such as to emulate a cpu. Quite easy to do as a demonstration. But the result would be vastly slower than the regular way of programming, which goes to Robert's point: the problem isn't doing persistence, it's getting all 3 of (1) persistence (2) transparency and (3) performance, simultaneously.

Because of this, you probably don't really want utterly transparent persistence (with ACID guarantees, that is), you just want it to be easy. Then you can get it when you want it, without much effort, and take the performance hit, or avoid it when you don't absolutely need it, to run at normal speed.

If you don't care about ACID guarantees, then the above comment about ErosOS possibly makes it a solved issue.

Once upon a time, when everyone was using non-volatile core memory, some of these issues were a lot simpler; part of the complication today comes from needing to map data from volatile RAM to non-volatile disk. -- DougMerritt

[[[The following text moved from OrthogonalPersistence. RefactorMe.]]]

[From KillerOperatingSystem]

OrthogonalPersistence means you don't have to "save" and "load" files, the OperatingSystem just swaps out memory to disk when it shuts down. (ErosOs does this; it also resembles the Smalltalk image). ''Actually Eros writes all changes to a log immediately, and snapshots periodically. You can (as you should be able to) pull the plug and you're safe to a very recent point.

Is this the same as TransparentPersistence?

OrthogonalPersistence sounds cool, but what happens if the OS crashes or if there's a power cut? It can't just save when you shutdown, it has to be able to recover the state just before a power outage. And what about when you want to force a reload from some known good state (because something has corrupted memory)? Or when you want to revert to a previously saved copy of something (because you realize your changes made things worse)? -- GavinLambert

Notice how lacking orthogonal persistence doesn't help you on any of these issues. So bringing them up is essentially asking "How can we make OrthogonalPersistence even cooler than it already is?" One answer is to have a LoggingFileSystem underneath everything, making regular snapshots of the log's metastate (to make reboots extremely fast). Another thing is to allow the user to specify the granularity of snapshots (every hour, every minute, every second?) and to be able to force snapshots.

Huh? If you don't have orthogonal persistence, you have the currently "normal" way to do things, have load images on disk that load various bits of code into memory on startup until everything is running again. By definition, the bits of code on disk are in a "good" state, and when you're loading them all into "clean" memory you're basically loading a known good configuration. If later on in the session one of those programs completely bollixes memory, you can clear out the memory and reload from the known good configuration - in other words, whack the reset button. -- GavinLambert

And of course, all of this is done manually. The user or programmer has to manually specify that this can and should be done with programs X, Y and Z. Everything you can do without OrthogonalPersistence, you can do with it. But in the general case, you can't just restart a bunch of processes and hope everything magically works out. What if they were using a very rare and expensive resource? What if they were using a one-time accessible resource (a remote process that just terminated or a user that just had a coronary)? What if they modified a remote resource during processing? If you want to do things the dirty way and cross your fingers then OrthogonalPersistence can do that too. But it should be able to do better.

I think that a perfect implementation of orthogonal persistence requires you to do away with the notion of a "disk" altogether. Somewhat like PalmOs does it, which uses database records which in essence are nothing more than memory blocks, you control read and writes to them by requesting access to the database record, accessing it and then hand it back to the operating system. If the hypothetical new operating system used this technique coupled with full transaction management with logs and rollback, we would have something really cool. It would even be possible to roll the database back to what it was earlier, thereby getting a system wide undo functionality. -- EliasMartenson?

What does 'undo' mean in a distributed multi-user system?

Undo means that you can go back to an earlier state. For something like the above described undo functionality to work in a distributed environment naturally means that you need some form of transaction manager working across the involved systems. Of course, the overhead of having such a transaction manager running might outweigh the advantages of having transactionally safe execution. These things become evident once you go from theory to implementation. The perfect design rarely translates to a perfect implementation. -- EliasMartenson?

That addresses distribution, but what about multi-user? What are the semantics of undo? Who can force the entire system to go back to which earlier states?

Consider, is the undo queue public property? There's no security in that. Is it owned individually? You have to deal with conflicts somehow. Is there something between public and private? Perhaps you can have ownership of undo queues match ownership of objects, so that an object's owner can undo what others (non-owners and owners alike) did with impunity.

And I don't think you need a transaction manager at all. If you can't undo your changes on another system, then this is equivalent to that system's owner forbidding you from doing so. A system can never be entirely owned by a foreign user. I guess the semantics of having multiple users makes most distribution problems irrelevant. Just consider a lightning bolt a superuser ....

If you want a transactionally safe distributed operating system, then I would presume you need a transaction manager. The reason is, that if you want to be able to do rollback, it's an all or nothing approach. Compare this to a database. The only person that is able to do a system wide rollback is the systems administrator, and he is only able to do it while the system is down.

Consider the undo functionality more of a type of backup rather than something that would be used all the time. The traditional rollback, however, could be very useful. Imagine being able to try out a program that you are not sure what it does by starting a transaction, run the program and then rollback.

Now, please note that I'm not saying that this kind of operating system really is the KillerOperatingSystem. I'm just saying it's an interesting idea worth studying a bit further. Perhaps it's material for a page of it's own? -- EliasMartenson?

For security reasons, it is necessary for the OS to provide memory which is not versioned. Passwords may survive a reboot only if there is a big shiny button a user may push to "delete all passwords from system NOW" when the cops come crashing through the door with a search warrant. And it might be better for that big shiny button to be the computer's reset switch.

Here is a transparent persistence problem that crops up occasionally.

Say I am using a transparent persistence framework such as Hibernate. Now, generally, I want to model my domain with POJOs that have no persistence semantics and use the framework to implement persistence of those objects. All well and good and easily implemented using Hibernate for typical models.

But say I have this interface for a model entity.

 public interface Region {
    public Set getOrders();
    public Set getOrdersBetween(int low, int high);
 }

getOrdersBetween(...) will select orders from the entire set that match the criterion. But, to do that in a POJO, I have to write search code into the application, e.g. an iterator that checks each order or a binary tree that is maintained by the Region implementer as orders are added to the relationship. But, any conceivable persistence mechanism that I eventually hook this to will have no trouble performing the selection directly. So, implementing the search in the POJO is a waste of effort and could easily be less efficient than using the selection facilities of the eventual RDBMS.

So, to avoid the problem, I either add persistence code into the Region implementer or add an appropriate finder to the associated DAO interface ( e.g. regiondao.getOrdersBetween(Region forRegion, int low, int high) ) and eliminate the business method from the interface.

Neither of these solutions feels right. The first violates transparency. The second moves the operation to a less logical object.

Is there a data access pattern that solves this?

-- GregWiley?

Use a smarter "data aware set" that looks and behaves as a normal set in the domain, but when given a query such as between, generates an appropriate query and sends it to the DBMS allowing the illusion of clean domain code while letting SQL do what SQL does well. SQL has Select, no reason your Set/Lists can't have the same, in fact, why not more, just go look at a real collection hierarchy like Smalltalk. Consider a few basics at least standard on all your collections like say: forEach, find, findAll, findNot, collect, matches, anyTrue, allTrue, first, second, third, fourth, fifth, rest, and maybe a few others you find handy. It's not so difficult to make a SQL DBMS pretty much look like just any collection.

I like that idea. Of course, I'd still have to implement the non-passthru versions of those operations so that code could be tested without a DB. But if it was done in a standardized way (that is, implemented once :) ), it might not be so painful. Then, in deployment, non-passthru versions could be injected. But first, I'll check that there isn't already an available Java library that implements smart collections. The logical next step is to extend a persistence framework to implement those smart collections(e.g. generate SQL) or at least provide a standardized integration mechanism custom SQL implementation. It's possible that some of the persistence frameworks already do this so I'll look into it. -- GregWiley?

Today I started experimenting with an AspectJay approach. I have found that I can use it to solve a couple of transparency problems in Hibernate: First, one thing that bugs me is the requirement that model classes contain a primary key field. For an in-memory object, its reference is its identity so adding a field for that purpose is anti-transparent in the pure domain model. My experimental approach to solving it in AspectJ involves static crosscutting and looks like:

 public aspect HasPrimaryKey {
   private interface hpk { }
   declare parents: model.*Impl implements hpk;

   private Long hpk.key;
   public Long hpk.getKey() {
     return key;
   }
   public void hpk.setKey(Long key) {
     this.key = key;
   }
 }

Second, the problem I originally proposed can be solved with an advice that replaces finder methods in the model:

 public aspect DatabaseFind {
    OrderDao orderdao = new OrderDao();

    pointcut ordersInRange( Region region, int low, int high ) :
       execution( protected Set model.RegionImpl.findOrdersInRange(..)
             && args(low,high)
             && this(region);

    Set around(Region region, int low, int high ) :
               ordersInRange(region, low, high) {
       return orderdao.findRegionOrdersInRange(region, low, high);
    }
 }

Now, what I'd really like to do is to use a wildcard in that pointcut and have the advice base its DAO method selection on a decoding of the join point's method name. But I don't think that's doable in AspectJay. I'll keep looking. Meanwhile, all the finder methods have to be entered into the aspect manually (some typing could be saved by using anonymous pointcuts). -- GregWiley?

I came up with a way to do the wildcard substitution above. The code is long so I'll just describe the approach.

establish a convention for protected finder helper method signatures in the model. e.g. <target-entity-or-collection> model.<entity>Impl.find<target-entity>By<query-description>(...)
establish a convention for DAO finder method names. e.g. <target-entity-or-collection> dao.<target-dao>.findFor<entity>By<description>(<entity> e, ...)
in an aspect, define a pointcut that selects execution of the finder helpers. e.g. pointcut finder() : model.*Impl.find*By*(..)
in the same aspect, define an advice that uses thisJoinPoint to
- parse the method name to obtain the entity, and target
- obtain entity instance, method arguments, and return type
- select the appropriate DAO
- produce a method name conforming to the DAO finder convention
- reflect on the DAO to discover the finder method with the appropriate signature
- call the DAO's finder and return the result

-- GregWiley?

I am increasingly of the mind that TransparentPersistenceIsImpossible?. -- MoeAboulkheir?

Moved issues on definition to PersistenceDefinition

Using database tools, one stops thinking in terms of files. When doing most of the processing in collection-oriented systems, such as ExBase, one tends to see a pattern of a distinction between "cataloged" info and non-cataloged. Cataloging means that one explicitly gives something a name and/or identify, which may be a "path". I've used a lot of "temp tables" which are not assumed to be "kept" and are treated more or less like locally-scoped variables. I've used other non-DB systems, such as statistical packages, that have "session" collections that are only "saved" if you explicitly go through a "save" step of some kind, or pre-designate the collection as a cataloged collection. When using a file system explicitly, the processing of "saving" is generally the cataloging step. It's not so much about disks, but about the human action of "cataloging". Temporary collections may also use disk, but it's a matter of having some kind of "index" or registering step so that it can be found again later. -t

Consider a different UserStory: you power down your computer in the middle of an ExBase process that involves a lot of "temp tables". When you power up your computer again, the ExBase process fires up and continues that process right from where it left off (or close enough that you can't tell the difference). You didn't need to restart that operation. You didn't need to rebuild those "temp tables" from scratch. You didn't even need to give those 'temp tables' any names for an index or catalog. This would widely be considered TransparentPersistence of process (likely provided by the OperatingSystem) despite the fact that those tables and the whole ExBase operation are 'volatile' compared to the other processes.

But you may have that with everything if you had something similar to bubble memory. Powering down wouldn't rid the variable stack and program position pointer. However, there may be operations in an undetermined state if the CPU suddenly stopped due to loss of power. To get around such we'd need some kind of A.C.I.D. system.

Why do you state that as an objection? There is no known fundamental reason that we cannot have widespread TransparentPersistence. There are many reasons we don't widely have TransparentPersistence, of which you name just one: consistency issues. Others include: recovery of networked services, and dealing with upgrade (if everything is persistent, then you are essentially forced to treat all upgrades as occurring at runtime, which requires we revise our software distribution models), and achieving all this without sacrificing performance or security. That's a lot to bring together at one time, and it would help a lot to have support from the hardware (BubbleMemory, MMU), OS, language implementations, language and service design (idempotence, LiveProgramming), and network protocols. I will note that AtomicConsistentIsolatedDurable is stronger than necessary: for persistence we don't care about 'atomic' or 'isolated'; we do want 'durable' and 'consistent'. 'Consistent', in this case, means that the power disruption is either invisible to the program or has well-defined semantics. You might not be willing to pay for the atomicity or isolation properties, though they may be useful for other purposes.

Our software architectures have reflected hardware architecture for so long that we simply haven't explored alternatives sufficiently. But "work-space" versus "storage" is a common idiom in both manual work and the human brain.

TransparentPersistence helps separate the logical lifetime from the physical power-cycle. Logical organizations of storage vs. workspace are still very useful for understanding system behavior under concurrency, security, system modularity, extensibility, and various forms of partial failure. We still benefit from understanding temporary variables/tables as distinct from BlackboardMetaphor or PublishSubscribeModel or DataBase or TupleSpace or other high-level organizing principles for code and data.

See ProgrammingWithoutRamDiskDichotomy PrevalenceLayer, PersistentLanguage, ImageBasedLanguage, ContainerManagedPersistence, NoMoreDatabases

CategoryPersistence