Journaling File System

A journaling FileSystem is a hybrid between a traditional filesystem and a LoggingFileSystem. Each write is recorded in a log first and only later applied to the traditional component; only when the operation has completed is the record taken out of the log. The log is limited in scope (often to metadata only) and in time (covering seconds to minutes of activity), so it never needs to function as a complete filesystem by itself. The filesystem relies on its traditional component for the actual storage and retrieval of data and metadata.
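
A minimal sketch of that ordering in Python, assuming a toy record format and the invented file names fs.journal and fs.data (a real filesystem journals disk blocks rather than JSON lines, and marks transactions complete rather than truncating the log):

 import json, os

 JOURNAL = "fs.journal"   # invented name for the log
 DATA = "fs.data"         # invented name for the traditional component

 def apply_record(record):
     # Step 2: carry the logged operation out against the traditional
     # component, and force it to stable storage.
     fd = os.open(DATA, os.O_RDWR | os.O_CREAT, 0o644)
     try:
         os.lseek(fd, record["offset"], os.SEEK_SET)
         os.write(fd, record["payload"].encode())
         os.fsync(fd)
     finally:
         os.close(fd)

 def journaled_write(offset, payload):
     record = {"op": "write", "offset": offset, "payload": payload}
     # Step 1: append the intent to the journal and flush it to disk
     # *before* touching the main structure.
     with open(JOURNAL, "a") as j:
         j.write(json.dumps(record) + "\n")
         j.flush()
         os.fsync(j.fileno())
     apply_record(record)
     # Step 3: only when the operation has completed is the record
     # taken out of the log.
     open(JOURNAL, "w").close()

If the machine dies between steps 1 and 3, the record survives in fs.journal, which is what makes the recovery described below possible.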

Journaling filesystems were invented to take advantage of the reliability of LFS. A journaling filesystem recovers from a system crash by examining its log, where any pending changes are recorded, and replaying the operations it finds there. This means that even after an unexpected shutdown it is never necessary to trawl through the entire contents of the disk looking for inconsistencies (as scandisk does on Windows or fsck on Unix); you just sort out whatever has been added to the journal but not yet marked as done.
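
Continuing the toy sketch above, recovery is a scan of the journal rather than of the whole disk: any record still present was never marked done, so it is simply replayed. The writes in this sketch are idempotent, so replaying one that had in fact completed is harmless.

 def recover():
     # Run at mount time, after a crash and a clean shutdown alike.
     if not os.path.exists(JOURNAL):
         return
     with open(JOURNAL) as j:
         for line in j:
             line = line.strip()
             if not line:
                 continue
             try:
                 record = json.loads(line)
             except ValueError:
                 break   # torn final record: the crash hit mid-append
             apply_record(record)
     open(JOURNAL, "w").close()   # everything replayed; retire the log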

Journaling filesystems abandon write-optimization. For many applications read-optimization is a far more important consideration than write-optimization, but read-optimization has largely ceased to matter now that large RAM caches absorb most reads. It used to be that roughly 90% of hard disk operations were reads, so read-optimization made sense. Nowadays roughly that proportion is writes, and still almost nobody bothers to write-optimize. This was one of Ousterhout's primary motivations in designing the original Sprite LFS (of which the BSD LFS was a later reimplementation).

What this means is that most data written to disk is never read back from it (the reads are served from the cache), so hard disks have begun to take over the function of tape drives. And while many applications require read-optimization, the most important application of all, the operating system, requires write-optimization.


Examples of journaling filesystems in production:

 * ext3 (Linux)
 * ReiserFS (Linux)
 * XFS (originally SGI IRIX, later Linux)
 * JFS (originally IBM AIX and OS/2, later Linux)
 * NTFS (Windows NT and its successors)
 * HFS+ with journaling enabled (Mac OS X)

On the topic of file systems (as opposed to some different persistence paradigm), this is pretty much the RightThing, so it's interesting that it took until the 21st century to start to become common.

This general approach was known to be critical for databases decades before that. As far as I know, the delay in applying the lesson to filesystems was due to performance issues. Additionally, there wasn't much perceived need for it until filesystems measured in hundreds of gigabytes became common. The chief advantage of a journaling filesystem is that it doesn't take excessively long for a machine to come on-line after a hard shutdown. Databases required journaling long before this in order to provide ACID semantics.

