Programs Are Databases

In a metaphorical sense, at least.

The next part of this page is a sort of manifesto for an idea that is at once extremely difficult to implement in practice and rather confusing to most people. In retrospect, I think it's safe to say that the world really needs the equivalent of VisualAge FooBar as standard-issue software for each language. Making a "visual" source repository toolkit available for customization by all comers could possibly solve much of this perceived natural injustice.


ProgramsAreDatabases, not text files. The text file is an artifice designed to get the program into the memory of the computer. It's not much better than a stack of punched cards, when you get down to it.

There is a static structure to all code. That is, there are classes and modules and methods and blocks and functions and whatever else your language supports.

Much of the static structure is completely accidental. Do you care if your functions are defined in alphabetical order? Does the computer? Would it completely screw up merges in a version control system if you alphabetized your function definitions?

What matters about a method is its text and the class it belongs to. Its position relative to other methods is irrelevant, yet a text file cannot avoid recording one. Text files therefore encode much nonsense about programs, and I claim that they are an inherently inefficient way to work with computer programs as the structured data that they really are.
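To make the claim concrete, here is a minimal sketch in Python (not the language of the original discussion). It assumes a "database" is nothing more than a dictionary keyed by (class, method name); the helper to_database and the sample files are hypothetical. Two files that differ only in the order of their definitions come out as the same database.

```python
import ast

def to_database(source):
    """Map (class name or None, function name) -> source text of the definition."""
    db = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            db[(None, node.name)] = ast.get_source_segment(source, node)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    db[(node.name, item.name)] = ast.get_source_segment(source, item)
    return db

# The same two functions, written in different orders.
FILE_A = '''
def beta():
    return 2

def alpha():
    return 1
'''

FILE_B = '''
def alpha():
    return 1

def beta():
    return 2
'''

# As text the files differ; as keyed entities they are identical.
same_text = FILE_A == FILE_B
same_database = to_database(FILE_A) == to_database(FILE_B)
```

The sketch ignores comments, imports and decorators, which a real tool would have to keep somewhere.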

What is needed is a programmer's toolbox that supports working with code NOT as text files, but as a collection of related entities in a database. It might be valuable to establish some standards for the representation of these databases, or it might be important merely to have a way to go between text files and databases.

On the text file front: I have actually used a homebrew system for treating TCL programs as databases. I wrote it in TCL. The problem was that nobody else could use the intermediate representation, and I always got diffs vs. the final text output. Suckage.

I claim, therefore, that a migration path to the OneTrueWay must support conversion to and from text, and it must integrate version management and merging. Merging changes within a database is different (though not entirely) from doing the same with files. It just requires a lot of patience.
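A rough sketch of such a round trip, in Python with the standard sqlite3 module. The table name definition and the helpers import_text and export_text are made up for illustration, and only top-level functions and classes are handled. Exporting in a fixed order is one way to avoid spurious diffs against the text output.

```python
import ast
import sqlite3

def import_text(conn, source):
    """Store each top-level function or class as one row, keyed by name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS definition "
        "(name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            conn.execute(
                "INSERT OR REPLACE INTO definition VALUES (?, ?)",
                (node.name, ast.get_source_segment(source, node)),
            )

def export_text(conn):
    """Emit definitions in name order, so the output is canonical."""
    rows = conn.execute("SELECT body FROM definition ORDER BY name")
    return "\n\n".join(body for (body,) in rows) + "\n"

# Text -> database -> text.
conn = sqlite3.connect(":memory:")
import_text(conn, "def beta():\n    return 2\n\ndef alpha():\n    return 1\n")
text = export_text(conn)

# Feeding the exported text back in and exporting again changes nothing.
conn2 = sqlite3.connect(":memory:")
import_text(conn2, text)
round_trip = export_text(conn2)
```

Anything the parser drops (comments between definitions, blank-line conventions, imports) is lost in this sketch, which is exactly the kind of detail that makes a real migration path hard.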

OK, so let's have a bit more detail. If you are going to temporarily abandon textual names, you still must link up your functions somehow. So do you propose a graphical program editor where you draw links between functions, or drag and drop, or some other convention? I'm quite happy that you can provide something better than a text editor for working with code, very much in the RefactoringBrowser scheme of things. I do have a problem, though, with abandoning textual names for subroutines, if that is what you are suggesting. If you keep textual names for subroutines, aren't the savings mostly at the level of auto-generating forward declarations? That is handy, but not a major revolution on the path to the OneTrueWay.

OK, I've been misunderstood. I do not advocate dropping textual names for things in programs. I advocate abandoning text files as a way to store programs in progress. Text files are sub-optimal for working with programs because you spend brainpower answering the question "Which part of the file contains function FOO?", or else you get badly bitten by (e.g.) CVS when you stick a new function at the bottom of the file and so does someone else. In God's programming tool, there are no such conflicts.
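The "both added a function at the bottom" case can be sketched as a three-way merge over named entities rather than lines. This is an illustrative Python sketch, not any real tool's algorithm; merge and the sample dictionaries are hypothetical.

```python
def merge(base, ours, theirs):
    """Three-way merge of {name: body} dicts; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o          # both sides agree (or both deleted it)
        elif o == b:
            result = t          # only their side changed this entity
        elif t == b:
            result = o          # only our side changed this entity
        else:
            conflicts.append(name)  # both changed the same entity differently
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts

# Two people each add a new function to the same program.
base = {"main": "def main(): pass"}
ours = dict(base, foo="def foo(): return 1")
theirs = dict(base, bar="def bar(): return 2")

merged, conflicts = merge(base, ours, theirs)
```

Because additions are keyed by name and have no position, the two new functions never collide. The sketch still reports a conflict when both sides edit the same function in different ways.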

So what are you proposing? That we keep all source code in memory 24 x 7 rather than saving in bits and pieces? Is using "select .. where ..." to search a database really superior to using Find or GREP to search a set of files? How does a database solve update collisions any better than common source code control systems? If you don't use exclusive locking, you will have conflicts in a database just as much as in your favorite source code control system.
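Both halves of this question can be made concrete. The sketch below (Python with sqlite3; the schema and the save helper are assumptions made up for illustration) shows a "select .. where .." lookup, and then shows that without exclusive locking a database still has update collisions. An optimistic version check only detects them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE definition (
           owner   TEXT NOT NULL,
           name    TEXT NOT NULL,
           body    TEXT NOT NULL,
           version INTEGER NOT NULL DEFAULT 1,
           PRIMARY KEY (owner, name))"""
)
conn.executemany(
    "INSERT INTO definition (owner, name, body) VALUES (?, ?, ?)",
    [
        ("Stack", "push", "def push(self, x): self.items.append(x)"),
        ("Stack", "pop", "def pop(self): return self.items.pop()"),
        ("Queue", "push", "def push(self, x): self.items.insert(0, x)"),
    ],
)

# A structured query: grep for "push" would also match the Queue method.
stack_methods = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM definition WHERE owner = ? ORDER BY name", ("Stack",)
    )
]

def save(conn, owner, name, new_body, seen_version):
    """Optimistic update: succeeds only if nobody saved since seen_version."""
    cur = conn.execute(
        "UPDATE definition SET body = ?, version = version + 1 "
        "WHERE owner = ? AND name = ? AND version = ?",
        (new_body, owner, name, seen_version),
    )
    return cur.rowcount == 1

# Two editors both read version 1 of Stack.pop, then both try to save.
first_ok = save(conn, "Stack", "pop", "def pop(self): return self.items.pop(-1)", 1)
second_ok = save(conn, "Stack", "pop", "def pop(self): return self.items.pop(0)", 1)
```

The second save is rejected, so the collision is still there and somebody has to resolve it. What changes is the granularity: the collision is on one method, not on a region of a file.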


See also BananaPeelsAreDatabases, InspiredGeniusWishesProgramsWereLikeDatabases

I'd understand this remark if this page had no sound theory or practical application, but it seems to have both. Can you clarify?

Mere assertion is not a particularly useful way of changing people's minds, or reality for that matter. I guess my objection is to sloppy writing. Programs are no more databases than they are text files, so we may as well assert that said fruit waste is a database. The source code of programs is often a text file, unless some special editor stores it in a native format. I guess it's a bit like a red rag to a bull when I see a page with a name like this one. We all know that programs are not databases. Programs are sequences of instructions for processors. Databases are places for storing and accessing information.

"Programs are sequences of instructions for processors." These instructions must be stored in the computer, so they are also data. But they are not just free-form data; they are also highly structured: hierarchically contained packages, classes (possibly inner classes), methods and attributes of various visibility. These elements have structured relationships with each other (who sees whom, who depends on whom, who inherits from whom...). So programs are structured data, currently stored as... text?!? The original author is right, IMHO: in an ideal environment, highly structured data such as program source belongs in a database. -- FalkBruegmann

{I'd hesitate to replace text with a representation of programs designed largely to be more convenient for the computer. In my mind, things should be made as convenient as possible for the programmer, if necessary at the computer's expense. Program source is for communicating ideas both between programmers and from programmer to computer, and those purposes should take precedence. Text serves that purpose well enough... and also readily allows for RealMacros and various forms of MetaProgramming that might be difficult to handle in an environment where programmer inputs are strongly structured by the database format.}

{Getting rid of text files wouldn't be bad, though. Text files are merely a transport and storage medium for program source. There are a number of relevant alternatives for storing and transporting source, some including the use of databases. Some are even in use already, a fine example being various Smalltalk environments. But disparaging text files isn't the same as decrying text entirely.}

{The vast majority of features one might desire of structuring program text itself into a database can be provided by a system that parses and annotates program source-code in order to allow for rapid refactoring. Such an annotation system must be aware of the storage medium for the program, but isn't really dependent upon it. The annotations themselves could be stored easily into a database.}
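{A minimal sketch of such an annotation pass, in Python. The source stays as text; a parser derives "who calls whom" annotations and stores them in a table. The calls table and the annotate helper are hypothetical, and only direct calls by plain name are recorded.}

```python
import ast
import sqlite3

SOURCE = '''
def helper():
    return 1

def compute():
    return helper() + helper()

def main():
    print(compute())
'''

def annotate(source, conn):
    """Record (caller, callee, line) for every direct call by plain name."""
    conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, line INTEGER)")
    for func in ast.parse(source).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                conn.execute(
                    "INSERT INTO calls VALUES (?, ?, ?)",
                    (func.name, node.func.id, node.lineno),
                )

conn = sqlite3.connect(":memory:")
annotate(SOURCE, conn)

# The question a rename refactoring asks first: who calls helper?
callers_of_helper = [
    caller
    for (caller,) in conn.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        ("helper",),
    )
]
```

{The annotations can be thrown away and rebuilt from the text at any time, which is why the text need not itself live in the database.}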

Oracle StoredProcedures are obviously an implementation of this ProgramsAreDatabases pattern.

{How so?}


Most image-based languages follow this pattern (LispLanguage, SmalltalkLanguage, FactorLanguage, ForthLanguage). They have at least one index (e.g. the dictionary) for symbol lookup. Some have multiple indices for different types of symbols (functions vs. classes vs. methods). Those with built-in browsers (Smalltalk) sometimes even give you a non-hierarchical means to search for particular method signatures.
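Python is not image-based, but its module namespaces are dictionaries, so the idea of several indices over live definitions can be sketched with it. Everything below is an illustrative assumption: the in-memory module named image, the build_indices helper and the sample definitions.

```python
import inspect
import types

# A tiny stand-in for an "image": a live namespace holding definitions.
image = types.ModuleType("image")
exec(
    '''
class Stack:
    def push(self, item): ...
    def pop(self): ...

def push_all(stack, items): ...
def clear(stack): ...
''',
    image.__dict__,
)

def build_indices(namespace):
    """Build separate indices for functions, classes and methods."""
    functions, classes, methods = {}, {}, {}
    for name, obj in vars(namespace).items():
        if inspect.isfunction(obj):
            functions[name] = obj
        elif inspect.isclass(obj):
            classes[name] = obj
            for method_name, method in inspect.getmembers(obj, inspect.isfunction):
                methods[(name, method_name)] = method
    return functions, classes, methods

def with_parameter(index, param):
    """Non-hierarchical search: every entry whose signature has this parameter."""
    return sorted(
        key for key, fn in index.items() if param in inspect.signature(fn).parameters
    )

functions, classes, methods = build_indices(image)
takes_stack = with_parameter(functions, "stack")
takes_item = with_parameter(methods, "item")
```

The search by parameter name cuts across the class and module hierarchy, which is roughly what a Smalltalk-style browser offers when it looks up method signatures.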

As to text storage, I've seen scripting languages for which the source was composed in a tree control rather than a text editor, which captures the inherently hierarchical nature of StructuredProgramming. Folding TextEditors approach this from a different angle, as do GraphicalProgrammingLanguages.

Also of note is XsltLanguage. Instead of being strictly navigational in getting from one element to another, it can address the XML it is transforming via complex queries (named "select", in fact).
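The flavor of query-based addressing can be shown without XSLT itself. This Python sketch uses the limited XPath subset in xml.etree.ElementTree; the XML vocabulary (program, class, method, visibility) is invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    """<program>
  <class name="Stack">
    <method name="push" visibility="public"/>
    <method name="grow" visibility="private"/>
  </class>
  <class name="Queue">
    <method name="push" visibility="public"/>
  </class>
</program>"""
)

# Address elements by a predicate, not by walking from node to node.
push_methods = doc.findall(".//method[@name='push']")

public_methods = [
    (cls.get("name"), method.get("name"))
    for cls in doc.findall("class")
    for method in cls.findall("method[@visibility='public']")
]
```

Full XSLT "select" expressions are considerably more expressive than this subset.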


Is this the same as SourceCodeInDatabase?

Not really. This is about viewing programs and/or their run-time image as a database, or at least a complex data structure.


Although this topic looks like my handiwork, I did not create it (IIRC). (I mention this because I have been accused of creating too many TableOrientedProgramming topics.) -- top

In defense of TopMind, I created this page. -- IanKjos


Aren't databases typically considered stores of "facts", valid for some unspecified interval of time, while programs have a strong time component, both in the sequence of their execution steps and in the event-driven nature of asynchronous processing? I would initially assume that databases would have difficulty modeling this time component. -- PeteHardie


We developed an iconic/desktop programming language and integrated development environment in the early 1990s. It used a relational structure for storing the language structures. We had around 100 classes, and instances of the classes were stored in flat files as records. We were also able to export these (and later import them) as C++ global variables. http://w3.isis.vanderbilt.edu/OOPSLA2K1/Papers/Carlson.pdf


See also IntentionalProgramming, HiddenDatabaseSyndrome, AdvantagesOfExposingRunTimeEngine, GreencoddsTenthRuleOfProgramming


Last edited April 26, 2014.