Literate Programming

Introductions:

Literate Programming is a programming methodology in which a program is its own documentation.

See http://www.literateprogramming.com/ for a lot of background.

(These days, it's also called "BehaviorDrivenDevelopment"...)

BehaviorDrivenDevelopment and Literate Programming are actually two very different things. In BDD, you use a bunch of human-readable machine-executed tests to guide development and know when you are feature-complete. Literate Programming isn't concerned with that kind of testing at all. Rather, its emphasis is on creating a human-readable document that explains what the problem is and how it is solved, with the code itself integrated into the explanation. -- DanPuckett?

It is a way to organize code so that it matches the way humans think and explain and present ideas, rather than matching the way a compiler expects ideas to be presented to it.

From one source you can produce both the documentation and the executable code. A good example of what can be done with this methodology can be seen in the book TexTheProgram.

Note that "documentation" means a description of the implementation, not a user manual. For example, the TeXBook is the user manual for TeX, whereas TeX the Program describes (and contains) the implementation.

[ "a programming product ... is a program that can be run, tested, repaired, and extended by anybody. It is usable in many operating environments, for many sets of data." -- Frederick P. Brooks, in the book The Mythical Man-Month. So how does one test/repair/extend ("maintain" ?) a program ? I assume by "description of the implementation" you mean documentation explaining how a maintainer can do those things. ]

Literate Programming is also the name of DonKnuth's book (ISBN 0-937-07381-4 ) which describes the methodology.

One speculation for the reason behind Knuth's pushing of LP is that according to Stanford's intellectual property policy, Stanford would have owned all of Knuth's code, but not his published writing. So the answer's simple: make the code part of the document, and while you're at it, be sure to minimize the appearance and importance of the code itself. Ensuring the document outstrips the code by two to one or more is a great start. One of the tenets of ExtremeProgramming (which I claim is also highly overrated) is to use meaningful identifiers and sensibly factored code, in order to make it self documenting. LP isn't useless; it's really useful for embedding executable code samples when combined with an interactive editor and interpreter, e.g., Elmer in the EeLanguage. (The language is E, I'm just forced to write a god-awful WikiWord.) I just think Knuth was more interested in producing books that just happened to contain code samples, being that LP is really the antithesis of SelfDocumentingCode...

Issues:

(In what sense can LP be called "methodology"? This _simple_ idea of mixing "code" and "documentation", whatever this dubious distinction meant to Prof Knuth, is hardly ever used in ways indicative of any insight into SDLC, particularly, or SE in general, IMHO! A clip I saw out of the Professor's book (I don't have the book) was even rather, eh, infantile, meaning, the "literacy" of it was almost "lip service" like, a trivial macro language like trick, I mean what's the point?! And the "glue" text between those macros seemed more like a "stream of consciousness" narrative, ugh! Look, Wiki combined with an LP mechanism can really do things, solve practical difficulties (ie, software engineering sense) of what "code" means to people, and how it's being used. I suspect much of what's been done with LP is a disservice to the concept. IMHO, please.)


CharlesSimonyi states that with (his) IntentionalProgramming system, LiterateProgramming "will become practical".


IMO, literate programming emphasises a kind of documentation that is often forgotten in these times of document extraction tools: the "why" domain. Between the code with its interfaces ("what this is", handled by documentation extractors) and outside-look documentation like tutorials and the like, there is need for code commentary. How are the data structures used, what kind of techniques were considered but _not_ taken in use, why does a particular passage of code look weird, what are the cases to be taken into account, in what situation is a particular method expected to be used. That kind of thing.

Often, when you ponder a problem and attack it with code, you make notes and code simultaneously, intermingled. Why not put those in the same file, so you can sometime change some note into code or vice versa? In this way, literate programming is on the same level of being a "methodology" as TestDrivenDevelopment. By the way, UnitTests are a good example of code that would benefit of literate programming. -- PanuKalliokoski


If I recall correctly, the original compiler for TeX was a _Standard_ Pascal compiler, which is just horrible to code in since you are enforced by the language to provide your code in a way which is quite non-intuitive seen from today, namely strictly organized in sequence (labels-d?-types-variables-procedures-functions), and which had extremely bad StringHandling?. So basically Knuth wrote a preprocessor which processed his writings into standard pascal. His preprocessor was just on a grander scale than usually seen. The Literate Programming is just his personal way of writing code for books :) I tried it some years ago, and the verbose commenting just does not flow well for me. -- ThorbjoernRavnAndersen?


Re: "Note that "documentation" means a description of the implementation, not a user manual."... I was reading the manual of a project today, of the "instructive examples" kind (not the "reference" kind); it was horribly out of sync with reality, as usual. I got to wondering: why not provide an executable example? A lot of articles provide downloadable code, but I mean something that does exactly what's in the document, so the manual can be tested as being accurate.

The literate systems I've come across represent a single point in time (the finished code), this is slightly different as it represents the evolution of the code. The documentation would be interleaved with script and diffs, which would be rendered as code blocks or hidden depending on tags in the documentation; 'executing' the document would run shell commands, and apply patches. Partial execution should work too, so you can edit in changes in the process, and try the examples at each stage in the document. In noweb terms, there would be no cross references in the code: the documentation would refer to patch chunks instead; and unlike noweb, the same line of code may appear differently several times as it is edited. Anyone seen a system like this? -- BrianEwins


LiterateProgramming Examples

Implementations:

Check this out: https://github.com/RDFLib

Note how the webpages have "source code" intermingled with the (generated) HTML rendering the pages, all generated from a SemanticWeb describing rdflib -- I think. Yep. All the various bits of data including code are written in what's being called hypercode -- more documentation on the concept in the works -- in the mean time crack into the source ;)

Another example of LiterateProgramming is the lcc retargetable ANSI C compiler by Fraser and Hanson. Like Knuth's MetafontTheProgram? and TexTheProgram, a hardcover book was generated from the source code. See http://www.cs.princeton.edu/software/lcc/ and ISBN 0-8053-1670-1 -- TobyThain


NormanWalsh? (of DocBook fame) proposed to use the namespace feature of the ExtensibleMarkupLanguage to mix Code and Documentation. See http://nwalsh.com/docs/articles/xml2002/lp/paper.html


There is a similar approach named LiterateProgramming: writing such API, that code using this API looks like a text in NaturalLanguage (more than usual code does). An example of this is JayMock? (http://www.jmock.org/); the by-effect is that class and object structure of jMock looks unclear, and when understood, appears to be quite strange.

Actually, JMock and similar styles collectively go by the name of FluentInterface?s, not LiterateProgramming. LP is a well-known concept. Overloading LP to mean something other than what Knuth intended is a disasterously bad idea -- we already have many too many ultra-overloaded terms in the computer industry. Let's not compound the problem, please.


Except for programming authors do not typically write something, and then write another document, in another language, to explain what the first writing meant. Why should programming be different? A program should be written in a style that readers can understand.

If a programmer cannot write a readable program, what makes everyone think they can write a document to make it clear? Moreover, who needs to understand a program who cannot read a program? The scientists who's results depend on code, the accountant who needs to tell if "credit" and "debit" are added together correctly for counterintuitive accounts (ıe. the reversal between internal and external accounts of the same type). In general, people who care about the correctness of the results in the application domain. Good programming does not imply good accounting, good chemistry, good fluid dynamics, or even good looks: these are all things that will need external review. Unless, of course, you live and die by a WaterFall development model where these are all required to be "in the spec". The programmer should document not the code, but what the code fulfills in the application domain. This translation is the basis for trusting an application.

Additionally, user interfaces should be self documenting. Although, there are reasons to make x-for-dummies and CliffsNotes? type documents.

-- EdwinEarlRoss


From http://www.literateprogramming.com/ Knuth speaks ( and does so very well):

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.


Links:


See: CodeOrdering, DocumentationBeyondTheSourceCode, LiterateProgrammingBibliography, LiterateProgrammingIdeas, SelfDocumentingCode, TheSourceCodeIsTheDesign, CommentsAreCode, LiterateProgrammingTools, ElementalProgramming, LiterateProgrammingAndTheSemanticWeb.


See also HereDocument, EmbeddedDocument


CategoryBook, CategoryTex, CategoryDocumentation, CategoryCodingIssues, CategoryLiterateProgramming


EditText of this page (last edited January 31, 2014) or FindPage with title or text search

Wikibase