Literate Programming Ideas

(from LiterateProgramming)

I believe Knuth's original goal also included printed output. (Certainly all the fine work on TeX and derivatives would argue that point!) Is this component of literate programming becoming obsolete?

Certainly, Knuth's philosophy includes a linearizing theme, an assumption that code and documentation are both interleaved and sequential. The Nelsonian system you are using to read this message is non-sequential, and it contains no code. Would merely adding a programming language to this documentation language produce "literate programming?"

I think it is time to move beyond literate programming, while acknowledging its wonderful contribution. I think programming documentation should embody:

  1. Conceptual integrity: the documentation for a thing must be on the same conceptual level as that thing.
  2. Constant accuracy: the documentation for a thing must constantly and accurately describe that thing.
  3. Accessibility: the documentation for a thing must be accessible, by creators, their peers, re-users, reviewers, end-user documentors, and the merely curious.
  4. Measurability: the documentation for a thing must be measurable, quantitatively and especially qualitatively.
Literate programming, as conceived by Knuth, wonderfully addressed 1 and 2, but left much to be desired with 3 and 4.

Hypermedia holds the promise of addressing 3, which Barbara and I have expanded in some detail and implemented in Smalltalk.[http://www.bytesmiths.com/pubs/9506ManagingDocs.html] Although our solution is original, it is hardly unique -- there is lots of discussion about hyper-literate programming these days.

But the remaining problem is number 4; we can measure the quantity of documentation, but that is often inversely proportional to its quality!

Getting back to the site topic, I think there may be hope in applying patterns to measuring the quality of documentation. Ideas? -- JanSteinman


CWeb doesn't have to be inaccessible or on paper. It's been recently extended to output PDF, and will probably generate HTML sometime. The PDF output contains hyperlinks, bookmarks, etc. I've recently been reading http://www.literateprogramming.com/adventure.pdf, a CWeb'ed translation of the original Adventure. -- AlanSHutko


I remind readers that this site is itself written as a literate program in an extended version of itself called HyperPerl. I believe this implementation completely satisfies Jan's Accessibility requirement. I'll also point out that nobody has yet used the wiki features of HyperPerl to contribute a bug fix or extension. -- WardCunningham

(1) Thanks for pointing this out. Nice approach. (2) Why would anyone modify a program with that sort of licence? -- AndyGlew


Thank you for saying, "Knuth's philosophy includes a linearizing theme, an assumption that code and documentation are both interleaved and sequential. The Nelsonian system you are using to read this message is non-sequential, and it contains no code."

The linearizing aspect of Knuth's examples of literate programming have always bother the heck out of me. I felt he had just written the program where the default was comment without delimiters (i.e. most just changed the syntax for the compiler). I have seen other systems, particularly IBM's FAPL executable specification language for SNA, which also produced both readable documents and compilable code. Also linear.

I feel that in literate programming, the problem and its should be described in the manner best suited to the problem and the intended readership. That means non-linear, in a compiler's sense. Fragments of procedures presented out of order, procedures presented in any order, data structures introduced in any order. Perhaps objects give us a handle on the non-linearization, because they can be created in any order, and methods are very short. Webbed descriptions could also be fine, but of course people do read one sentence at a time, and when you convert to paper there is an order to the paper. But that order should be ordered to the reader, not the compiler (until it is time to compile)!

-- AlistairCockburn


I think people are misunderstanding ordering in Knuth's LiterateProgramming; it says that you write the program in the order it makes sense as a story. Knuth's premise is that the programmer should not be constrained by the order that the compiler expects, since this is typically not the order that we think in. A Web file is treated as orderless; all expressions are evaluated simultaneously. This means that usage of a function (or anything) can occur before any declaration. I am no expert, but I have written a few CWeb-like programs (although not in the original CWeb or Pascal). I have also written a few JavaWeb? programs, and the Web preprocessor (for lack of better terminology) makes an excellent adjunct to the preprocessor-less Java world.

-- JasonNordwick?

As Knuth has emphasized, the real philosophy behind literate programming is that programs should not be written primarily for computers. The primary purpose of a a computer program is to explain itself to its readers (including the author).

Literate programming tools allow code to be written and read in a human-friendly format, and later transformed into a computer-friendly format. In this sense, literate programming is another step along the path from machine code to assembly languages to higher-level languages.


"But that order should be ordered to the reader..."; This is exactly the responsibility of the author. The author of a literate program is a storyteller, making sense, as do all good stories, of the complex, mysterious, and frightening.

I don't take my kids to the deep dark woods and tell them, "Just wander around and see what you find," I tell them stories. A literate programmer should do the same. It is as if, for the reader, the storyteller is sitting right there saying, "Let me tell you about the program."

When I take my kids to the woods, I also tell them to just wander around. Kids are really GoodAtLookingAround. -- JimCoplien
This is not easy to do. As the writer you have to make assumptions about your audience- what questions they will be asking and in which order and what will constitute a good explanation.

Because writing literate programs is hard to do, and because you will have different kinds of readers, sometimes you will miss the mark. That's no reason to completely abdicate your responsibility to your readers, even if that abdication is wrapped in the techno-pacifier of HyperText. -- KentBeck


With some carefully constructed templates for certain tasks, and a little pre-processing, I find I can feed my Literate Programs (and I do virtually everything significant that way, since it helps me think better about the code) to hypertext-style index generators (a.k.a. "documentation generators", which I think is a dangerously misleading term). My readers therefore have both approaches available for my code: the original story in context, and the ability to zap around at random. -- Simon Clift


I second everything Kent said. Perhaps my misunderstanding about Knuth's writings, but the literate programs of his I read looked like the program was still sequenced for his compiler, with lots of English written around it. That meant the English read to me like it was sequenced for his compiler. I should like to see an example sequenced for me, with the pre-compiler so adapted as to straighten the code out for the compiler. -- AlistairCockburn

I think you misunderstand Knuth's concept of literate programming. An absolutely fundamental concept of literate programming is that the program is sequenced for the human reader and not for the compiler. Most literate programmers consider systems that do not reorder code to not be true literate programming systems. You may not approve of the order of presentation in TexTheProgram, but it certainly very different from the sequence in which the compiler sees the code. For more detail, see Knuth's original article "Literate Programming" in his book of the same name, especially the section "Programs as Webs". -- DanSchmidt

Consider a tour through the HyperPerl version of Wiki (http://c2.com/cgi/wikibase?WikiInHyperPerl). It seems to me to be quite well sequenced for reading. Ward also provides a script to flatten HyperPerl out into straight Perl. The result seems to be sequenced more for the machine, and provides a distinctly different reading experience. -- DaveSmith


I haven't looked very closely at TexTheProgram, but I have to agree that starting with the lexical analyzer isn't a very good idea. But hey, what do you expect from the first one to do it?

Just yesterday I was hacking on a small literate (NoWeb) program, and found the structure of the program-as-text (HTML, at that), to be at least noticeably different from the program-as-code. Part of this comes from attempting to apply CodeUnitTestFirst. When you apply this naively, you end up with sequences of test and code interspersed. Still may not be exactly easy to read, but it does give a better understanding of the program's evolution, and to some degree it breaks the program up into functional behaviors even if those behaviors are spread throughout the code.

For example, my next "unit test" was to add line numbers to the output of my program to indicate where the match occurred. Even though tracking the line numbers and printing them out live in somewhat different places in the code, the literate program puts all that stuff together.

Now, if only NoWeb could handle putting multiple source files into separate HTML files and still have them indexed....

A student of mine, William Josephson, did some very interesting preliminary work on this problem. I keep hoping he will get it ready to distribute before he graduates. -- NormanRamsey

-- BillTrost


Isn't literate programming kind of anti-ExtremeProgramming? When your code is getting constantly refactored and written to communicate well, available in a browser, and has tests that serve as examples, you might be better off just describing the system architecture, a few patterns, and perhaps a how to read this program and away you go. I found Knuth's examples hard to read, but the Hanson and Fraser compiler book was a good example. Kernighan and Pike's UnixSystemProgramming? also contains excellent examples of code exposition, particular their example of hoc, an ad-hoc calculator. (PS. I wrote up some observations about programming and documentation in a paper[http://choices.cs.uiuc.edu/sane/home.html#readexpect]). -- AamodSane


See also SelfDocumentingCode.


For me, LiterateProgramming is more about SelfCodingDocuments? than SelfDocumentingCode! It seems (to me) to be about writing a mathematical and scholarly essay about solving a given problem, and then letting a program generate (1) running code and (2) a paper about the problem and its solution.

-- DickBotting


LiterateProgramming may be anti-XP in some ways, but some of the motivation is very similar. For instance, one benefit of PairProgramming is that you have to explain everything. This greatly reduces the amount of fuzzy thinking you can get away with, and helps to eliminate bugs. (This is why RubberDuckDebugging? works.) One of the main points of LiterateProgramming is that as you go you're explaining what you're doing for the benefit of an (unspecified) future reader. Same point, no? I think there's room for both methods. I might even advocate LiterateProgramming as a possible part of ExtremeProgrammingForOne, though I'm not expert enough to know whether it could work well. -- GarethMcCaughan


In one article [http://www.literateprogramming.com/knuthweb.pdf] (kudos to TomLeylan for digging up this paper, initially quoted from memory) DonaldKnuth mentions that he deliberately chose the name LiterateProgramming in reaction to the name StructuredProgramming (which at the time was used, I believe, to hype the family to which PascalLanguage belongs).

He resented the slur implied in the StructuredProgramming label : nobody, he said, would admit to writing an unstructured program. So in a one-up play, he came up with LiterateProgramming because nobody would admit to writing an illiterate program.

(Why hasn't this worked for FunctionalProgramming? Nobody would admit to writing a non-functional program, would they?)


Adding comments is not at all XP, but there are some other aspects which are:

Knuth's journal article on LiterateProgramming mentions that the programmer should choose his variable names with a thesaurus in his hand, not resting until he has found the name best reflecting its purpose and function (citing out of memory, oh boy). That certainly sounds like XP to me.

And writing an article means rewriting it until you like it -- which is a lot like refactoring.

-- ArieVanDeursen


LiterateProgramming consists of a source language, called WEB, and two tools, called TANGLE and WEAVE. TANGLE takes the WEB source and rearranges it for presentation to a Pascal compiler, stripping out the comments. WEAVE takes the WEB source and produces a TeX document with table of contents and index, that includes all the documentation, formatted beautifully, along with the code, which appears more or less as it was written in the WEB source.

There is also the language CWEB and its two tools, CTANGLE and CWEAVE. CWEB is based on C and CTANGLE produces C or C++ output instead of Pascal.

In source form, a CWEB (or WEB) program (or document) consists of named sections. Most of these sections are small and should contain a few lines of code. You can elevate any of them you like into chapter headings. CWEAVE numbers the sections and produces a table of contents (using the chapter headings) and an index. The index contains all the non-trivial identifiers in the program!

CWEAVE formats the sections in the order you typed them, but CTANGLE can rearrange them according to your specifications. The code in any section can "include" code from other sections without actually stating that code on the spot. Think of each section as a long-named, parameterless, single-use macro which you can use before, or after, it is defined. This allows you to present the code in one order, while the compiler sees it in another. So there's no reason you can't produce :chapter 1: overview," with actual code in it, and then produce "chapter 2: the lexical analyzer." A reader can use the table of contents to skip the chapters that bore him in the woven TeX file. For the compiler, CTANGLE can embed the lexical analyzer in the middle of the code from the overview.

I was intrigued by Literate Programming, but I found it difficult, because I had to debug not only the program but the documentation; sometimes it wouldn't format correctly and needed help. (Probably I was trying to use the latest C++ features with an old version of CWEB.) The power of reordering was enough to inspire me to invent the BuildSyntax. --EdwardKiser

See also EnhancedCweb


I find it interesting that much of this discussion of LiterateProgramming has talked about code ordering. I believe that CodeOrdering is a generic issue, common to many languages and programming systems - such as EdwardKiser's BuildSyntax, described above, VLSI hardware description languages, and AspectOrientedProgramming.

-- AndyGlew


CategoryDocumentation CategoryLiterateProgramming


EditText of this page (last edited September 26, 2010) or FindPage with title or text search