Documentation Beyond The Source Code

Problem: Your project needs documentation.

Considerations:

You don't want a lot of documentation
You know you need some

How do you find the middle ground?

With compiler support for longer variable names, it's more feasible to reduce the amount of documentation. If I can name a routine, PrintSalaryReport, it seems redundant to add a comment, "print salary report". Anyway, with the longer variable names, the comment will probably scroll off the end of the screen.

The responsibility of the class does not sit in the code, yet it is part of the design. Without it you can't run the ResponsibilityAlignment test. -- Alistair

Seems you're suggesting that the responsibility is / needs to be in some document. The C3 folks are all the time assessing and changing the names and responsibilities of classes, using what seems to me to be ResponsibilityAlignment. Yet we have no document saying what responsibility "should" be. What is it that we're doing? -- RonJeffries

Ron, you sneakily snuck that "should" in there. Alistair's comment says that the resposibility does not sit in the code, so he's wanting some document that records what the responsibilities are. That's are. As in "are now".

Presumably you would maintain that the responsibilities are recorded in the code. Could you give an example of a realistic class that has no "extras", i.e., nothing that isn't there in order to support some functionality, but which still clearly states its responsibilities?

I love ExtremeProgramming. It really stress tests any assertions I have ever made and challenges any way I have developed software. E.g., the test-code-design cycle. Counter-examples are the bread and butter of explainers of all stripes (would-be scientists and philosophers included). That little venting out of the way...

As I understand XP and C3, and have seen Ron and Kent do, there is no "responsibility statement". The code just is, right there in front of you. Instead of saying, "What should this class do?",

you say, "Do I like the fine-grained structure of this little bit of code?"
Then you probably say, No, and refactor a smidgen.
Then, when you are almost happy, you ask, "What is this almost-right code actually doing?"
When you answer that, you say, "Oh! That's what it's doing. In that case, the name should be xxxxx."
And then the details of the services, which spoke themselves into a subliminal responsibility statement, align with the the name.

If that is more-or-less true, then the plus and critique of the process is evident.

The plus is that you save the writing and rewriting of the responsibility statement, which has probably gotten out of sync with the current code.
The critique is that by not announcing the intended responsibility, you make it a little harder for the next programmer to get going on the process. The responsibility statement is never spoken aloud. The programmers must do the full cycle to get the full benefit. [section moved to HighDisciplineMethodology] -- AlistairCockburn

Has anyone tried Object International's TogetherJ (WikiName TogetherJava)? Unlike RationalRose, TogetherJ actually keeps code and graphical representation in sync at all times. If you remove a reference to another class in the source pane, the appropriate line disappears in the UnifiedModelingLanguage pane. It seems like you could keep your UML diagrams in sync with your code with essentially zero additional effort.

That said, the only UML diagrams I've ever found to be useful are ones that omit virtually all of the detail of the classes, leaving only the part that I want to focus on at a particular moment. TogetherJ painlessly reverse-engineered all of JavaSwing for me within a minute or two, but the diagram it produced was so complicated that it didn't add anything to my understanding of Swing. --JohnBrewer

I can certainly see why we want the source code to be the design, but I'm not convinced it is all of the design. For example, for folks using file-based programming environments and their languages, there is some significant "design" work in how the system executable(s) and/or libraries are built (constructed, generated, and derived) from the source code. But you likely won't see much of that indicated in the source code of the program itself. Some folks would call this the physical (rather than logical) architecture of the system. It is also sometimes called "code architecture" (Does that mean the directory tree is the code architecture?)

In the case of compiling and linking the source code via a program like make(1) [MakeProgram], I think you can argue that the Makefile (and any associated scripts) are the source code, and so the Makefiles are the (build) design (or at least they should be ;-). Would others agree with that?

The problem is, I find I need to use more comments in my Makefiles. The make(1) language/syntax seems a bit less self-documenting or self-evident than say, SmalltalkLanguage, or LispLanguage, or CeePlusPlus or CeeLanguage. I would probably rank it somewhere between C and AssemblyLanguage. OTOH - EiffelLanguage has something called LACE which is closer to being at the same level as Eiffel. VisualWorks has packages and parcels, which are still somewhat low-level, but I think they can just (barely) manage to be self documenting. Smalltalk/Envy on the other hand is much nicer.

-- Brad Appleton

I don't see what the "level" of Makefiles has to do with it. You check your Makefiles into RCS, right? And you wouldn't dare distribute the source code without its Makefile (that reminds me, I need to put a pointer to RecursiveMakeConsideredHarmful). When I do LiterateProgramming using NoWeb, the Makefile is part of the literate program -- which, I admit, can make bootstrapping a bit challenging.... -- BillTrost
I wasn't referring to the "level" of Makefiles, but of the make syntax itself. I believe it is not nearly as self-explanatory as C++ or Java or Smalltalk or PerlLanguage or Lisp. It looks a bit closer to assembly IMHO (especially when you use "targets" to factor out common things, almost like a subroutine). So I think it would require more comments than Ron or Michael would recommend using in corresponding Smalltalk or C++ code because of this. It seems to me that most of the "code is the design" discussion is referring to fairly minimal use of comments. I think one reason we can even talk about having the code be the design in this manner is that we all seem to be implying HighLevelLanguages for our programming. Would we be as adamant about this if we were talking about raw assembly or machine code? I suspect that we could still make the same argument for assembly, but we would probably demand more in the way of comments. Am I incorrect in thinking this? -- BradAppleton
I think you are correct. I'm a chip designer, working with HDL instead of schematics. Which makes me a programmer I guess. However, the tools and "languages" I need to use are so heterogenous, incompatible in terms of data interchange, and less than robust that they may as well be low-level languages. As a result, I spend a huge amount of time writing Makefiles, and Perl widgets in order to hold it all together. In a recent chip with around 50,000 lines of what would normally be recognized as "source", we had to construct over 10,000 lines of "Bolt It Together Stuff". Those "BITS" all had to be designed and configuration managed as lovingly as the chip proper. To me, it's all source. (mailto:tommyk@verilab.com)
Perhaps Makefiles & directory structures correspond to manufacturing instructions in hardware engineering: they're just as necessary, and related to but separate from the design of the artifact itself. There's a special set of skills to do them well, and so larger organizations might hand them over to specialists. At the same time, designs which can't be easily manufactured are bad, like programs that are slow or hard to link because of massive dependencies, etc. Just an idea: I like the analogy a lot. -- MartinPool

This conversation about whether the source code is the design seems confused between what a design is and what a design does.

The important point to remember is that source-code and binary-code are, for practical considerations, the same thing. There is a direct relationship between the two forms of the code. They represent the system. If the source is wrong, the resulting binary is wrong in exactly the same way. The compiler assures they both represent the exact same thing, but in different languages.

A design is a set of documents that describes what the system should do in the eyes of the customer. A design starts at a high level and adds detail until it describes the necessities to produce the desired system.

A programmer translates the information in the design into some programming language (source code). This act is the implementation of the design. There is no guarantee that the implementation matches the design. Any number of reasons can cause this. However, this does not mean the design changed, it means the programmed did not follow the design.

The popular lightweight methodologies (XP, RUP-Light, etc.) do not seem to ignore design, but instead to ignore documenting the design outside of the source code. This is fine if the source code actually implements the intent. However, without a higher-level design document, there is no way to verify the source code meets the design requirements. I.e., the source code exactly documents the implementation, but the implementation does not necessarily match the design.

A design is a collection of documents. A design does describe what the system is expected to do. A design uses various levels of detail to make this description meaningful to persons with varying depths of technical knowledge.

Source code, on the other hand, merely precisely documents the current implementation.

-- ChuckAndersen

Chuck, you seem to be saying much the same as me, that uncertainty of process dynamically delineates product from design. However, I am not seeing the light on separating what design is from what it does as you have suggested. Describing what something does is often the best definition of what it is, also known as operational definition. Further, your definition of design sounds like my definition of specification. How do you separate those two? -- WaldenMathews

"XP does address design verification. Put the specification/design requirements into your TestCases. Your test cases are also a part of the design. -- BinhDang?"

I think I agree with you, but am using different words. I am simply saying that the design encompasses much more than just the details of the final implementation. I choose to separate the design from the implementation and I consider the source code part of the implementation. To me, the design covers everything up to the implementation. In the old days, we had terms like functional design, detailed design, detailed specification, program specification, etc. These are all part of design. They are just different abstractions of design intended for different audiences. They are all important pieces in the progression toward a working system. -- ChuckAndersen

Suppose you visit NASA and you ask to see the design for a Saturn V. The handtruck full of documents comes in and you are lost. Then someone comes up with a one-page breakdown of the major components. You are back on track again. Does that mean that the one-pager is the design and the handtruck contents aren't? No. It just means that there are many views into the design.

First, let's step back and ask why you would want to see "the design." In truth, there is probably some piece of information that you actually want and you are hoping the design documents have that piece of information within them. When something goes wrong with a rocket, what do you do? You ignore the design documents and look at the actual rocket itself. When something goes wrong with software, what do we do? We ignore the design documents and look at the source code itself. The basic questions are: What are the costs associated with generating DesignDocumentation beyond the source code? and What are the benefits associated with generating design documentation beyond the source code? -- WayneMack

I have a question regarding TheSourceCodeIsTheDesign. If I make designs using RationalRose or a pencil or some such tool, I can put down information that will not make it to the source code. Or it will only be in it in a very indirect form. Example: a one-to-many relation is translated in a JavaLanguage Hashtable or some such. (To be fair, I can put that information in a comment. But we all know how comments are maintained. They aren't.)

So... if the source code is the design, how does it contain the extra information -- if it does it at all? If you don't have a separate design document, don't you lose very important information?

I take this as a "damned if you do, damned if you don't" situation, and am prepared to simply suffer until I learn a realistic solution. So I only ask people to record the smallest amount I dare (that amount changes from situation to situation). I typically get a fraction of what is needed, and it typically doesn't get updated. But there are bits to work from.

If this sounds weak, I still feel it is better than the two alternatives, both of which have outspoken fans. One is to write masses of stuff in the belief some percentage of it will survive. The other is write nothing, in the belief that the code contains enough. Both are arguable, both are doable. All three approaches lead to different suffering, in my view.

-- AlistairCockburn

Ultimately I suggest that it comes down to what is used (not just what survives). My recommendation would be to create all the documents that will really be used, and no others. My experience says that collection of documents is much closer to empty than some experts seem to expect. That's why your approach works so well, IMO.

The ExtremeWay, of course, is to document nothing until its absence hurts, then do it. (The Gallo approach to documentation?) -- RonJeffries

How I do it (as a team leader/project manager): I don't do design myself. I just ask people to document whatever they feel that needs to be documented outside the source code. The commented source code and the (informal) additional documents which are created usually allow a new guy (average or less), with only a few hints from other team members start working effectively very fast - max. two weeks on any project I've done so far (verified in everyday practice). This approach, judging by my experience, allows you not to have anyone knowing the entire project and still being able to maintain the code, even putting some guy with little or no knowledge of a project on maintenance tasks - he'll always find the information he needs very fast.

TheSourceCodeIsTheDesign does not state: Do not document your code or make auxiliary documentation. It just says that the source is the design that goes to the manufacturer (the compiler). It is all that the compiler or interpreter needs. All that other documentation is for people who do not have to make all the ones and zeros in the software (thank goodness). And, if the other people who need to know about the design can actually read the source and understand it, so much the better. -- MichaelFeathers

I am working on a fairly large system and I am intrigued by the notion of TheSourceCodeIsTheDesign. At the very least I have seen examples where source code is a big part of design (BertrandMeyer expects EiffelLanguage to be used that way). For a large system, however, I still need BigPictures.

Or, maybe what I really need is a single big picture (something vaguely like what is described in PicturesAsCompression). Right now, I am trying to capture the overall high level design (and architecture) of the system in one large picture. I want to create a blueprint: Pick a section, look up the related source code and give it a read.

This blueprint shows all of the major components of the system and how they fit together. This is not a reverse-engineered diagram. This is not UML. The blueprint conveys more. It shows something that code alone cannot (abstractions and concepts that exist in the designer/user mind and perhaps perceived structures manifested in the running system).

As for the learner to sit at a master's shoulder and watch while a real problem is solved. Next is something like a PatternLanguage describing the design rationale. Next is a TechnicalMemo describing a part of the system, perhaps in LiterateProgramming form. Next is just digging in and trying to do something with another learner. Last is digging in alone. This gives you a spectrum of investment/return tradeoffs. But it is a hard problem: the learner is trying to compress 1 2 3 years of high-capacity learning into weeks or months. It can't possibly go as quickly or as smoothly as you would like. -- KentBeck

It's not that there is no responsibility statement. The code and its class comment take on that task for you. As for "announcing the intended responsibility," let us not forget that YouArentGonnaNeedIt. Any additional responsibility not documented by the code itself, whether implied, intended, or hinted at, is in fact not the responsibility of this object right now.

It is also in error to believe that the "responsibility statement is never spoken aloud". If you are adhering to ExtremeProgramming then you have a person sitting next to you to answer those questions you are asking yourself. If two of us create this object today and I pair with you tomorrow I am going to tell you what our thoughts were, and you are going to tell me where we went wrong, then together you and I will fix it. The day after tomorrow you and I will both tell someone else what our thoughts were and possibly fix it again until the code either speaks for itself or is removed from the system as unacceptably complex. -- DonWells

I didn't realize there is a class comment. I thought there are none. Would you be able to show a sample class comment? Thanks -- Alistair

From the VCAPS project:

The Replicator is patterned after the visitor pattern. We chose not to put methods on all the domain objects because of all the trouble with multiple versions of classes we already have. Recompiling the GemStone database yet again would not be helpful and takes a very long time. We add replicate: to all the domain objects by adding it to Object.

The serialized issue is now in Topaz format. We changed to this format because the Topaz parser is much faster than that tick mark parser.

You will note that we have documented only things that are not obvious from inspection of the source code. To be fair, not many classes have class comments, only the ones that have issues requiring documentation beyond the source code. -- DonWells

In C3 there are almost NO class comments. We try to have the class name and position in the hierarchy tell us everything we need to know. As the project spins down, however, the customer is considering having deeper documentation written. I'd consider that reasonable ... with the program in maintenance, there won't be as many minds, or as much study, to keep things fresh in developer heads. Then (and only then) they will need to be on paper or in code comments. -- RonJeffries

I think we need to ask WhatIsTheSourceCode. The opening paragraph of this thread presupposes an unhelpful answer. Let me explain. My experience is that each time I find myself picking up a pencil (or a whiteboard marker), I am experiencing a shortcoming of my development environment. By construction, I have an idea -- often an aspect of design -- that is difficult enough to express within the system that I feel compelled to jump outside it. And so, to appropriate one of Kent's ideas, the development system is telling me something about itself. The separation of RationalRose and all the tools like them is that they are viewed, even by themselves and their creators, as separate from the system, when they are not. Perhaps we should listen more attentively to what our development systems tell us about themselves.

I'm old enough to remember when compilers were new-fangled, and developers didn't trust them. The only "real" code was machine code, or perhaps its assembler-code equivalent. In that era, in the shops where I worked, CeeLanguage and PascalLanguage were viewed as programmer shortcuts to the "real" thing, and the resulting assembler-code was archived and versioned.

It took a long time and hard work for the development community to integrate the compiler into our perception of "the system", and to view the compiler input as the "source".

I argue that if we believe that objects are real, that persistent objects are real, and that patterns work, then the design (meaning the expression of the patterns and their combinations that I have modeled the solution with) is the source. So I propose to invert the XP phrase: instead of saying "TheSourceCodeIsTheDesign", I offer "TheDesignIsTheSourceCode".

The most important decision I, as a designer, make is to model a particular situation with, to choose a trite example, a stack. Once I have made that decision, the "source code" binding in a particular language is relatively obvious. The same classes and methods recur each time I apply a stack.

Therefore, I propose that a "good" development system will let me denote my choice of "stack", and will then help me express that choice in a particular language on a particular platform. Yes, I need wormholes to cover tool shortcomings (especially in the early years) just as I need ASM directives in a good compiler if I'm doing time-critical work. I find that certain design "shapes" recur over and over; I call these "patterns".

So my argument is that we should build systems that let us express and version our design choices, in the form of a non-planar directed graph (for example). We can then traverse that graph to collect source code whenever we need it for a particular compiler. Eventually, our need for compilers will evaporate (just as our need for assemblers has disappeared) as our "source code" migrates to these DirectedGraphs of PersistentObjects.

Now, TheDesignIsTheSourceCode.

-- TomStambaugh

SelfDocumentingCode and TheSourceCodeIsTheDesign are solutions. But solutions to which problems? I think the same problems that lead to DesignPatterns. I need DocumentationPatterns. -- Gerd Castan

The documentation issue really comes down to cost versus value. Does the value of the documentation exceed the cost of creating it? Too much documentation is done merely because we have always done, and rarely justified based on its value. Most documentation never gets read after the proofreader is done with it. -- WayneMack

Isn't the problem simple. Humans are limited. If we were really clever then we would be able emit an accurate stream of binary digits directly from conversations with our clients. If we want a TheSourceCodeIsTheDesign solution, then we need to have source code that clients can read and understand (perhaps even 'compiling' it into pictures for them). My personal feeling is that we have a set of source languages that are chosen by programmers, and that read like shorthand script. Why not have a 'longhand' syntax for each of the languages that reads more like a NaturalLanguage. You could 'compile' to it using a documentation tool. You could even 'compile' to different natural languages. -- NeilWilson

from TheSourceCodeIsTheDesign

Hmm, So what if someone besides a programmer who understands the language the code is written in wants to review the design? -- StevenNewton

Tough. If this exercise will generate profit, then it can pay for a translator to render the code into the format the reviewer understands. Otherwise, we shan't screw with our entire methodology to add a "documentation layer" on the outside chance that we might someday need to satisfy an unqualified reviewer's curiosity. -- PCP

I think I agree with the above statement, but a more diplomatic rendering might be, "What benefit would result from having this type of review? What kind of review is being proposed? Do you really think someone can understand a 'design' but not the code?"

How does the idea of TheSourceCodeIsTheDesign hold up after a SalmonMousse incident? I can imagine it being a problem. -- KenMegill

I can imagine a SalmonMousse incident being a problem, and not only for the people involved, regardless of how correct the intuition that TheSourceCodeIsTheDesign turns out to be. If you lose all of the development team, the project is in trouble.

Also, to be fair, the same question should be asked of the current documentation methods as well.

See: SelfDocumentingCode, TheSourceCodeIsTheDesign, DocumentationAnecdote, ToNeedComments, ProblemsWithDocumentation

CategoryDocumentation