Meaningful Comment

From SelfDocumentingCode:

Meaningful comments help.

Not to be confused with HotComments (an antipattern).

Rationale:

Not all of us are compilers. Natural language is our preferred means of communication, albeit less precise and more ambiguous than formal languages like programming languages. Comments provide meta information about the program, reasons for choosing this implementation, known issues, hints for future readers (including yourself).

Commenting interfaces is an especially good thing to do. Implementations don't need to be commented heavily if their interfaces are sufficiently documented. That is, a well-written implementation for a given method is rarely more than 20 language statements (typically around 5 to 10). Thus, if you know what the method is supposed to it, it should be straightforward to understand the mechanics of it.

Meaningful in this context has two parts: First: the comment can be understood by the people who read it. Two: the comment says something that is likely to be not understood by the same readers unless it was present. So: who are your readers?

Arguments:

"Commenting isn't really self-documenting code."

It's close though. Comments have a tendency to go out of phase with the code, especially when code is rapidly changing. Also, bugs in the comments (there are such things) are hard to catch because it requires a human being to sit down and parse the text with his or her own brain.

However, they are very close to the bone and have a higher probability of being correct. If you don't spent more than minute writing a comment (unless this Wiki page!), it's likely that the comment is small enough that it will change when the code below it will change.

"Comments are annoying!"

So is maintaining UndocumentedTrickyCode. Which one costs more money? [hint: Maintaining code can cost upto 100 times as much as the initial development; cf. CodeComplete by SteveMcConnell].

"Comments go out of phase."

Only if you don't refactor the comments along with the code. It's just like refactoring the tests along with the code. On a related note, beware of the ThePalimpsestEffect.

Exceptions:

Self-documented code doesn't need comments. That's the whole point!

Examples:

There are plenty of good commenting examples around. Here is one very bad technique:

  int iFoo = iBar * 9; // foo is 9 times bar

Not only a totally useless comment because it provides no information not immediately obvious from the code, but it also doesn't say why foo is 9 times bar, or what 9 is (see also NamedConstants). Plus, it is a comment at the end of the line. This limits the amount of space the comment can fit into and promotes this fluffy kind of comments. No, it doesn't. If it doesn't fit one line, it can go extend over subsequent lines. [It also violates MeaningfulNames.] Even better would be:

  // According to lore, cats have 9 lives.
  static int s_kiCAT_LIFE_FACTOR = 9;

...

int iCatLives = iNormalLives * s_kiCAT_LIFE_FACTOR;

It may be entirely useless to use NamedConstants for the cat life factor if you brought the comment down to the same line. However, if you had many animal to human life conversions (dogs to human lives, elephant to human lifes), it's better to centralize the data instead of distributing it throughout the code.

(eww, s_kiCAT_LIFE_FACTOR? How about CatLifeFactor? instead?)

Another very good technique would be something like

  int numCatLives(int numHumanLives)
// <insert literature reference here>
  {
return numHumanLives * 9;
  }

Overcommenting is worse than no comments, because it makes the source code longer, and provides more opportunity for non-testable errors.

If the code and the comments don't match, they're probably both wrong.


Other than the literature reference, which I might have to think about, I see no code here that would need a comment if written expressively. Should we work a Comment Example somewhere? --RonJeffries

ok...

Imagine the problem being a producer-consumer using fork()ed processes that communicate through shared memory and semaphores (on Linux). We're using that very expressive language, C.

 /* prodcons.c -- Simple Producer/Consumer solution.
----------
February 6, 2000 -- Sunir Shah
 */

#include <sys/sem.h> #include <signal.h> #include <sys/shm.h>

#define NUMBER_OF_CONSUMERS 10

typedef enum { false, true } bool;

/* Returns a semaphore handle; -1 on error */ int CreateSemaphore();

/* Destroys a semaphore. Returns true on success */ bool DestroySemaphore( int hSemaphore );

/* Wait/signal a semaphore. Returns false on OS error */ bool WaitForSemaphore( int hSemaphore ); bool SignalSemaphore( int hSemaphoreHandle );

/* Allocates a shared memory segment. iSize is the size is bytes. The handle will be stuffed into *pHandle Returns NULL on error, or a pointer to the memory. */ char *AllocateSharedMemory( int iSize, int *pHandle );

/* Deallocates a shared memory segment. Returns true on success */ bool DeallocateSharedMemory( char const *pBuffer, int hSharedMemoryHandle );

/* On error, returns an error message. NULL on success. */ char const *Initialize();

void ConsumerMain(); void ProducerMain();

void ForkConsumers(); void KillConsumers();

pid_t aiConsumerPIDs[NUMBER_OF_CONSUMERS]; char *pSharedMemory = NULL; int hSharedMemory = -1; int hProducerSemaphore = -1; int hConsumerSemaphore = -1;

main() { char const *szError;

/* We must allocate the shared memory, create semaphores, etc. BEFORE we fork the consumers so that they can have access to them.
if( szError = Initialize() )
{
 Deinitialize();
 printf( "%s\n", szError );
 return 1;
}

ForkConsumers();

ProducerMain();

KillConsumers();

Deinitialize();

return 0; }

/* Rest of program left as an exercise for the reader. ;) ... */

Comments I feel are really important are: the file header, the description of Initialize()'s and AllocateSharedMemory() signatures and the comment inside main() detailing the important fact that everything has to be allocated before the fork()s. If things get shifted around, subtle bugs will be induced. --ss

Seems unlikely to me that anyone would move an initialize method out of the front of the code, but maybe i'm an optimist.

I frequently do. If it makes the code easier to understand, factoring out a method is my favorite way to reducing complexity. --AlbertBrandl

If the signatures have to be away from the code, I suppose they could be commented without, um, comment. OTOH it seemed pretty clear to me without. One might say *pReturnedHandle, perhaps. But the code would tell for sure, and if the code says, I'd not comment.

A couple of suggestions I would make to the above code that would help reduce the need for comments are

Applying these would provide the following:

 typedef int handle;

bool CreateSemaphore(handle * theSemaphore);

bool AllocateSharedMemory( int iSize, handle *pHandle, void * theMemorySegment );

bool Initialize(const char * * theErrorMessage);

I believe applying these changes would eliminate at least some of the need for comments. If these are operating system or library calls that one cannot modify, I would probably favor the comments as opposed to using either wrapper functions or #define macros to change the signatures. -- WayneMack

Please note that the comment regarding allocation before threading could be eliminated simply by embedding it within the if block. This makes the sequence explicit.

main()

 {
char const *szError = Initialize();

if( szError == null)

{ ForkConsumers();

ProducerMain();

KillConsumers();

Deinitialize();

return 0;

}

else

{ Deinitialize(); printf( "%s\n", szError ); return 1; } }

There are a lot of ways this code could be improved by rewriting rather than just commenting.


Is the above code an example of best commenting practice or of over-commenting?

I assume that it an example of best practice but surely the declaration of DestroySemaphore, for example, does not need a comment. What the function does is clear from its name, i.e. it destroys the specified semaphore. Moreover the given comment does not add any value, it only repeats the name of the function.

The only possible ambiguity is the meaning of the returned bool. This IMO is again clear and does not need a comment. If its meaning is not clear then the following change would be better than an comment:

 enum CompletionStatus { Failure, Success };

CompletionStatus DestroySemaphore( int hSemaphore );
-- JohnWilkinson

Concur. I have created a lot of enumerated types exactly like this just to make clear what pass and return values are supposed to convey. That way there is no question in some reader's mind what the method needs to see coming in or what it will come back with. On the other hand, I do not depend on some Hungarian notation in the variable names to describe what a variable might contain -- it had better be clear through the var name and its type, otherwise I use a comment to indicate range, etc.


I have a particular beef with AutoGeneratedComments?. My favorite IDE for Java inserts template comments when it generates the body of a method. I don't use Javadoc so I hate this particular feature. I always turn it off, but I have to work with code from other teams which is littered with method declarations that start something like this:

/**
 * @see SuperclassName#foo(int pitch).
 *
 * @param int pitch -
 * 
 * @return int
 *
 * @exceptions IOException
 */
public int foo(int pitch) throws IOException {
...


 counter++;  // increment counter

isn't very descriptive.
 nr_lines++; // we've found another line;
// increment the line counter
is much more useful.

 number_of_lines++;

may be all you need. (See MeaningfulName)


I despise underscores and would rather write

 numLines++; // found another line

Somehow shifting for a capital letter feels utterly natural while typing, but shifting for a punctuation mark like underscore does not. Note that I avoided the spurious comment about incrementing the line counter partly because I think it's obvious to a programmer and partly because I feel that the comment flow is clearer when it ReadsLikeProse.

My code tends to have few comments. I tend to write comments to explain algorithms that are not readily obvious and I find that such sections of code will often have lines just like the above increment. If the code looks like this:

 numLines = 0;
 while ( (lin = fin.readLine()) != null) {
do something or other
...
numLines++;
}

then I don't feel the need for a comment. But if it was more like this:

 numLines = 0;
 while ( (lin = fin.readLine()) != null) {
do something or other
if (isFunny(lin)) {
numLines++;  // got another funny one
}
do some more stuff
...
}

then I feel inclined to explain the code. Since there are not many comments in the code to begin with, the eye is naturally drawn to the ones that are there so they had better be meaningful enough to be worth distracting your reader from the code itself. --MichaelDillon

Why not just remove the need for the comment by renaming numLines to numFunnyLines?

--EricUlevik

I would suggest creating a method CountFunnyLines(). You might even want to go to the level of creating a FunnyLine class or module with a Count() method. The implication of "do something or other" and "do some more stuff" is that there are multiple level of abstraction going on here, though I may be overanalyzing an example code snippet. -- WayneMack


Comments that simply repeat what the code says are not meaningful. Do not write them. After making your code communicate as much information as possible, strive to write comments that communicate important information that is still not obvious from the code. While writing code your head is full of such information; learn to recognize it and make it useful by capturing it. -- JimPerry

My own criteria is that if a comment reduces the time it takes a maintenance programmer to scan a block of code then it goes in. The comment takes only a few seconds to type and allows other developers to look through the code structure at the highest speed possible.

The source ends up having a comment for just about every block of code (1 to 3 line blocks). The theory is that there are two streams of code intertwined: the comments in green for scanning through large sections of code and the code itself in black for when you are changing stuff.

The goal is to abstract the code as much as possible into the comments even when it repeats well thought out method and member names. The comments should make the source as easy to read as possible so that the maintenance programmer never needs to read the code unless they are actually editing it.

I understand that many developers despise this practice. The objections I hear are that it streches code over twice as many pages, that comments get out of sync and the information is often redundant. My counter-arguments would be:

1. Most methods are not usually more than 1.5 pages long. Stretching them to 2.5 - 3 pages with comments seems like a good tradeoff for easier maintenance scanning.

2. If other workers are too lazy to update comments to keep them in sync then I can assure you they will be doing much worse things such as cutting and pasting code rather than refactoring.

3. It is possible to understand just about any block of uncommented code making 90% of comments redundant. The difference is in the time it saves those doing maintenance. If a maintenance worker has to constantly spend 5 - 10 seconds analyzing blocks of code it adds up, she loses her train of thought more easily. It's like walking through mud rather than solid concrete.

And when you have written comments communicating what isn't obvious from the code, refactor the code to remove the need for the comments. In my code, this is when things actually get named properly. -- EricUlevik

Assembly Language suffers from the age-old tradition of putting comments on the end of the line. There's not enough space on the end of the line to write meaningful comments, so most programmers end up commenting the obvious.

  INCAX;Increment record count
  CMPAX,10;Compare with 10
  JLEOK;Jump if <=
  MOVAX,10;Set to 10
  OK:

Most assembly reads better if you use a block comment followed by a block of code:

  ; Limit the record count to 10.
  ; This works around a kernel bug that causes buffer overruns when
  ; reading more than 10 records.

INCAX CMPAX,10 JLEOK MOVAX,10 OK:
In assembly as in all languages, assume the reader knows the language. If the reader doesn't know the language, no amount of comments is going to do any good.

Subroutines work in assembly as do constant defines (replace 10 with "theMaximumRecordsAllowedByKernel"). If your assembler supports renaming registers, one can also substitute a name for the AX register (I am not sure if many assemblers still support this, so I will omit this from the example). This would create something like the following (please feel free to correct the syntax, I'm only showing an example).

  INCAX
  JSB   LimitRecordCount
LimitRecordCount:
  CMPAX,theMaximumRecordsAllowedByKernel
  JLELessThanMaximumRecords
  MOVAX,theMaximumRecordsAllowedByKernel
LessThanMaximumRecords:
  RET
Even in assembly, a block comment is a good hint to use a subroutine.


See also MethodCommenting, MeaningfulName, SelfDocumentingCode, CommentCostsAndBenefits, ReplaceCommentWithAssertion, CommentingChallenge, ToNeedComments


EditText of this page (last edited July 28, 2008) or FindPage with title or text search