Comment The Why


Implementation notes belong in the source code

The comment tells why this implementation was chosen and that the choice was not arbitrary. It saves the next person the effort of relearning what has already been discovered when they attempt to refactor the code.

Refer to CodeComplete 19.5, Document Surprises.


Implementation notes belong in the tests and source code

Use the names of methods and variables to indicate what is done. If a specific implementation is needed, provide an automated test for the needed characteristics, such as performance.

The selection of an implementation should be arbitrary. If it passes all the tests, it works. If the client's not happy with how it works (e.g. performance), write a test for that and make it pass.


Example

An example of a 'why' comment:

 /* The FeynmanAlgorithm used here gives us a three-order-of-magnitude performance boost */
 /* over the more straightforward Newton-Raphson implementation. [date] */

A proposed replacement:

  class Equation {
    // ...
    public double solve() {
      return solveUsingFeynman();
    }
    public double solveUsingFeynman() {
      // ...
    }
  }

  class EquationPerformanceBaseline {
    public double solve() {
      return solveUsingNewtonRaphson();
    }
    public double solveUsingNewtonRaphson() {
      // ...
    }
  }

  class EquationTest {
    // ...
    public void testPerformanceRequirements() {
      // ...
      final Equation e1 = new Equation();
      final EquationPerformanceBaseline e2 = new EquationPerformanceBaseline();
      // ...
      long t1 = time(new Block() { public void doing() { e1.solve(); } });
      long t2 = time(new Block() { public void doing() { e2.solve(); } });
      assert(t2 / t1 > requiredPerformanceRatio);
    }
  }
This is (very) arguably a better separation of responsibilities than the suggested comment.

The crucial place where something is written which satisfies the "document surprise" requirement is the implementation of solve(), which reveals that a particular, non-obvious algorithm is being used.

Appropriate statements are made at appropriate places for the particular details of what that algorithm is (in the algorithm implementation), what the obvious algorithm was (the "performance baseline" class) and what need is addressed with the more complex algorithm (the acceptance test).

All such statements are in code, not in comments.

Some interesting points. The test you present above is checking the wrong thing, which tells me that the comment, which I had thought to be much clearer, wasn't sufficient. What we wanted, of course, was the fastest available algorithm, so the check should be that the fastest performance came from the actual implementation, rather than one of the alternatives that lives only in the test.

Could you please clarify how the test presented is not asserting precisely what the comment would lead its readers to expect?

In the above code, the class Equation is supposed to stand for your production code; the acceptance test checks that it is faster than an alternative implementation. I admit that the AT doesn't check that the implementation is the fastest, but I submit, first, that it isn't possible to check that; and second, that the comment said nothing about fastest, and implied only faster. Thus the "performance baseline" class, which is test code rather than production code, exists because we need something that the production code can be faster than.

What am I missing?

That aside, I still find the comment clearer than the test code (may just be a matter of getting used to it) and still prefer that detail being attached to the implementation code. I don't expect to sway anyone to my view without a better counter-example, though. -- DanilSuits

The problem is that I can't retest the comment. Let's say my colleague comes by and notices that I implemented the FeynmanAlgorithm wrong. So I decide to touch up the code to do a 'proper' implementation. Now how can I be sure that the performance increase is still there? Maybe I was clumsy and implemented something really inefficiently. With the comment I become falsely confident that my implementation rocks. But with the automated test, I just run it again and in a split-second I'm brought back to reality by a failing test.

Correct comments can easily become incorrect and misleading. Correct automated test code never does.


On the other hand, I have found comments very useful for doing safe 'refactoring' of LegacyCode as in RefactorInsertComment? of RefactoringLegacyCode. But the idea here is to remove the comments afterwards with RefactorReplaceCommentWithTestCase.


When you have the idea "why", do you always have code which it describes? If not, where do you put it in the first place?

When the requirements for the code change, how do you find which "why"s are no longer relevant, or have even become plain wrong? -- NikitaBelenki


Method names don't provide context and they don't tell you why. These are the most important things when looking at code.

It has not been my experience that the first statement is true.

My experience is that people who write the methods may think this way, but developers coming cold into a system will probably not feel that way, unless the domain is very well known. I'd love to see something you think is a good example. Would that be possible?

Sure. Start at CommentExample; there are other bits with actual code here and there on the Wiki.


Developers coming cold into a system will probably not feel that way, unless the domain is very well known.

Working on a system without domain knowledge is always a very risky business. The program should reflect the language of the users. Just being a good coder is not enough. You have to understand the problem area if you want to understand the intent of the system.


I think I disagree with using either comments or UnitTests for performance issues (as is suggested by the above example). The comment will really become interpreted as "Don't touch this," rather than enlightenment. A UnitTest tends to focus all of the attention on the algorithm and some ad hoc number. A performance test should always be done at a system level, and is usually a comparison of two different approaches. There are usually many issues contributing to a slow operation, and you need to avoid the trap of optimizing one area to death while ignoring other contributing areas. You are rarely going to get an absolute number for how fast something needs to run. You are going to have to rely mainly on subjective feedback from the users that is documented in some manner outside of the code (perhaps as a permuted UserStory?). -- WayneMack


UnitTests are not that understandable anyway, especially as the test scenarios get more complex. You can't understand nuance without understanding the large structure. The larger structure is very difficult to build up in one's mind from names alone. Otherwise books would just use very long character names and we could deduce the story from just the names.

The Ents in the LordOfTheRings books believed names should tell stories.


(See also CommentExampleTwo)

Here's an example coding scenario that suggests comments/documentation are useful. I've written an equation solver that uses a complicated algorithm instead of a more well-known simple method. My UnitTests check the situations that make it necessary to use this more complicated algorithm, so the why is, in some sense, documented in the test code as has been suggested above.

Now my colleague, who has great understanding of both the problem domain and the language, comes along and is simplifying/refactoring the code. S/he sees this complicated algorithm and wonders why the simpler technique isn't being used. The UnitTests give the situations where the simpler algorithm fails, but the reason just isn't clear. S/he codes the simpler algorithm, finds that the UnitTest fails, rolls back the changes, and continues. No harm done.

Except that s/he's spent 20 minutes coding the change, running the tests and rolling back the changes. One well-placed comment would have made it clear. Why not put the comment there?

If your colleague was refactoring, the UnitTest should not fail. Refactoring is changing the structure of the code, not changing the functionality. If your colleague was changing code just because s/he believed s/he could do it better, you have a problem that needs to be solved at a different level than in a comment.

The first sentence seems to miss the point, making the rest of your comment irrelevant. The UnitTest specifically checks that the solution is sufficiently accurate. My colleague might change the code to be mathematically identical, but numerically insufficiently accurate/stable and the UnitTest is designed to catch that. Specifically, sometimes (a+b)+c does not equal a+(b+c). Examples are when a+b=0 and, because of machine arithmetic, a+c=a. We have UnitTests to catch this, but without putting a comment in the code as to why we use especially convoluted computations people will still "simplify" numerically correct formulae to numerically incorrect ones.
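
To make the machine arithmetic concrete, here is a minimal sketch in Java (the values are invented, chosen only to trigger the rounding):

  public class FloatAssociativityDemo {
    public static void main(String[] args) {
      double a = 1.0e20;
      double b = -1.0e20;  // a + b == 0.0 exactly
      double c = 1.0;      // but c vanishes when added to a or b: rounding absorbs it
      System.out.println((a + b) + c);  // prints 1.0
      System.out.println(a + (b + c)); // prints 0.0, because b + c rounds to b
    }
  }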

Perhaps the complicated method was not necessary. The comment also stops appropriate refactoring. I would also suggest that if the complicated method was made clearer to expose what it is doing, then it would not be necessary to explain why you are doing it that way. I believe what is being commented here is "what needs to be done by the algorithm," not "why the algorithm was selected."

I would suggest that you don't understand some of the difficulties involved in EmbeddedSystems. Frequently, we come across code that was clearly written by a moron, only to find that the author was extraordinarily clever in avoiding subtle traps and we are humbled. Our mistakes are caught by UnitTests, but prevented by comments. The comment does not stop appropriate refactoring, it simply warns the potential refactorer that this code is complex because it needs to be, and not simply because someone hasn't yet simplified it.

In all circumstances you would want a comment as to why an algorithm is chosen. It can't possibly be obvious from the context. Comments do not stop refactoring. If you can refactor the code then you can refactor the comment just the same.

This seems to suggest that comments are sometimes required, which is my contention.

I still don't see why anyone should care why you chose a specific algorithm. What do you expect the comment to accomplish?

The algorithm dictates the speed in various contexts, predicts the memory usage, predicts the CPU usage, and predicts robustness under different data sets. How could you not care about any of this? How could you just accept code without knowing anything about it? Sometimes there is another algorithm that you expect can be better and can be shared with another part of the code. OnceAndOnlyOnce dictates that you should touch the code - the comment warns you before you do that; the UnitTest can only tell you off afterwards.


The effort that went into avoiding commenting the choice of the FeynmanAlgorithm is pretty amazing. Instead of simply choosing to DoTheSimplestThingThatCouldPossiblyWork, we're also now expected to implement another formula-solving algorithm, just for the purpose of a unit test to guarantee that we have a performance differential? One little comment, and all that code falls into YouArentGoingToNeedIt.

Furthermore, the test is wrong anyhow. If you're trying to meet some performance guarantee, then all you want to do is to make an assertion about the performance of the algorithm, not the performance of the algorithm with respect to some other algorithm. Imagine, for instance, that a library change is made to improve the speed of both algorithms, but which improves the simpler one more - enough that the unit test comparing the two algorithms' speed fails. We're now in the position of having an improvement in code speed causing the failure of a test to assert adequate code speed!

The other problem is that we probably don't care about the performance of the equation-solving algorithm - we want to know that the program as a whole runs fast enough. That's a functional test, not a unit test, and it doesn't belong here.

If you do not implement a second formula, how can you possibly say the currently used formula is faster in your comment?

For the same reason we know that a QuickSort is faster than a BubbleSort. We don't have to ask the computer everything; we might be able to use the back of an envelope to figure this out. Copy this back-of-the-envelope calculation into a comment.
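
For instance, such a back-of-the-envelope comment might read (the numbers are hypothetical):

 /* QuickSort here rather than BubbleSort: O(n log n) average vs. O(n^2). */
 /* For our typical n of about a million records, that is roughly a 50,000x */
 /* difference in comparisons -- no measurement needed. [date] */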

That misses the point, however. Once we have a fast, working formula, what reason do we have for leaving the slow formula in the code base? If the code isn't being used, it should be removed.

Also, a unit test would not implement a second formula to prove speed (the unit test may not even run on the same machine); the unit test would merely measure the speed and report if it is fast enough.
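
A sketch of such a test, assuming a (hypothetical) time budget taken from the performance requirement:

  class EquationTest {
    // ...
    public void testSolveIsFastEnough() {
      Equation e = new Equation();
      long start = System.nanoTime();
      e.solve();
      long elapsedMillis = (System.nanoTime() - start) / 1000000;
      assert(elapsedMillis < 50);  // the 50ms budget is an invented example
    }
  }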

Then we're back to square one - how are we going to prevent some poor sod from replacing the FeynmanAlgorithm with something less complicated that prevents the program from meeting the high-level performance specifications? I know, we can add a comment!

Why would someone alter working code within a module unless it either failed a test, or was unused?

A new test arrives that causes the current code to fail. Maintainer sees that the current code is not the SimplestThing, so he replaces it with a simpler algorithm, only to discover it doesn't cut it.


An attempted summary:

Someone proposed that a comment might be an appropriate way to identify why one algorithm was used over another, especially if the algorithm that gets used is not obviously the SimplestThingThatCouldPossiblyWork. There are various reasons for this, such as performance, or algorithmic instability at certain places. If possible, there should be tests to make sure that the algorithm being used does not have the performance or stability problems.

Someone else proposed that a better approach would be to introduce unit tests that would assert the superiority of the algorithm being used over the obvious alternatives.

From this, a disagreement arose. The pro-comment position is that a separate algorithm implementation just to let future maintainers know an algorithm is inadequate is excessively complicated. The pro-test position seems to be that the tests will encourage more appropriate decisions in the future.

[I have to admit, everything after "a disagreement arose" was very difficult to write. I welcome improvements.]


Moved from ToNeedComments

Hmm...something that seems missing here, although Betsy hinted at it, is the need for comments that explain why the implementation is the way it is. Things like

  # We temporarily set the Foo bit before
  # setting Bar because otherwise, the
  # board locks up the next time it is
  # inserted.
or
  # We don't simply use a binary search here because the
  # interrupt handler needs to quickly slap new entries in.
or
  # Don't use a nested join here - the database will crash
  # if you do. See bug 14,113 for details.
These are implementation comments, and they're important. They tell me why the code is the way it is, preventing me from refactoring it into something that doesn't work. A few lines of comment can prevent a serious mistake that might take a long time to discover. -- BillTrost

This is a case where the comments should be replaced by a test. A comment says don't change this code, ever. A test says, if you can change the code and still pass the test, go for it. There are times when you have to go in and make changes. A test will help you make the changes while maintaining the required integrity of the original code. A comment just makes you sweat, worrying that you might have missed something. -- WayneMack

No, the comment says, "we discovered something bad happened when we did it the obvious way. That's why we did it this weird way." If you can find some way to deal with "something bad," then you can think about changing this code. A lot of the time, this is just to flag pathological coupling between pieces of code, or between code and hardware.

"Board failure on insert" is one of those cases where UnitTests aren't a great idea. If I have four board types and a dozen different UnitTests per board, I'm not going to have the patience to insert and remove boards 50 times. Similarly, the obscure RaceCondition that we think might've once caused a crash in the field, because we have no other explanation, isn't particularly amenable to unit tests. Again, a comment along the lines of "if you do this, this other code over here might get confused occasionally" can save a lot of unnecessary rework.

So how do you verify changes to "Board failure on insert"? It doesn't seem to me the comment provides any help at all.

Anyhow - is this a discussion of comments, or a discussion of ExtremeProgramming? If you're in an environment that doesn't do, and maybe can't handle, extremism, blaming the nonexistent UnitTests isn't going to help. -- BillTrost


Comments are useful if you are intentionally doing something in a way you know is weird. Sometimes you want to know if this particular side-effect is required, or has kinda "grown" there and been preserved. If you want to know why this line of code exists, surely the one authoritative source is the commit logs of your revision control system? cvs annotate tells you who last edited this line of code and when. The commit log for that checkin tells you the why. If the commit log says "Replaced NewtonRaphson? with FeynmanAlgorithm to fix #xxxxx (some <feature> taking too long on <now obsolete hardware>)", you can judge for yourself whether replacing it with the more straightforward implementation, now that machines are faster, is the wiser choice. -- PerryLorier


A comment says don't change this code, ever.

I do not agree. A comment says 'think twice before you change this code'. -- LorenzBeyeler


Some environments lack the extreme nature. "Board failure on insert" probably lives in the kind of environment where repeated testing of all possible boards is both prohibitively expensive (meaning that the hardware to do automated board insertion testing would cost your employer's entire gross yearly income every other week) and gives little benefit (compared to, for example, reading the recommended initialization procedure on the board's data sheet and implementing it exactly as specified, no questions asked).

Someone might have spent a few hours with millions of dollars' worth of equipment to figure out _why_ some code has to be implemented in a certain way, or cannot be implemented in some other more obvious way. You usually don't want to incur that cost every time you make a change, even for non-trivial changes. Unit testing without the millions of dollars' worth of equipment may also be infeasible. Maybe the code controls expensive equipment of its own, which relies on correctly operating software to avoid permanently damaging itself.

A UnitTest that simply compares the code with a copy of itself, or with an SHA1 hash, might get the message across to people who just refuse to read comments. Languages like Tcl that can introspect themselves are good at that. ;-)
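
A (tongue-in-cheek) sketch of the SHA1 variant in Java; the file path and the expected hash are placeholders:

  import java.nio.file.*;
  import java.security.MessageDigest;

  class SolverSourceFreezeTest {
    // Fails if anyone edits the file at all; paste the known-good hash below.
    public void testSolverSourceIsUntouched() throws Exception {
      byte[] src = Files.readAllBytes(Paths.get("src/Equation.java"));
      byte[] digest = MessageDigest.getInstance("SHA-1").digest(src);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      assert(hex.toString().equals("<known-good hash goes here>"));
    }
  }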

I think a comment says don't change this code until after you have proven categorically that this comment is wrong. -- ZygoBlaxell

The XP line is to simulate these sorts of issues with a MockObject. If you're having to code around the idiosyncrasies of hardware, it shouldn't be too hard to write a MockObject to simulate those idiosyncrasies, right?

You must test against the real equipment to discover the idiosyncrasies, right? And a mock object isn't sufficient, as the problems are often interrupt- and timing-related.

Of course, you have to discover the problem first. You'll have to do that whether or not you're using comments. Once you've discovered it, you can store knowledge of it in the code in two ways: 1. You can write it into a MockObject and a UnitTest that ensures your software is insulated against that particular hardware glitch. Or 2. You can write a little comment, planting a landmine for your fellow programmers to tiptoe around as they do their work.

In some cases the very last thing you ever want to do is discover a problem. Sometimes there really are landmines (hidden but partially known problems that will cause irreparable harm if they are uncovered). If you have code that interacts with a physical device, and a certain kind of software-related failure of the physical device will destroy the device's owner (physically, legally, or financially), then you really do need to tiptoe around that code, because there is significant risk--with or without UnitTests. Note that by "device" I'm thinking more along the lines of "industrial robot motion controller" or "expensive and rare scientific instrumentation," rather than "a CD-ROM drive" or "a digital camera."

The more that I think about this, the more that I think this breaks down into cases:

Note that in none of the cases are comments part of the solution. -- ZygoBlaxell


The UnitTest side of this affair seems to be asking an awful lot of its tests. UnitTests experience the same kind of bitrot and shifting requirements as the actual code. If you manage to move some of the "Why?" testing to a UnitTest, you may still have to comment that. Who, after a significant amount of time, hasn't stared at a UnitTest and asked "Why does it do that? What is it really testing?" Who hasn't looked at a UnitTest and said, "Wait, that isn't true anymore, is it?" The tests need "Why?" comments too.

In such cases, I can't really see converting a "Why?" comment to a test if the work is going to be extensive. The test is just as likely to be wrong at some point in the future as the comment, but it's more likely to scare away someone simplifying the code. -- RandyBrown?

UnitTests should be kept small and simple, so their purpose can be understood easily. And it's not as if you code your UnitTests and then simply forget about them forever. They're part of your code, and you pay attention to them.

That assumes that the functionality you're testing is short and simple. If you're trying to make sure that the application doesn't lock up some piece of hardware which can happen only after a complicated series of steps, the test is going to be complicated because you don't know any more than that about the world - you can't know enough to make it simple. -- rb

It's not impossible to write a MockObject that says "Throw an exception if client code does X, then Y, then Z, then X again." It's cumbersome, but it's definitely possible. The exception could contain a descriptive string like "The real object will explode in flames if you do that". This is more work in the short term, but in the long term I'd much prefer it. Any non-trivial program is going to contain a number of these gotchas, and life's too short to waste time trying to keep it all in your head. Put it in the code instead.
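
A minimal sketch of such a MockObject (the Board interface and the operation names are invented for illustration):

  import java.util.ArrayList;
  import java.util.List;

  interface Board { void perform(String op); }  // stands in for the real driver API

  class MockBoard implements Board {
    private final List<String> ops = new ArrayList<String>();

    public void perform(String op) {
      ops.add(op);
      if (tailIs("X", "Y", "Z", "X")) {
        throw new IllegalStateException(
            "The real object will explode in flames if you do that");
      }
    }

    // True when the most recent operations match the given sequence.
    private boolean tailIs(String... tail) {
      int start = ops.size() - tail.length;
      if (start < 0) return false;
      for (int i = 0; i < tail.length; i++) {
        if (!ops.get(start + i).equals(tail[i])) return false;
      }
      return true;
    }
  }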

Fair enough. I guess I was just arguing that it's often harder to make the tests simple than it is to make the code simple, because you're asking more of the tests. Very simple code can avoid doing XYZX, with no added complication, but the tests have to check for those steps, which may involve several substeps. -- rb

This is true. I often have long test-method names like testCircuitBoardUpdatedDoesntCauseMassiveMeltdown, which are practically comments. And I agree that writing test code and MockObjects is definitely more verbose than a simple comment. What do you get out of that verbosity, though? You get another part of the code-base that you can be fearless with. If you're confident that you wrote your MockObject well enough to simulate the hardware errors -- you should test your MockObject if you're not yet confident -- then you can bang around with the code all you want. In my corner of the software business, change always happens, and you're always asked to revisit code, and refactor it, and even try to re-use it.

I think this is one of the things I like about XP, and generally Agile methodologies: We do not accept fear as a normal working condition. We account for it, minimize it, and then get on with our lives. The comment approach to this says: "There's an awful scary thing that could happen, so I'll write my code nervously, and then leave an artifact of that anxiety in the code. That way the person who needs to maintain this code will be appropriately nervous, and tread cautiously." And the fear grows as the code grows, leading eventually to unmaintainability and rot.

The aggressive-testing approach says: "There's an awful scary thing that could happen, so I'm going to attack that head-on, by writing a MockObject that I'm sure simulates all the ways this awful scary thing could happen, and writing test code that screams at me every time my test code sets off the MockObject. That way the code that we have tomorrow won't be any more dangerous or fragile than the code we had yesterday. I won't leave fear as an artifact in the code I write." -- francis


As for the fear factor, UnitTests are there to make you feel more secure, not less. You can code with impunity, knowing that if you break previously guaranteed functionality, your UnitTests will warn you of this so you don't check in code that breaks.

Yes, but you're missing my point. Certainly, in the average case, the tests make you more secure. I'm talking about the suggested case of replacing an easy-to-read one line comment with a difficult-to-read multipage test. Both can be wrong, but the comment's incorrectness is easier to catch. I guess I'm valuing readability and simplicity over feedback in this case. -- rb


Why comment the "Why"?

Please describe what action the reader should be expected to take upon reading the "Why" comment. I find this information lacking in both the rest of this page and in the referenced page.


Moved from ToNeedComments

The only comments I expect to see in implementation sources are what's been described as XP: stuff not obvious to someone skilled in the art. Historical comments like who did it and what they were thinking at the time I expect to go into the CM system.

Relevance of this is that comments are most useful to the poor souls who get tasked with shudder maintenance. Inadequate documentation yields high maintenance costs - on occasion inspiring entire "cultural purges" that just go on to repeat the cycle. As an AntiPattern this might be called NewWorldOrder.

So ... always comment interfaces. Yes, XP, this means you too.

-- PeterMerel


Comments contain information orthogonal to the code: algorithm sources, design history, false steps -- some of which, Ron and Kent agree, is useful, and sometimes even worthy of a comment.

[Aside, please check definition of "orthogonal;" it means only sharing a single common point. Surely, if the comments are needed, then they should have a stronger relationship to the code.]

Ironic, in that the definition is not "sharing a single point". It means making a right angle (see http://dictionary.reference.com/search?q=orthogonal). Sharing a single point would be "non-parallel" (assuming Euclidean geometry). Orthogonal is a special case of that and I think is appropriate. The code describes one "dimension" (what) and the comment another (why).

[The original comment was quite correct. The conclusion from stating that lines are at right angles is that they only share a common point. The further clarification of being different axes only reinforces the lack of relationship of comment to code.]

{What about orthogonal planes? They share a single _line_.

    Generalize to N dimensions... head explodes! :P (only adding to the clutter :)
Anyway, the point is that a function(x,y) correlates a point in a two-dimensional space (your program) to two orthogonal axes: you could say this_part_of_code <- correlate(intent, implementation).}


Note: this page is a heavy RefactoringCandidate. I refuse to touch it because of my own bias. -- MartySchrader <hanging head in shame>

I made a pass at refactoring the top portion. My advice is to refactor bit by bit and clarify the writing regardless of whether you agree with the content. Merge text expressing specific viewpoints instead of leaving the back-and-forth exchanges. Eliminate duplicate and near-duplicate statements. Eliminate most first- and third-person references.


See: TheWhatButNotTheWhy, TechnicalMemo, AppropriateTechnicalDocumentation, ProblemsWithDocumentation, etc., etc., ad nauseam

Applies also to: GoodChangeLogEntry

CategoryDocumentation

