Production Code Vs Unit Tests Ratio

Is the ratio of number of lines of production code to the number of lines of test code a useful metric?

Is a large value for "production / test" lines a bad sign?

Is a large value for "production / test" lines with 100% production code coverage a bad sign?

Is a low value for "production / test" lines a bad sign?

Assuming 100% production code coverage, would a small value for "production / test" lines indicate something good or bad about the tests? ...or the design of the production code?

Maybe some metrics would be helpful here. How large is your project? What's the "production lines of code /divided by/ test lines of code" ratio? Do you have 100% code coverage? Was the code written TestDrivenDevelopment?

One data point: I happen to have a small ant script that I happen to have written TestDrivenDevelopment style, using AntUnit:

123 lines of production code
261 lines of test code
so slightly more than 1:2 of production to test code.

And I'm "cheating" on the tests too:

15 tests
45 asserts
giving an average of 3 asserts per test

So I'd guess that if I restricted each test to doing only one assert (which some think is a good practice) I think I'd end up with something around a 1:5 ratio of production to test code lines.

I kind of think that the range of "good ratios," if there is one, may depend on implementation language. For ApacheAnt, being a DomainSpecificLanguage gives it expressive power within it's domain. But DomainSpecificLanguages can be cumbersome and/or verbose outside their target domain, and being an XML ScriptingLanguage could tend to make that problem worse. -- JeffGrigg

A flavour of the code (design?) smell: as mentored by the TestDrivenDevelopment, optinal ratio of ProductionCode vs. UnitTests lines of code is about 1:1. If it deviates at more than 2 or 3 times it's probably a sign of some code/design problem.

Sadly this page has turned into a pedantic and pointless argument, made largely by someone who has chosen not sign his name. I think this discussion should be deleted, and replaced with something useful; the hypothesis about the optimal ratio of application to test code, and a discussion about that. Obviously this hypothesis cannot be applied to all software or methodologies; it clearly relates to the AgileProgramming, and only defines a CodeSmell (which is by definition *not* authoritative). It cannot define any absolute metric. -- IanBicking

Non-sense. The arguments to the contrary are not pointless, and not pedantic. If somebody's gonna define that by some essentially arbitrary criterion (indeed no argument whatsoever has been presented for the 1:1 ratio) large bodies of code can be seen to suffer from CodeSmell, the claim should either be defended by reasoned arguments, practical empirical evidents, etc, or deleted altogether.

It stands to reason that if there's any relevant criterion to judge/detect the adequacy of a set of tests, than that criterion is code coverage, see TddCodeCoverage for example. Depending on so many particulars test can cover 100% of the code base in ratios of 1:100, 1:10, 1:2, 1:1, 2:1, 10:1, 100:1. Any prescribed ratio, is absolutely useless, unless someone is able to show a preponderence of evidence that the correlation between the ratio and the 100% code coverage (which is the real target) follows some normal distribution centered around 1:1 value.

If not, people who follow the advice on this page will follow a useless advice while measuring code coverage is almost trivial these days and is a way better measure of the adequacy of any tests.

Indeed, watch this meaningless transform

 double Foo() { return 2.0 }

(1:1 code to test ration)

 void testFoo(){  assert(2.0 == Foo()); }

(et voila an 8x change in the code to test ratio)

  void testFoo()
  {
     double expected;
     double gotten;
     expected = 2.0;
     gotten == Foo();
     assert(expected == gotten);
  }

Perhaps there is a hidden assumption in the original post that the unit tests and the production code are written in the same style. If you write the production code in the same style as the test code:

 double Foo()
 {
     double return_value;
     return_value = 2.0;
     return return_value;
 }

The ratio is now 8:6, which is not very far from 1:1 for such a small sample.

(Just for the fun of it:

 double Foo()
 {
   double part_a;
   double part_b;
   part_a = 1.0;
   part_b = 1.0;
   return (part_a + part_b);
 }

for 8:8 = 1:1

 double Foo()
 {
   double part_a;
   double part_b;
   double part_c;
   part_a = 1.0;
   part_b = 1.0;
   part_c = part_a + part_b;
   return part_c;
 }

 double testFoo() { assert(2.0 == Foo()); }

For a 1:10 ratio :P

-- JeromeBaum? )

Unadulterated non-sense. Some mr. Ischenko issues by the way of handwaving a pearl of wizdom on software engineering and somebody repeats it on wiki justifying with a link to handwaving. Wow. Wiki is really making progress.

And your contentless pointless post is an example of progress?

Do you have any better idea? Do you know any reason why it's wrong? I want to know how this ratio was measured and how do they know that is the right one.

MaxIschenko: I agree that "should be close" or "most probably" may be a bit strong here. But still, what's your arguments? As for how this ratio was measured: just count LinesOfCode in production code and test modules. No? My statement about "ideal" or "proper" ratio is based on some TDD/XP figures. Non-authoritative, yes, but still reasonable. See, for example, http://dotnetjunkies.com/WebLog/darrell.norton/archive/2003/12/19/4680.aspx.

Not reasonable at all.Reasonable means based on reason, i.e. on logical arguments of some kind. Since you are making the strong claim here, it is you who should bring a supporting argument. 1 to 5 is not unreasonable. 1 to 10 is not unreasonable. If you refer strictly to UnitTest as in "TDD", as opposed to more gneral types of tests (like acceptance tests, etc) 0 to X is not unreasonable either. Just go about studying some SoftwareMasterpieces out there before you impart your invaluable wisdom on us.

See also: TddCodeCoverage

Much of the above is ContentFree? (other than a few flames) and is need of ReFactoring.... at any rate, there is a good question posed on this page. What is the appropriate ratio (and what tolerance)? Obviously, the boundary conditions of 1 to 0 (no tests at all) or 0 to 2 (no product) aren't good; but what should the optimal ratio be? I've no idea. There is a data-point in support of 1 to 1; anyone have other datapoints? Since this is a highly qualitative (and thus not falsifiable) argument, telling other parties to "prove it" is inappropriate--and the numerous AdHominem attacks seem unwarranted.

That was the point, that it is ContentFree? with no improvement in sight. Demanding a "supporting argument" (which is distinct from proof -- therefore be mindful of who is flaming whom) is entirely appropriate in the case that somebody makes a strong claim. Your "obviously" doesn't bode well either for the content of this page. What the hell is so obvious ? 0 -- no UnitTests at all obviously cannot be qualified as "no good" since we have plenty counterexample in SoftwareMasterpieces. What is your data point and what claim does it support ? And what is your methodology for cherry-picking data-points ?

Easy, Costin... I wasn't the original poster. At any rate, there is no need to be rude (which was the point I was trying to make). As far as various SoftwareMasterpieces go... you don't seriously propose that these have gone untested? While they may not come with a suite of UnitTests all bundled up in the source tarball, each and every one of them has likely undergone rather thorough testing of some sort.

Whereas "testing of some sort" may not have any connection with current "fashionable" UnitTests, which is what the strong claim on this page is about. Therefore the prudent conclusion remains that 0 lines of code of UnitTest can be perfectly acceptable engineering practice, and no evidence or reasoned argument has been shown to the contrary.
- Probably, the claim/question ought to be ReFactored: "In an environment where UnitTests are the primary or sole testing methodology (in other words, no separate QA department to bang on the thing, etc.), what is the appropriate ratio)?". I would think that the answer is greater than 0 lines of unit test code--what it is I don't know.
- Can you tell me what environment has UnitTest as sole testing methodology ? How about "primary" (-- meaning exactly ...) ?
  - I'm not aware of any.
- One would think that even in the strongest XP fiefdom acceptance (end to end) tests are much more important than unit tests, because you can have all the unit tests you want but if you don't meet requirements, they're useless.
  - Of course, end-to-end tests often don't involve code--if they do, it's code that can also be used for unit-testing. Customers generally don't write test cases in secret, in order to spring them upon unsuspecting developers at the last minute...
So let's rephrase the claim thusly: in an environment where developers are expected to write UnitTests or practice full blown TDD, ) 0 lines of unit tests is not a good thing. Claim to which I subscribe wholeheartedly. As to what regards the "golden ratio", the only prudent claim to be made is as follows:
- Of course, that claim is vacuous, but I suspect you know that.
In an environment in which the person(s) in charge think(s) that 1:1 is the golden ratio, 1:1 is the golden ratio. Furthermore, since (/ lines_of_unit_tests lines_of_production_code) is a function of modules of code and therefore not expected to be constant, we can claim that if the guys in charge expect a particular statistical distribution, than that particular distribution is desirable.
- "What the PointyHairedBoss thinks" is not an interesting question, nor is it at all scientific. Some PHBs think that any testing is a waste of time; developers should just WriteBetterCode?. At any rate, I suspect the answer to a well-posed version of the question is highly dependent on many things:
  - Tools used. StaticTyping and other things which can be inferred/checked by a compiler reduce the number of test cases needed.
  - Difficulty of the problem
  - etc.

Many have been in the public view for a long time and have undergone the extensive peer review afforded popular OpenSource programs. The closed-source masterpieces (such as VMS) most certainly underwent thousands of hours (if not more) of "traditional" formal QA efforts. Many of these programs were written before UnitTest-heavy methodologies such as XP became fashionable. And a couple of them were written by DonKnuth. :)

I think there's an interesting discussion to be had on this subject--one that is perfectly acceptable to conduct informally.

Ah, bon. But strong claims along particular ideologic lines backed up by strong handwaving are not as acceptable.

After all, this is wiki, not Communications of the ACM--rigorous standards of research and peer review need not apply to opinions held here. While your objection to the 1:1 ratio proposed above is duly noted--I have no particular reason to either believe or disbelieve that figure--I propose that additional suggestions from others be collected. Premature demands for "proof" are ConversationalChaff. Again, I think you are dreaming about premature demands for proof.

I am not sure how to determine what a reasonable ratio of test code to production code would be. On the one hand, when using a unit test framework, a single assert statement may be sufficient, but the amount of set up code can vary. Also, the amount of stub code required can vary from zero to quite a bit. Emotionally, I would start to be concerned if the quantity of test code were to be come equal to the production code, because it would appear that the test code may become a higher risk area than the code itself. Has anyone done any measurements on what is being experienced?

OK, I have removed this page from a list of CodeSmells. Would the following formulation be acceptable: if the project aimed to have 100% of code coverage by UnitTests and you want to quickly check this, just check the ratio. If it's far from 1 than you may have problems. If it's close to 1 than you may have not problems with code base.

Guess than it is really not much sense here. -- MaxIschenko

Somebody raised the very valid issue that if the ration is 1:1 then on needs to be at least as suspicious of unit tests. Who tests the unit tests?

In TestDrivenDevelopment...

The tests are tested for effectiveness when written: They must fail first, before you write the production code, and then they must pass. This proves that each test can successfully discriminate between good code and (at least one form of) bad code.
The tests are tested for brittleness as the code is refactored: Tests that fail when refactoring is done are brittle. Typically, their design could be improved.

I'd like a more nuanced view, using opensource projects as a metric. Does test ratio increase with project's maturity? Is there a difference between the ratios of buggy crap vs. secure stable software? Etc etc.

Only small fraction of projects are committed to serious unit tests. And among those, I believe open source ones are minority. I'd also want to make clear that "no unit tests" doesn't mean buggy likewise "100% unit tests" doesn't mean "buglessness". Unit tests would (may?) yield other important benefits as well, like cleaner design, reduced development time, reduced time for system testing and deployment.

Ah, but what is the argument for the claim above. At 1:1 ratio, I'd suspect a lot of development time would be wasted in writing and validating unit tests. The cleanliness of the design has no logical link connection with how many tests one writes. Badly designed code passes unit tests as well as cleanly designed code.

The ratio make sense only for the projects that actually **have** tests and it allows to make a gross evaluation of it's quality. -- MaxIschenko.

Again, at that humonguous ration one need to be suspicious of the quality of the unit tests themselves. I'd also be suspicious that the rather extraordinary effort put into writing that many lines of unit tests detracted from focusing on the real issues of the code itself. One data point from my private project is that I achieved 100% code coverage with 1:5 ratio between lines of tests and lines of code, and that doesn't mean that the code is perfect or bug-free.
If open-source projects tend not to focus on Unit Tests as much as the XP gurus would like us to believe is necessary, yet we do have plenty examples of SoftwareMasterpieces in such settings, then maybe there's a lesson to be learned.
- successful open-source projects rely on horde-review - see GitHub. Teams working on proprietary code can't do that. So we need more tests and pairing, obviously...

Re: "I achieved 100% code coverage with 1:5 ratio between lines of tests and lines of code"

I, for one, would be very sincerely interested in learning how production code can be effectively tested with tests that are only 1/5 the size, in lines of code, than the production code. Is there lots of boilerplate production code here? -- JeffGrigg

So far, the above content seems to concentrate on lines of production code versus lines of unit test code. I think this is a false measure. What you ought to measure, instead, is control flow points in production code versus function points in unit test code. You should have no less than one unit test per production control-flow construct. In no uncertain terms, every if statement needs two unit tests (one each to exercise the consequent and alternate paths). A case construct will need no less than n tests (where n equals the number of cases in the construct). Etc.

A ratio close to 1 should be the minimum to strive for. You may not achieve exactly 1, because of Beck's rule, "Test only what can fail." Some production code might not need unit testing, if you can prove it cannot fail (and there damn well better be proof-annotations in the production code to back your claims, too). However, ratios significantly greater than 1 are clearly a red flag, clearly indicating that there exists significant quantities of branch points which have yet to be exercised. (Use a code coverage tool, at this point, to determine precisely what needs to be exercised.)

How does this affect SystemSizeMetrics ?

WRT to "Is the ratio of number of lines of production code to the number of lines of test code a useful metric?"

Here's the thing about metrics - be careful to measure the desired outcome, and not the mechanism that achieves that outcome. In this case, I'm assuming that your goal is not really to have a certain test-to-target ratio for lines of code. That's just a means to an end, which is likely to have a high-quality system, or a great user experience, something along those lines.

What happens when you measure the mechanism instead of the desired outcome is that people will "game" the system. They'll satisfy the metric, but often at the cost of the real goal.

But let's say that you want to achieve a great user experience. That's much more work to measure, isn't it? And it seems a lot less scientific, but that's what you really want. Organizations shy away from measuring the right thing because it's hard. But it can be done. A survey with numeric rating on various questions, for example, might measure what you want. What's important isn't the absolute total, but how it changes over time. Getting better? Getting worse?

You'll get what you measure. Be sure you measure the right thing. -- DonBranson