Lines Of Code

LOC - Lines of Code, usually referring to non-commentary lines, meaning pure whitespace and lines containing only comments are not included in the metric.

The number of lines of program code is wonderful metric. It's so easy to measure and almost impossible to interpret. It can be used as a measure of complexity or productivity.

As an aside, a good programmer should want to SubtractLinesOfCode.

"Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs."- Bill Gates

"When the weight of the paperwork equals the weight of the aircraft, the plane is ready to fly." - standard Boeing engineering rule

"Including passengers, cargo, and fuel?''

Since LinesOfCode (LOC) only measures the volume of code, you can only use it to compare or estimate projects that use the same language, and is coded using the same coding standards. To change one is to change the volume of code. A better method to compare without regard to direct volume is to measure the complexity of the software. This can be done with FunctionPointAnalysis which can measure the complexity of the programs inputs and outputs.

Many studies do show a rough correlation between LOC and the overall cost and length of development, and between LOC and number of defects. So while it may not be a precise indication of anything (in particular, "progress"), it is not a completely useless metric. The lower your LOC measurement is, the better off you probably are.

Hmm. It's still kind of useless to measure LOC by itself without any correlation to other, more important aspects of a project. I remember an advertisement from the early 1990's (or sometime) for Actor (ActorLanguage) that showed a screen shot of a text editor with full search and formatted print capability. The ad said that the source code for that application contained exactly two (!) lines of Actor code. How would you compare the metrics of that to the same application created in assembly, which would contain thousands of lines of code? This is kind of an extreme example, but hey! This is an extreme site, right? Anybody care to expand?

This may be a good thing. While not knowing anything about ActorLanguage; studies have shown that writing one line of code has roughly the same cost no matter the language; a programmer skilled in SmalltalkLanguage should be able to produce twenty lines of "fully debugged" Smalltalk in roughly the same time that a programmer skilled in AssemblyLanguage takes to produce twenty lines of "fully debugged" assembly. Since a line of Smalltalk can do n times as much (and take n^2 times as long) as a line of assembly, it does suggest that Smalltalk programmers will be more productive.

In short, while LinesOfCode is not very useful as a measure of system complexity (due to its dependencies on language and style), it is a decent predictor of development time. Trouble is; LOC isn't available until the project is underway....

KLOC can be used to measure thousands of LOC.

How about a logarithmic scale; as order of magnitude is often more important, and projects vary in size by many orders of magnitude? We could use decibels relative to some arbitrary size, say 100k lines of code. Call it dBl. A 10k project is -10dBl; a 100-line project is -30dBl. Obviously the smallest possible program; a single line, pegs the scale at -50dBl, as fractional lines of code don't exist (unless we find a definition for them, I suppose). A million-line project is +10dBl; a 100-million-line project +30dBl. If the dBl rating of a project goes up by more than a couple points; different management techniques will need to be brought to bear.

Or, we could treat lines of code as voltage rather than power; and make a 10x increase in LOC as a +20dBl increase....

Just think of the fun we can have with this. When a project is halfway done, we can legitimately claim that its at the -3dB point...

-- ScottJohnson

In acoustics, 0 dB(A) equals 1 picoWatt / m^2. This power density is (roughly) the quietest sound a person with good hearing can hear, in the middle of their hearing range.

In programming, a similar value might be: 0 dBl = 1 line of code.

Unfortunately, the lower-case "l" leads to BadVariableNames.

http://www.wordiq.com/cgi-bin/knowledge/lookup.cgi?title=Decibel

One rough metric I've been playing with is the size of the source when zipped - it unweights wizard generated bubblegum, and is nearly invariant under refactoring away duplicated code. -- PeteKirkham

I think that lines of code as a metric are a hangover from the past when software development was so programming-intensive. In the modern world of high-level UML-based software architecture, I propose we measure our performance in polygons per month. -- LukeGorrie

Ever since I discovered that a significant part of my best productivity was achieved while staring off into space, I've striven to concoct some kind of metric besides KLOC that measures the useful end result rather than endless "progress" measured in "effort expended" fashion.

A data warehouse I coded had most of its code written in the last 20% of the overall time spent, with the bulk of the time spent thinking and experimenting. My boss wasn't particularly smitten by this approach, but we made the deadline and the customer paid. Boss always figured that "thinking" time was wasted.

The point was made, however, when I had to turn the code over the the customer's engineers. From around the table we kept getting remarks like, "How did you arrive at that solution?" and, "Yeah, I would never have thought of that."

Has anyone found a clean metric that measures "head work" that leads to good results, rather than "number of times I swung the hammer?" -- GarryHamilton

Number of true bugs found after delivery?

Unhappily, this doesn't give you any idea of production during ... production.

You want to have some idea that the guy at the keyboard is doing something worthwhile, but lines'o'code ain't gettin' it.

Things I Tried Today
Ideas Tried and Discarded
Methods Attempted
Methods Refined
... and so on

I don't have a good general way of describing "workin' on it, Boss!" when some of the results will never see the light of day.

It was claimed in ArrayDeletionExample that Specimen 2 was one half the lines of specimen 1.

However, if we use other metrics besides raw line counts, the difference is much smaller.

Specimen 1

 while (foo = getNext(bar)) {
   filterThis = false;
   if (condition1) {
     doSomething();  
     filterThis = true
     if (condition2) {
       doSomethingElse();
       filterThis = glob();  
     }
   }
   if (! filterThis) {
    process(foo);
   }
 } 

 Named tokens: 20
 Symbolic tokens: 5  ("!", "=", etc.)
 Brackets and parenths: 24
 Total Characters: ?

Specimen 2

 bar.applyMatches(function(each) {
    if(!condition1) return true;
    doSomething();
    if(!condition2) return false;
    doSomethingElse();
    return glob();
 }, function(foo){process(foo)});

 Named tokens: 20
 Symbolic tokens: 2
 Brackets and parenths: 22
 Total Characters: ?

If raw lines were the most important issue, then specimen 1 could be reformated as such:

 while (foo = getNext(bar)) {
   filterThis = false;
   if (condition1) {
     doSomething();  
     filterThis = true
     if (condition2) {doSomethingElse();filterThis=glob();}
   }
   if (! filterThis) process(foo);
 }

Or even:

 while (foo=getNext(bar)){
   filterThis=false;
   if (condition1){
     doSomething();  
     filterThis=true
     if (condition2){doSomethingElse();filterThis=glob();}}
   if (!filterThis)process(foo);}

Now it is the same number of lines as specimen 2. Subjectively, I would judge specimen 2 to be roughly about 8 to 15% "less code".

A great story about Bill Atkinson at Apple regarding lines of code:

http://www.folklore.org/StoryView.py?project=Macintosh&story=Negative_2000_Lines_Of_Code.txt

That's a classic!

My PointyHairedBosses would see the negative sign and bill me.

One more quote:

"The use of lines of code metrics for productivity and quality studies [is] to be regarded as professional malpractice starting in 1995." -- Capers Jones

I interpret this quote to mean that measuring developer productivity directly in terms of lines of code should be regarded as malpractice, such as with the Bill Atkinson story. (And I agree.) Source code complexity/maintenance burden is a different thing, which is more reasonable to measure in terms of lines of code IMHO. -- DougWay

Counting Symbols

How symbols are counted can make a big difference in a score. For example, APL has special symbols for operations that would normally be words in other languages. If we had a single symbol for "while" for example, should that count as being less code than the word "while"? Perl does similar things by using the entire keyboard of symbols in place of keywords. It is like claiming that Chinese is less wordy than english because it uses mostly (but not entirely) pictograms instead of phonetic-based word construction. To "normalize" the comparision, it would probably be more fair to expand the symbols to equivalent words.

Context

Another issue is "context". Usage of context can reduce the size of code by reducing the need to declare the environment and/or name-spaces being referred to. Some consider excess use of context to be a bad thing, some consider it good. For example, languages or styles that lead to a lot of use of "self" object references is considered a smell by some, but helpful to others. Rather than pick sides on this, it is a factor related to code size that may need to be considered. Example:

  // With context blocks
  X {A B C D E F G H}
  Y {A Q R S T}

  // Without context blocks
  X:A X:B X:C X:D X:E X:F X:G X:H
  Y:A Y:Q Y:R Y:S Y:T

The context approach is often more compact. However, some are unconfortable with it because there is more chance for confusion some say. Here "A" is in both examples, but means something different. Unless one knows the context in the first example, it may be mistaken for the meaning of "A" in a different context.

SLOC Considered Not Harmful

It's worth remembering that Capers Jones was selling something, specifically, function points.

But empirically, it turns out that:

as pointed out above, SLOC per staff hour is constant across languages
function points correlates very highly to staff-hours
SLOC per function point is nearly constant for a particular language.
SLOC correlates very highly to cost (which you'd expect, since labor is the dominant cost in software)

So, there really is very little quantitative justification to prefer one over the other, and very little quantitative justification for looking for something a lot more complicated than SLOC as a volume measure.

Now, ideally, one would measure in terms of something like "new acceptance tests passed per staff week", since the customer doesn't really care how many SLOCs you build. But I wouldn't be at all surprised to discuover that SLOC per acceptance test on a particular project has a pretty small variance, either.

--AnonymousDonor

Maybe as an aggregate it is somewhat effective because good coders and bad coders get averaged together. Second if you start measuring on it alone, then it may encourage bloat. If I was paid by LOC's, I know a lot of ways to bloat up the line count. In other words, a kind of HeisenbergUncertaintyPrinciple problem: if you start measuring it formally (such as tying paychecks to it), its worth or utility changes. I don't dispute the findings you mentioned as an after-the-fact research project, I just dispute it would stay useful if it became a focus in terms of programmer or management compensation and retention.

These two caveats make it problematic for many uses. If I was to judge the "size" or complexity of a project, I might use it like this: sample about 10 different random spots in the code to get a "bloat factor". The bloat factor is a divisor used to "correct" for bloat in LOC counts. However, the bloat factor is somewhat subjective. It is a "personal" tool and not something that makes for good formal metrics.

objective_factor / subjective_factor = subjective_result

--top

Here is an example of LOC not being the same as "better".

I had a situation where I was filtering out characters fairly often in the code:

  myString = replace(myString,"@", " ", scope="all");
  myString = replace(myString,"*", " ", scope="all");
  myString = replace(myString,".", " ", scope="all");
  Etc...

Rather than scattering the code with this, I decided to make a function called "replaceChars". A call would look like this:

  myString = replaceChars(myString, "@*.", " ");

After making and using this function, I noticed that the line-count, and perhaps the token count, was still higher because the implementation of "replaceChars" took an amount of code that was still a slight bit more whan what it was without. However, I prefered the new version because it pushed implementation details out of the code that focused more on domain issues. I don't like technical details clouding up domain-related code. I like domain-related code as close to psuedo-code as possible and repetitious "replace" commands didn't contribute to that goal. Plus, I had a subroutine I could use in the future for other projects. I realize that some might disagree with such a choice, but from my perspective LOC didn't win here.

From ComputerLanguageBenchmarksGame which no longer shows LinesOfCode as a metric:

I think LinesOfCode is a ridiculous measure: if, for example, I was told up front that a low LOC was considered "good", I would pack as much into one line as I could. You can do this in the ForthLanguage and in CeePlusPlus for but two examples. Using a measure like LOC to rate programs or programmers reminds me of the Tech Support debacle that Gateway experienced recently: tech support people were rated on how quickly they could "process" phone calls (under 13 minutes was the limit), so some of them started the practice of hanging up on customers only a moment after picking up the call! (At the end of the quarter, when the reports were generated and reviews conducted, those who did this "looked" best in the eyes of their management.) UnintendedConsequences arise from such measures. -- BillZimmerly

Agreed, there are UnintendedConsequences of such measures, especially if used to compare programmers. As a mechanism to compare programming languages or productivity-cost-per-feature-implemented, performed well after the programmers are no longer fearful for their own jobs, LinesOfCode is a more reasonable measure (as is size/(zipfile size) loosely as a measure of 'waste' in languages, etc.)
The ComputerLanguageBenchmarksGame would, I think, be a bit better if it had a rule saying that the program must be written in a manner considered "direct", as judged and approbated by a small panel of experts in that language. This would also help reduce obscure optimizations and behavior such as 'packing as much into one line as possible'.
You can write Oberon software entirely on a single line too. Do not single out Forth as though it's a misfeature of the language. The more experienced the Forth coder, generally, the shorter the definitions. I find I average about 8.2 words per colon definition (by comparison, ChuckMoore averages 7). If you place one definition per line, the LOC of my Forth code compares with my C code, which also compares with my Python code, etc. In the end, lines of code are meaningless. Statements of code, however, serves a much more valuable metric, and I believe that a "line" refers to a statement. Remember, when the LOC metric was first used, programming languages typically were formatted one statement per line anyway (e.g., COBOL, assembly language, Fortran, etc.).

On the other hand, I'm beginning to think that NumberOfKeystrokes is a valid measure of how much work a program has caused. -- PanuKalliokoski

The LinesOfCode measure is not intended as a naive counting of newlines. Though hard to realize, the intention is to count something meaningful. In C and similar languages it amounts to basically counting semicolons. This is still only intended as a rough estimate of program length, and it fills that role quite well.

A possible alternative to LOC is the delay between keystrokes as well as the number of keystrokes. For any developer, he'll have an average inter-key latency. Code which is more difficult will require more thought, and therefore, should show a larger-than-average inter-stroke latency. That's my hypothesis, at least.

That sounds like an interesting performance or productivity measure, though it is still subject to UnintendedConsequences if made obvious. And it may be difficult to determine cost of just the code portions (especially when dealing with both inserts and deletes and adjustments). As a stealthy measure, it would be interesting to see results on many different programming languages.

FWIW, I have found that LOC is a terrific quick measure of progress, complexity, time spent, likelihood of bugs, etc. Sure, it can be abused. However, if you have programmers that are deliberately basing their coding decisions on upping (or lowering) their line count, you have bigger problems anyway. Empirically, it is a nice fast measure that *does* mean something. Like everything else, you use it for what it does well and move on.

Somebody said that you don't count comments or white space. I do. AFAIK, that is the accepted way to count lines. Empty lines and comments, done by a thoughtful programmer contribute to the code and also take time and care. Someone also said that they don't count generated code. It depends. In some instances, if the code is generated as a result of building a GUI (and it simply is the code manifestation of the work done in the GUI builder), then that also should be counted. I generally count generated code like that because it has empirically proven to be roughly equivalent in terms of time, effort, complexity, etc.

I have to disagree. Code readability is a different metric. We shouldn't try to fold that into code volume metrics. The exception is where the space means something, such as in Python. Python shouldn't be ranked "better" than brace-oriented languages simply because we can "see" braces. The Python interpreter can "see" tabs just as well as it can braces. -t
Except that the only significant whitespace Python groks is horizontal indentation. Python does not care about vertical whitespace, which is the kind that Bob is referring to above. I also agree that counting (vertical) whitespace and comments is silly if they're provided by an automated tool, because those are not lines maintained by a human, and will be overwritten by the tool in the future regardless if a human touches them anyway. One should count whitespace if, and only if, typed by a human; only then are the lines actually maintained.

-- I agree that one must tread carefully with LOC measures. However, empirically, in my experience (YMMV), it is a really good down and dirty measure of progress. That *does* include generated code, but again only if that code is a reflection of work by a programmer at tool level. I am thinking of GUI builders and the code they generate combined with the code done by hand. It has (again, YMMV) been an equivalent measure in terms of work-effort, likelihood of errors, system complexity, etc to any other coding. Note that this is either my own code or teams that I work with. In that case, certain standards of care are applied such that white space is generally uniform in its use (an empty line between variable declarations and code, an empty line between functions, simple headers for main functions, no headers for small HelperFunctions, one statement per line, lots of narrow scoping, small functions, etc). AFAIK, it is still true that LOC is a good measure of time and effort across languages.

I strongly agree, though, with some of the objections here. Bad code or code that is skewed by making LOC a performance metric, or atypical code generation or particularities that force time into reducing LOC, etc will all have an effect. Like everything else in programming, there is some art and judgment needed in the use of LOC. On balance, though, on my own projects it has been a terrific measure to use, I use it frequently and it has generally not been a disappointment. It has given me the information I needed. Programmer productivity is traditionally about 20 lines of production code a day. When you look at your output over the years, you will see that even though you can crank out 1/2/3 hundred lines in a day, the larger that code body is the more likely you are to revisit it prior to or during production. My productivity has been higher than 20 over the years, but not that much and my days have been long and I have been a rather prolific writer. Still, in a quarter century of programming I have not likely delivered much more than 500K lines of code, tops, to final production. Off the top of my head, there are only about 150K that I can account for that resides on my systems here. I know from a long difficult run at it, when I was near the top of my game, I produced just a little over 1K lines a month through to final production over a period of about 14 months (about 16K lines). That was proper, solid code that saw about 3,000 man years of production without requiring a fix. Those were long months, though and I got some leverage from working on the same system on top of my own code from the ground up over that time (the actual product was about 35K lines of my code, but close to 20K was library code written over the years). In ordinary situations, I don't think that 20 LOC/day is far off the mark. As it goes up from there, it starts to accumulate bugs from what I have seen. Note that this can fluctuate wildly over the short term. Some days you see zero lines or even negative lines as you debug and scrap bad code, some days you see 150 lines of 'easy code' and I have produced as much as 300 lines of (not very good) code in a day.

I usually have quibbles in these debates, but I agree with just about everyone here in one way or another. LOC is a good tool for me, but may not be for others and it is (I hope obviously) only one of a number of metrics that I use to judge code. -- BobTrower

Note that counting tokens, etc, is essentially a different kind of measure. It is a good one, but it speaks more to how difficult it was (high symbol density is harder for a programmer to deal with) and how likely you are to find errors there. High symbol density is usually a sign of a problem. Again, though, it is a good measure. If you have a way to find 'hot spots' of high symbol density, they are good candidates for code review and have greater 'gravity' for me when I am looking for bugs in a nearby region.

-- BobTrower

Perhaps LOC is good for a metric internal to the group - measuring velocity, for example, but a poor metric for outside the group, like measuring individual productivity for raises?

Velocity is measured by features completed per unit of time, not by lines of code typed per human. As a recent blog post at ObjectMentor indicated, the act of refactoring can sometimes increase total line count. However, doing so, no new features have been added to the project.

I was more pointing out that if LOC is used as a metric, it is not a good one for evaluating programmer skill/effectiveness

CategoryMetrics