Are Long And Descriptive Related

[Talking about measures of work the coding takes, from NumberOfKeystrokes]

I agree completely with you, lines of code is ridiculous. But number of keystrokes is good but not adequate, lest we want to go back to strncpy and "while ( x++ = y++);". So I have a minor improvement to your proposals. How about NumberOfTokens? ? That should take care of strstr, strtok and the likes. --CostinCozianu

We talked about this with my friend, and I'm not quite sure about it. Many people tend to prefer long (and descriptive) names even for local variables and "HelperFunction"s. (In my book, a function that is called more than ten times in a program is almost certainly a "helper" function.) I and my friend, we do not, because the longer the names of "helper" functions are, the less they "help" (I think strdup should be called strdup and not duplicateAndAllocateString), and we are quite strict about generally having lifetimes for locals that are at most about six lines of code. Of course, YMMV.

As a result, we often refactor variable and function names to be shorter, not longer, as we gain understanding of the problem. Our identifiers start out long and descriptive, then get abbreviated as we know which of them should be abbreviated. I don't know whether many other people program in this way.

Anyway, this does point into a weakness in the NumberOfKeystrokes: we do not write short names initially, because we prepare for NameSpace clashes if (and when) the size of the system grows. Routines that are called often deserve a shorter name. When the component is ready, you shift it into its own NameSpace, so you can begin anew.

Interestingly, this minimization process once again minimizes the NumberOfKeystrokes: often used identifiers get shorter, whereas less often used are left longer. But the time to find the right names to abbreviate is not counted. (It's not very big, however, in my experience.) -- PanuKalliokoski

I suggest using an editor with auto-complete. I use long identifiers, but after I declare them I only have to type a few of the characters. -- EricHodges

I do use auto-completion, but to my eye, the code looks clearer and less cluttered if often used identifiers are short and seldom used identifiers long. You don't have to wrap as many lines, and the code stays focused on what is being done, not how it is being done. -- PanuKalliokoski

Perhaps all of my identifiers are seldom used. What name would you give an object that generates sequences of vertices? -- EH

I wouldn't model that as an object, and the question cannot be answered anyway, because I'd probably name the function after what sequence of vertices it creates. -- PK

I don't understand your answer about the function that generates sequences of vertices. All it does is generate sequences of vertices. There's no other "what" to use in the name. What 5 letter name would you give it?

-- EH

I'll give some examples of the "what" might be. I might have a function that generates a sequence of vertices for an icosahedron; or a terrain; or a hyperspace manifold; or a random network in three dimensions; or whatnot. These tasks are different enough that they need separate functions. There is no mind in making a function that "just generates sequences of vertices", because what it actually does is heavily underdetermined.

Unless you mean something like an interface: a name I could give to anything that can produce some sequence of vertices. If a global interface (a type), I'd call it VtxF if I had to stick to 5 chars; VertexFac otherwise. If an argument to a function, I'd call it contextually, but probably e.g. "vs" if I had to stick to 5 chars or the identifier was used a lot; "vertices" otherwise. If you ask for the method for generating the next vertice, that would be ".next()".

-- PK

I have an object that generates sequences of vertices. It doesn't generate them for any specific purpose. It can be used for any purpose. If I named it "VtxF" I would never remember what that meant. If I named it "VertexFactory" I wouldn't remember that it generates sequences of vertices, but would instead think it created vertices. So I named it VertexSequenceGenerator. If I ran into "VtxF" in code I was maintaining, I would immediately research it and rename it to something less cryptic. -- EH

Now here is a semantic problem: what on earth does the VertexSequenceGenerator actually do? For me at least, that name, as the description "an object that generates sequences of vertices" says nothing. Show me the code, so I can tell you what I would call it. -- PK

[Lots of clarifying code cut out]

Okay, now I see what you're doing. First, you're not generating vertices, you're generating vertex indices. (I had quite a hard time trying to guess how an integer can be a vertex.) Second, and more surprisingly, you're not using those indices directly, but as index deltas (differences). So the name for the object should (if I try to keep your style in the naming) VertexIndexDifferenceArrayFactory.

That is actually only good as an interface name: it just tells what the client can do with the object, not what the object does. The implementation, then, should (keeping similar standards) be named AllVertexIndexDifferencesInAscendingOrderWithinRange. Now that's clearly nonsensible. I'm bad at trying to do this stuff by other people's standards, you will play your game better.

Now, as for what I think these should be called: I think it is important to keep the model-view distinction, because we're interpreting data on a very low level. Therefore, I would name the objects after what they do, not how they're being used: the interface IntArrFac? (IntAF for 5 chars) and the implementation AllAscWithin? (AllIn? for 5 chars). The view would be named in the client code: I would name vertexSequenceGenerator as generator (or genr), and vertexSequence in calculateNextLocus as vertexDeltas (or vtxDs).

But I don't quite understand the structuring of the problem: is there some reason to return vertex index deltas instead of vertex indices? If I was to model the problem on this information, I would probably make one generator that generates vertex index sequences (not vertex index delta sequences), one generator that transforms those vertex indices into vertices by shape, and one that reduces these vertices into the next location on the curve. -- PK

Why would anyone name interfaces differently from classes? (DontDistinguishBetweenClassesAndInterfaces) We have very different design philosophies. I don't expose implementation details through class (or interface) names. -- EH

DontDistinguishBetweenClassesAndInterfaces discusses a different phenomenon than what is going on in here: their point is, you shan't tell in the name whether it is a class or an interface; I'm trying to separate the concept of "anything that can produce some vertex index arrays" from "something that produces deterministically all possible vertex index arrays in ascending order within a given range".

Steve Newton added InterfacesShouldBeAdjectives, not me. I disagree with that notion. Mixins should be adjectives, but interfaces are not mixins. I never know if a name will be attached to an interface or a class. What begins as an interface may grow implementation through refactoring. What begins as a class may have its implementation refactored away. I try to name the externally visible parts of a class or interface from the client's perspective. The client doesn't need to know if it's working with a class or an interface. [I agree. -- PK]

In the example we've been using, the client code doesn't care about the implementation of the vertex index sequence generator. That's private information used only by the factory or strategy participants.

-- EH

It occurred to me that I haven't actually explained just why I think too long identifiers (where the definition of "too long" depends on context) and long code overall decrease readability. For example, when discussing the VertexSequenceGenerator, most of my time reading code was spent on trying to find where the beef is, not trying to understand what the code does. Also, I had to read three separate classes before I could understand what's happening, because the names didn't tell me enough, long as they were. (Maybe I'm bad at understanding names or good at understanding code, who knows; anyway CodeTalks.) If there weren't all these constructors etc., all the code would have fitted in one screenful and I would have had it much easier to look at how the different parts interact. Conciseness improves readability because it allows me to grasp bigger parts of the code at once and length of identifiers to me is a hint of where the code does something specific, i.e. where there is a place in the code that could have been otherwise; where there is a detail. This latter one is difficult to explain. -- PanuKalliokoski

I agree that conciseness is a virtue, but I think short identifiers work against it by forcing the reader to browse into other methods and classes. Most of the time I learn (and refactor) OO code one method at a time. The methods I posted here were all less than 10 lines long. I think the average for my code is around 4.5 lines per method. While I'm reading those lines, I shouldn't have to browse into other classes and methods. Their names should convey their intent. That's my goal. -- EH

Yes. Maybe it's just that to me, code often speaks much much more than names. -- PK

That's true, but when you're maintaining code you often don't have time to trace every branch of every call around the code you're modifying. I trust that standard library methods are going to do what their names indicate without reading their source. I should be able to trust all code in the same way. -- EH

It would seem to me the problem described is not one of length, but one of poor descriptiveness ("because the names didn't tell me enough"). Making a name longer will not make it more descriptive, but neither will making the name shorter. -- ??

I don't think anyone has suggested that making a name longer will make it more descriptive. I'm arguing that making a name more descriptive often makes it longer, and that descriptiveness (clarity) trumps shortness (brevity) when it comes to code maintenance. -- EH

Agreed. Still, we seem to assess the value of these differently. And for many things in programming, there just is no name that would tell just what the thing is (and what its purpose in that program "programmer intent" is). -- PK

I don't know whether long and descriptive are related, but I know for sure that short and descriptive are most definitely not.

I don't agree. As the esteemed donor did not provide any argumentation, neither do I :)

Let me try to come up with a StrawMan, then.

 (define (f a o l )
  (if (null? l)
      a
      (let 
            ((x (o a (car l)))
             (y (cdr l)))
        (f x o y))
      ))

versus

 let rec fold_left start_value operator list=
 match list with
 [] -> start_value
 | head::tail -> let 
                     next_value= operator start_value head
                 in
                     fold_left next_value operator tail

Your first example proves well that shortness does not make identifiers descriptive. But your second example goes to prove my point: in my opinion, this is more readable:

 let rec reduce acc f = function
 | [] -> acc
 | h :: t -> reduce (f acc h) f t

I don't think the above is more readable. But anyways to make an apple to apple comparison I had a let in both my Scheme and Ocaml versions to show the usage of identifiers local to a block (expression). Going past that, in the above the reader does not see right away the third parameter to the function, imagine it was a longer match block, or the expression for the first case was much longer, I'd have to parse quite a lot of code to figure out the header of the function. For functional programmers this is trivial, but for people coming from other backgrounds this very brief style is a definite turn-off. --Costin

I'd especially like to point out that "_value" is almost always null-semantic; "operator" instead of "f" goes against years of functional tradition; and that calling the accumulator "start_value" is a bit misleading (if it wasn't, I'd call it "init").

The "f" is an operator, therefore why not make it clear. The _value is not redundant. If I'd call it just start or init, an unadvised reader would be inclined to say that it is the name of a procedure to be executed at the beginning of a larger process. To name it start_value makes it clear that it is the value with which we start the folding.

Because the separation between an operator and a function is not sensible to make here at all. You could shorten "operator" to "op" without losing semantic value, but "f" is the canonical name. I know start_value is the value from which we start folding; but by naming it "acc" (or "accumulator" if you want to be long), we simultaneously imply that the routine is tail-recursive, which is a good thing.

But the important point here is that at least to me, the readability of the routine is greatly enhanced by its brevity. Not only is the routine faster to read, its overall structure is also better presented in the short version.

I think you're fooling yourself. No one but you will glean your intent from "acc", "h" and "t".

I think you haven't done much functional coding. After "h::t", calling "h" "head" and "t" "tail" has zero information increase. Note that the Haskell standard is to call these "x" and "xs", which are just one character longer. "acc" is straight from the times when tail-factoring recursive routines was a Big Thing(tm), and some programmers go as far as calling this "a" because it's so common. "h", "t" and "acc" are like "i" and "tmp" in C.

Yep, I can only imagine looking 10 lines down froh h::t wondering where is the tail of the queue. "Tail" is a very good name[no, actually tail is a bad name, but for routines like reduce there just isn't more information], while "t" sucks unless you adhere to a very specific cultural background. By the same tradition, new languages should be using car and cdr. No offense to lispers and schemers, but software engineers havin an idea what car and cdr is are a rarity on my block. The very fact that Eric which has some years of experience shakes his head in disbelief is indicative that if he is to read some scheme code, even as good a code as directly from StructureAndInterpretationOfComputerPrograms, he is likely to be turned off by some very awkward looking "cultural assumptions". [Are you saying that the Java culture does not have similar cultural assumptions?] I love FunctionalProgramming, but some thing are just not objectively desirable in that tradition, and some things turned out much better in the OO world. We now have the seminal BadEngineeringPropertiesOfOoLanguages, but the FP gurus have yet to face up to the challenge on their side, as they are culturally more enclined to theory and experimentation rather than engineering. --Costin

I think you're making a wrong assumption about where the trouble lies. New programmers have to learn anyway, what is meant by "Factory", "Sequence", "xxxxable", and so on. It's not easier to learn ".contents" and ".next" than to learn "car" and "cdr", if you don't know anyway what linked lists are. I think the reason Eric despairs at this stuff is that he has learned much of the Java culture, and has so been able to sidestep the "traditional" names. -- PK

I learned Java culture after learning Basic, C, Pascal, FORTRAN, C++, PDP-11 and 6809 assembly culture. I'm familiar with many traditional names, even "car" and "cdr". I don't use them because they provide no benefit to me or my team. The names I used in the code above are the same names I used (with underscores instead of CamelCase) in my C++ implementation, so I don't think this is just a Java culture bias. Perhaps it's a large project bias. When you maintain millions of lines of code for decades, you can't keep all of this information in your head.

C++ and Java are quite similar in naming culture. One reason for the long class names is that C++ used to have no NameSpaces and no parametric types. Now that it does, the typename (interface name) for an IntArrFac? could be fac< std::vector< NDim::coord > >. Look at std:: names.

Another thing I've been doing recently some mathematics, and of I was put off again by the usual one letter identifier names, and special characters operators. Oh, boy these mathematicians, they do love their special purpose hieroglyphs. Just as an exercise, I did some theorems on paper using ASCII only,absolutely no special purpose characters, and meaningful identifiers, for meaningful things (although not for all dummies). It was much easier to understand and it was a pleasure. The need to get used with the notation of each book is a tremendous slowdown. --Costin

Please also look at Python standard library names (or Haskell standard library names). There's a lot of code there, but the names are quite sensible by my standards. http://www.python.org/doc/current/lib/lib.html

I'm curious. How big are the projects you tend to work on? How many lines of code, and how long is that code maintained? -- EH

The biggest inseparable components I've worked on have been about 30K lines of code - they were badly factored. In FactoringLargePrograms, I argue that you should never have to make bigger components (and preferably never as big, either). If I make a preprocessor for LaTeX, should I count the LOC of TeX in the size of the "project"? What if I make a LaTeX macro package? What about counting the LOC of the depended-on packages in the size of the project?

Is the standard library of some language a "project"? What about the Linux userland? Where do the boundaries go, when you don't have a commercial organization to "begin" and "end" a project? -- PK

I use lines of code as an indicator of maintenance effort (the more line of code you have, the more line of code you have to maintain) as opposed to problem complexity or difficulty. [But if you have a terser language, will your project be bigger for the same LOC?] Let's step back from that for a bit. The approach I'm advocating (identifiers as verbose as they need to be for maximum descriptiveness) comes from working on projects consisting of millions of lines of code, spread across hundreds of individual programs and shared libraries, where each unit may be maintained indefinitely (20+ years for some and still going) by groups of 50-100 programmers with annual turnover rates of 50% and higher. The size of each component is probably no bigger than 30k LOC, but any programmer may need to modify any line in any component. This approach may seem silly for smaller development, but I've used it for personal projects and been quite happy with it. I find I can revisit them years later and quickly understand how they work. What sort of projects do you work on? What applications do you develop?

I find your situation very similar to the open source software world overall. There are thousands of programs, every one of which might be over or under 30k LOC, and which, when they misbehave, have to be corrected. Now some of the open source programs are hopeless to debug, yet many are very well coded (by my standards) and very easy to get a hang of. -- PK

Also, another problem with abbreviations just occurred to me. If you use them sometimes, readers are less sure if a name might be an abbreviation. I thought "proact" might be an abbreviation because there were many other abbreviations in the code and I was unfamiliar with "proact" as a verb. -- EH

True. -- PK

What should "i" and "tmp" be called in Java? -- PK

"i" is often "index", "tmp" is often "temporary", but that's rarely a good name for a variable. Most variables are temporary. That isn't the reason they were created.

You're correct, I haven't done much functional coding. I don't use "i" or "tmp" in C. I see no advantage to cryptic identifiers. I write code with the assumption that a novice will be maintaining it at some point, which generally turns out to be true.

-- EH

Jeez, there's nothing cryptic about "i" or "tmp", they come with years of tradition. It's stupid to throw away all those good naming rules just "because you have to make descriptive identifiers", because those identifiers are already descriptive. In Java, you'd have "StringFactory?"; in C, you'd have "genstr". It's all about language tradition. -- PK

Yes the code below is much better than using cryptic i and tmp ... just think what we can do with 3 or maybe even 10 levels of nesting..... :)

for ( int localLoopCounterofTypeInteger= 0; localLoopcounterofTypeInteger < localLoopGuardMaxofTypeInteger; localLoopCounterofTypeInteger++) { /* lets hope this is a joke...... */ }

When I started programming in C, I had 64k of RAM, no hard drive, and 120k floppies. My display was 32x20 characters, all uppercase, with lowercase indicated by inverse video. Short identifiers made a lot of sense in that environment because I had very little screen real estate, and the size of source code files was a legitimate concern. Those days are gone, and good riddance. There's no good reason to use "i", "tmp" or "genstr" anymore. Some traditions are best abandoned. And new programmers enter the business every day. I'd like to spare them the burden of translation "traditional" identifiers if at all possible. -- EH

I think of one-letter names as being like pronouns. They work fine as long as their scopes are extremely small and there aren't too many of them at once. (Contrast with: "Pronouns work fine as long as the pronouns' scopes are extremely small and there aren't too many pronouns at once.") Sometimes you really don't want to bother giving something a name and saying the name over and over again - you just want to say "it" and move on. -- AdamSpitz

What this does not take into account is understanding the change, understanding where the change is to land and testing the change; but these are things you have to do anyway, and I claim they are not affected drastically by the terseness of a program.

I claim they are drastically affected by the terseness of a program, especially if the people maintaining the source code aren't the original authors. -- EricHodges

I'd say these are quite orthogonal (the length of identifiers and the clarity of identifiers). Of course, it is sometimes impossible to make clear identifiers if you try to stuff them into at most two characters each; but if you're allowed as much as five, you can already almost always choose a descriptive identifier. [Are you stating that 5 characters is enough to make clear identifying names for objects in many/most situations? It's enough for methods, instance variables, and local variables. It's not enough for global variables, classes, and global functions. If your packages are small and pretty, it's enough for package variables and package functions. But it's not optimal.]

I've read a lot of code with short but good identifiers and code with long but obscure identifiers. In my experience, there is little correlation between these two properties.

Unless you're saying that they are made easier by the terseness of the program. But I wouldn't say they're made drastically easier, either. -- PanuKalliokoski

I used to work with Whitesmith C on PDP-11s. Identifiers were limited to 6 (significant) characters. The code was impossible to understand at a glance. Every method had to be explored to get any insight into what it really did. Intent was not clear. I've also seen lengthy but misleading identifiers, but that's no reason to be terse. For any code that is serious or valuable, inheritors are my target audience. The code should be holographic to them: they should be able to read and understand any slice of it, be it a line, a method or a class, without having to dive into the rest of the code.

Perhaps we have different audiences. I often write code that will be maintained (hopefully) long after I leave by people I've never met. Code I write for myself can be less verbose, but even then I like to be able to come back to it a few years later and understand why I did what I did without too much trouble.

-- EricHodges

Are Long and Descriptive Related

It appears much of the discussion above concerns long versus short and little about descriptive. I don't see that length and descriptiveness are correlated, i.e., I cannot define an optimal length that corresponds to an optimal descriptiveness.

If one accepts the two qualities are no correlated, then one must decide how to deal with them. Some potential alternatives:

Optimize descriptiveness and ignore how length is affected
Optimize length and ignore how descriptiveness is affected
Optimize the result of a function based on length and descriptiveness and ignore how each is independently affected.

The third option is really a superset of the first two with the importance of length set to zero and the importance of descriptiveness set to zero, respectively.

Length and descriptiveness are related. Length constrains the amount of total information in a signal. Abbreviations increase ambiguity. If they weren't related, option 3 above wouldn't be possible.

Does making a term longer necessarily increase descriptiveness?

No, it creates means for the programmer to come up with descriptive names.

Descriptiveness approaches a limit as length increases. I see no good reason to constrain length at the expense of descriptiveness.

[Why not leave the discussion at this point. There's a golden middle that depends largely on personal preferences. Too short is clearly bad, too long is bad also.]

It would be nice to see the contributors agree to this and go one further by refactoring the discussion. For what it's worth, I've been loosely following this conversation and agree with the GoldenMiddle? resolution.

The subject is also much discussed under IdentifiersAreComments.