Underscore Versus Capital And Lower Case Variable Naming

Which is easier to read? I sent this around the office a few months ago. -- TqBerg

WhenInTheCourseOfHumanEvents

 -vs-

when_in_the_course_of_human_events

 -vs- (the Lisp way)

when-in-the-course-of-human-events

 -vs- (the COBOL way)

WHEN-IN-THE-COURSE-OF-HUMAN-EVENTS

 -vs- (ugh! the Visual Basic way)

[When In The Course Of Human Events]

 -vs- (the MySql way)

`when in the course of human events`

([In Forth,] case doesn't matter; upper or lower-case words can be used, as long as you reference the word the same way. -- Samuel A. Falvo II) ...although you will often see a lot of legacy Forth code in all-caps, simply because it was written in the era of all-caps terminals. It is also bad style to have such a long word; it indicates poor factoring.

Actually Forth style would be: EVENTS HUMAN OF COURSE THE IN WHEN :-)

(Ancient Forth would only store the first three letters and a symbol length, so it would be WHE-------------------------------.)
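A rough sketch of why that scheme was fragile, assuming (as the remark above says) that only the length and the first three characters of a name were kept. The function name forth_key is made up for illustration, and Python is used only for convenience:

```python
def forth_key(name):
    """Return the (length, first three characters) pair that would identify a word."""
    return (len(name), name[:3])


# Different words with the same length and the same first three letters
# become indistinguishable under this scheme.
for name in ["CONVERT", "CONNECT", "CONTROL", "COUNTER"]:
    print(name, forth_key(name))
```

Under this assumption CONVERT, CONNECT and CONTROL all share one key, while COUNTER does not.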

Now try in a fixed pitch font ...

UnderscoreVersusCapitalAndLowerCaseVariableNaming -vs- underscore_versus_capital_and_lower_case_variable_naming

The comparison above is a straw man, because variable names are not that long. Code already contains lots of space and symbols such as brackets, operators, etc. See also:

http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Underscores-Discussion

  forExample(This - Is) + notHard * toRead(atAll) 

  for_example(this - is) + hard_to_read * it_contains(too_many_symbols)

Code does not contain nonsense such as ThisVeryLongSillyStrawManArgumentAgainstPascalCase.

I'd say that they're both pretty awful. My preferred style is to use CapsSeparation for words, but underscores for punctuation. Of course, if your names become this long then there's probably something else wrong. -- DaveWhipp

See BadVariableNames.

They are both icky. I tend to use whatever method is common in the codebase or language I am working in.

For_very_long_sentences, underscores_clearly_makes_the_text_easier_to_read. But identifiers are seldom longer than StringIndexOutOfBoundsException... so for me the issue is rather: which is easier to type? And then mixed-case wins hands down. -- JonasLindstrom

Code will be read many more times than it is written.

Underscores are harder to type than capitals, since the underscore is so far away from the home row. I find that the repeated extension of the right pinky can make my wrists sorer.

So re-map your keyboard and put it somewhere sane. (Dvorak puts the hyphen/underscore on the home row, where the apostrophe key is on qwerty. ;-)

Have you never been forced to use someone else's computer with a non-standard mouse, let alone non-standard key mappings? Even if you aren't practicing PairProgramming, it interferes with guest driving, leaving you stuck with the less-efficient verbal back-seat driving. -- A reformed CompulsiveCustomizer with another sore pinky [SeanGugler]

From these arguments you shouldn't conclude that it's bad to remap a keyboard, but rather that it's bad not to allow remapping in a quick and easy way. I regularly switch between German QWERTZ and English Dvorak in a fraction of a second. The former actually is the standard key mapping a guest driver would expect here in Austria; however, it forces you to use AltGr for a couple of characters you often need when programming. You wouldn't use it all the time just to accommodate guest drivers. -- NeKs

At the moment, I prefer caps and lowercase, primarily because I'm doing a lot of Java and that is the idiom, but also because a number of IDEs treat the underscore as a word break (which, granted, it is), so when I double-click on a variable with underscores, the IDEs only select up to the underscores. In general, a lot of tools make it harder to work with underscore separators for something that is actually one thing.

Having said that, I know a good argument for underscores: where you have developers whose native language is not English (or whatever they are trying to code in) and who don't know the words so well, it's easier to see the underscores as word separators.

-- StevenNewton

As a LispBigot I should add a vote for words-separated-by-hyphens, which is idiomatic in both CommonLisp and SchemeLanguage. Look, Ma, no shift key! -- DanBarlow

The value of mixed-case is that it can be easily read as either a series of words or as a single token among operators in the context of a program. Hyphenated Lisp words scan well because Lisp tends to spell its operators and therefore removes the visual competition. Fortran, by the way, allows spaces to be inserted at will into variable names. It didn't take people long to figure out that that was a bad idea. -- WardCunningham

Furthermore, FORTRAN does *NOT* require spaces between words:

  DOI=1
vs
  DOI=1,10

(The first is an assignment statement, the second the start of a DO loop, or what's known as a FOR-loop these days.)
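A minimal sketch of the ambiguity, assuming the old fixed-form rule that blanks are insignificant. The classify function is hypothetical and handles only the label-free form shown above, not real Fortran:

```python
import re

# After blanks are removed, a DO statement looks like DO<var>=<start>,<end>.
# Without the comma, the same characters are an assignment to a variable
# whose name happens to begin with "DO".
DO_LOOP = re.compile(r"^DO([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    """Classify a statement as 'do-loop', 'assignment' or 'unknown'."""
    squeezed = statement.replace(" ", "").upper()
    if DO_LOOP.match(squeezed):
        return "do-loop"
    if ASSIGNMENT.match(squeezed):
        return "assignment"
    return "unknown"


for text in ["DOI=1", "DOI=1,10", "DO I = 1, 10", "DO I = 1.10"]:
    print(repr(text), "->", classify(text))
```

The only thing separating the two readings is the comma, which may sit well to the right of the characters that look like a keyword. This toy version would also misread an assignment whose right-hand side contains a comma inside parentheses; a real compiler has to be more careful.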

Lisp doesn't spell more of its operators than other languages, but it uses prefix notation. So the difference between (- foo bar) and foo-bar is more marked than between foo - bar and foo-bar. Infix Dylan distinguishes between the two latter forms and requires spaces for its operators. -- Lieven

I believe an early Venus probe crashed because somebody keyed in something like:

DOI=1.10

Instead of the intended:

DOI=1,10

I also like the ability to use non-alphanumerical characters in Scheme, especially since there are some good naming conventions for using them. Especially use of !, ? and ->, as in null?, set-cdr! and integer->char. -- StephanHouben

COBOL LIVES!!!! "MOVE EMPLOYEE-PAY INTO PAY-AMOUNT OF SOME-ODD-RECORD." (COBOL requires spaces around the subtraction operator - in the few places where it allows a general algebraic expression.) -- JeffGrigg

Several years ago, I was involved in a project that had two captives along with about a half-dozen or so rent-a-bodies working on a message-based (pSOS) system. [Never mind what the product was.] There was a typical RWAR raging for a while before the group leader said, "Okay, everybody get your !@#$%! together, and I mean like right now! Come up with a standard for this stuff!" So we did. One of the conventions we settled on was the use of init cap concatenated names applied not only to variables, but to classes and methods as well. Why not? If a naming convention is good for the goose, it's good for the gander. I love to use the example of an input buffer. What do you call the durn thing? Here are some examples:

 input_buf
 input_bfr
 inp_buffer  (oh! too long!)
 inp_bfr     (ahh, much better)

You can see how ridiculous this can get. And this is just one lousy entity! We ended up deciding that names, no matter what they described, should be whole words, spelled out, concatenated, with init caps. The variable name is, "InputBuffer." It is always created the same way, spelled the same way, accessed the same way. There is no question about slubs - er, sorry, underscores - or abbreviations or any of that noise.

As far as names getting too long, what are you whining about?!? If you use an OO language like C++ the names are relative to the local entity - class, method, whatever. A name like InputBuffer should only be used in the context of something like a serial device driver or a math processing unit. In fact, there is no problem at all in reusing the same entity name several times in the same program as long as it is used in different scope. Be careful about your naming conventions, design good separation of your objects, and names never get to be too long. Eh? -- MartySchrader

CapitalAndLowerCase is also called CamelCase.

Okay, according to the definition provided on the CapitalizationRules page, the style we settled on is Upper Camel Case (a.k.a. PascalCase). This particular convention violates the Wiki convention of not allowing contiguous capital letters in an entity name. We decided to keep it because that way we would have no exceptions. Even if we had a name that contained a well-known abbreviation the abbr. would be spelled out as a substring of caps (MinorRAMTest). To repeat: this stopped confusion. The rule was the rule. -- MartySchrader

In other words, "Always, always, always use good, unabbreviated, correctly-spelled meaningful names." -- MeaningfulNames

What about the Ada style for multiple word variables, which is to use both underscores and capitals. So instead of

TheCourseOfHumanEvents

the_course_of_human_events

we have

The_Course_Of_Human_Events

I personally find this most readable and wish it had become more common. I feel like someone who just doesn't get with the program when I use it myself (except in Ada, where it is with the program, which I haven't used for many years). When in Rome... (even if the Romans are arguably quite oppressive?)

The run together names are just too hard to read. The underscore names are not as bad, but the underscore detracts too much from the horizontal flow (the lisp convention is thus probably better than either). Perhaps the capitals and underscores work together to avoid the worst of both.

What about the Possibility of MixedUsages as in TheCourse_of_HumanType_Events and TheCourse_of_ComputerType_Events being included in a broader -> TheCourse_of_Events?

I've developed a new habit while programming in StandardMetaLanguage and found it improved readability nicely: using underscores for naming the variables and CamelCase for functions. (In SML, it's nice to be able to distinguish quickly between them, since functions are often passed as parameters.) -- PatrickParker

ErlangLanguage actually enshrines this rule (well, the contrariwise rule) as SyntacticallySignificantCapitalization: words beginning with a capital are parsed as variables, those beginning with a minuscule are parsed as barewords or function names. It plays well with the pattern matching:

my_fun( [ list_tag | ListOfThings ] ) -> do_stuff(ListOfThings).

means match a list whose car is the bareword "list_tag" and do stuff with the cdr.

Can anyone confirm or deny the oft-cited claim that non-native speakers of English find underscorey_words easier to read than CamelCase/PascalCase? (See EmbeddedUnderscore for one instance of this claim.)

Not all languages have two forms (capitals/lowercase) for characters.

At any rate, it seems like a YouArentGonnaNeedIt argument. Why would we optimize our programming for those who might not know English well?

As a non-native English speaker, I can neither confirm nor deny this claim. Underscored may be easier to read, but I don't think it has anything to do with my English skills.

I agree with YAGNI on this one. Besides, if you don't know English well enough, you are not likely to become a good programmer. There are just not enough translated texts, even for the largest non-English languages. -- AndersBengtsson

This is a nice example of SurvivorshipBias. You know, people who have a lot of difficulties with English will probably not read this page to the end or even do a lot of wiki reading...

Exactly! ;)

"There are just not enough translated texts, even for the largest non-English languages." Actually, this is completely false.

The second largest software maker is the German company SAP.

You can also get all of Microsoft's latest programming, OS, certification, etc. books in German.

You can find them here: http://www.microsoft-press.de/

Here are just a few examples:

 - Microsoft Windows 7 - Die technische Referenz
http://www.microsoft-press.de/product.asp?cnt=product&id=ms-5927&lng=0&sid=80f6ea72228348ca21e120eeda7bd900

 - Microsoft Exchange Server 2010 - Ratgeber für Administratoren
http://www.microsoft-press.de/product.asp?cnt=product&id=ms-5668&lng=0&sid=80f6ea72228348ca21e120eeda7bd900

 - Handbuch der .NET 4.0-Programmierung
http://www.microsoft-press.de/product.asp?cnt=product&id=ms-5438&lng=0&sid=80f6ea72228348ca21e120eeda7bd900

 - Microsoft Training & Certification books for Windows 7/Vista/XP:
http://www.microsoft-press.de/catalog.asp?cnt=catalog&cat0=21&idx0=6&lng=0&gr=b%FCcher&cat1=22&idx1=0

 - PHP programmieren unter Windows
http://www.microsoft-press.de/product.asp?cnt=product&id=ms-5447&lng=0&sid=80f6ea72228348ca21e120eeda7bd900

AgglutinativeLanguages, like German and Swedish, combine many words into one, where English uses separate words ("pairprogrammingexpert" vs. "pair programming expert"). Does this affect the ability to read longVariableNames?

I guess it would. I'm a Norwegian and used to reading words like (translated into English) "beerbottle", "beerbottlelabel" and "the beerbottlelabelmuseum". And, English speakers trying to learn Norwegian often find this a problem. -- KurtGeorgeGjerde

An identifier should be easy to read, but it should also be easily recognizable as a unit. These goals are contradictory: the more you separate the individual words within the identifier, the easier it is to read. But at the same time, the words appear less to be a unit, potentially making it harder to see the code structure.

How much this is a problem of course depends on how much the language syntax and conventions already separate identifiers from the surrounding code. I could, for example, imagine preferring CamelCase in Java but words_with_underscores in Python. I'll have to try that out... -- FalkBruegmann

Ruby, Python and other languages often use CamelCase for class names and underscores for other names. It may be a coincidence, but I think that for class names the "unit-ness" is very important. Method and variable names have other tradeoffs.

I HaveThisPattern: when writing personal (CeeLanguage) code, I adopt the following convention: THIS_IS_A_CONSTANT, ThisIsAType, this_is_a_variable. I find that it makes it clearer what is going on when looking at code you wrote a while ago. (I lump function names in with variables on the grounds that you can take the address of both.) I've recently been learning RubyLanguage, and I guess one of the things which appeals to me is that the language uses the same convention that I'm used to... (HappyCollision)

I would call this a matter of style and forget about it. I can read either approach equally well. As long as you use a good name, I don't care how you may choose to type it in.

Just look at the top of every page and you'll see the best naming convention: PascalCase. If it works for Wiki, it works for me!

An identifier should be easy to read, but it should also be easily recognizable as a unit.

Excellent point.

If we are trying to write identifiers that are as near to English as possible, then isn't post_code nearer to "post code" than PostCode? That is, to get back from post_code to the original you just replace _ with " ", whereas getting from PostCode back to "post code" requires more steps. Thus doesn't post_code preserve the information in the string "post code" better than PostCode? -- Simon Redfern

It's postcode (one word), at least in my dialect of English.

I think this is getting quite silly. I find either method usable. When I read code, I only try to decipher the name once. From then on, I merely recognize the pattern of the letters. This is the way the brain reads; it recognizes a pattern and even with gross misspellings, one can still recognize the word. When typing in code, I will often use a mix of Camel case and underscores depending upon how much unity and separation I might desire in the name. When I go back and read the code, however, I am not sure this makes a whit of difference to understanding what I wrote. Use whatever convention you prefer, it really doesn't matter.

One very good reason to use underscores, besides readability, is that it is much easier to write a regexp to change variable names when refactoring code.

Seems like the tail wagging the dog
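For what it's worth, here is a small sketch of the regexp point, under the assumption that the goal is to rename one word inside many underscore-separated identifiers. The helper names rename_word and snake_to_camel are invented for this example:

```python
import re


def rename_word(source, old, new):
    """Rename one underscore-delimited word wherever it appears as a whole word."""
    # The word must not be glued to another letter or digit on either side;
    # underscores, spaces and punctuation all count as delimiters.
    pattern = rf"(?<![A-Za-z0-9]){re.escape(old)}(?![A-Za-z0-9])"
    return re.sub(pattern, lambda _match: new, source)


def snake_to_camel(name):
    """Convert input_buffer to inputBuffer by upper-casing the letter after each underscore."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), name)


print(rename_word("input_buf = buf_len + rebuff", "buf", "buffer"))
print(snake_to_camel("input_buffer"))
```

With run-together names the same rename needs to guess where one word ends and the next begins, which is where the regexp gets harder to write.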

Some languages have more clearly established patterns than others. In Java for example, if you don't obey the universally observed case rules it looks very silly (it is no longer Java, it is C#!)

The otherwise perfect Python language uses this_style, ThisStyle and thisstyle in its core functions, and Guido van Rossum actively encourages developers to use whichever they like in his Python Style Guide. I find that unfortunate, because it means that I cannot ever really look at a piece of code and say 'this is perfect, I do not want to change a single character', because I am always slightly bothered by the case conventions. They look wrong/right/strange/great to me depending on what other languages I have recently been exposed to (C, C++, Java, and especially other people's Python code of various case conventions). I don't actually care which it is, but I want there to be 'one best way to do it' (otherwise I'd use Perl!). I want to concentrate my thoughts on the code and the meaning, and not have to think about the case conventions.

-- Thomas Munro

Python style is in PEP 8 - <http://www.python.org/peps/pep-0008.html>. Except where there are pre-existing project styles or legacy items, casing is now pretty much mandated. [I believe the intention is to strengthen the language in PEP 8 some time - probably as part of the push to clean up the library for 2.5]

I translate between languages occasionally. I use This.Style (This.HTML.Block) or This_Style (This_HTML_Block) in the source, and produce whatever style is appropriate for the destination. This is much more difficult to do when the source is ThisStyle (ThisHTMLBlock). -- PeterLynch
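A sketch of why the separated source form is easier to translate, assuming a plain split on the separator is enough. All four function names are hypothetical, and the CamelCase splitter is only a heuristic:

```python
import re


def split_underscored(name):
    """This_HTML_Block -> ['This', 'HTML', 'Block']; the separators make this unambiguous."""
    return [word for word in name.split("_") if word]


def split_camel(name):
    """Best-effort split of ThisHTMLBlock; has to guess where a run of capitals ends."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+", name)


def to_pascal(words):
    """Join words with an initial capital on each, leaving the rest untouched."""
    return "".join(word[:1].upper() + word[1:] for word in words)


def to_snake(words):
    """Join lower-cased words with underscores."""
    return "_".join(word.lower() for word in words)


words = split_underscored("This_HTML_Block")
print(to_pascal(words), to_snake(words))
print(split_camel("ThisHTMLBlock"))
print(split_camel("HTTPURLLoader"))
```

The heuristic copes with ThisHTMLBlock, but two adjacent abbreviations, as in the made-up name HTTPURLLoader, come back as a single word, and nothing in the name itself says where to cut.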

I prefer to use a variety of names, each with semantic meaning. This allows an indicator to be embedded into variable names. The convention I use is:

  myVariableName = local variable, parameters
  MyVariableName = property (get/set accessor), method name
  _MyVariableName = the private copy of a variable for each property

So in C#, we have this:

  private int _MyVariableName;
  public int MyVariableName {
    get { return _MyVariableName; }
    set { _MyVariableName = value; }
  }
  public void MyMethod(string myParm) {}

So the initial upper-case and the underscore prefix both indicate that a variable is used in a property, and how it is used. I found this useful.

I do this:

  instance vars prefixed with "the"
  local vars camelCase
  params prefixed with "a" or "an"

  private int theVariableName;
  public int VariableName {
    get { return theVariableName; }
    set { theVariableName = value; }
  }
  public void Method(string aParm) {
    string localVar = "";
  }

And I find it much more readable. Try reading an _ out loud and see how it sounds.

[humor]The proper pronunciation is, "slub."[/humor]

Has any language allowed spaces in identifiers? AppleScript allows spaces in an application's class and property names, but not (as far as I know) in user-specified variable names. Several languages allow escaped spaces.

Apart from return (which is inconsistent with while and for anyways, as a flow control keyword with a parameter) and declarations (which are a non-context-free mess already), even C is most of the way there.

To do a snippet from CeeLanguage:

  $string copy (destination address : char*, source address : char*) : void {
    while (*destination address++ = *source address++);
  }

Or the properties above in C#:

  private $the variable name : int;
  public $variable name : int {
    get { return(the variable name); }
    set { the variable name = value; }
  }
  public $Method(a parameter: string) : void {
    $local variable : string = "";
  }

The obvious problem seems to be that it's no longer as obviously a unit, but it does mean that you can just say "type it like it wasn't code".

(For the regex guy, \w((\w|\s)*\w)? would find all the keywords/identifiers.)

-- ScottMcMurray
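The regex above can be tried out as follows. This is only a sketch in Python, run against the hypothetical spaced-identifier syntax from the snippets above; the function name spaced_names is made up:

```python
import re

# The pattern from the text: a word character, optionally followed by any mix
# of word characters and whitespace that ends in a word character.
SPACED_NAME = re.compile(r"\w((\w|\s)*\w)?")


def spaced_names(code):
    """Return every keyword or (possibly multi-word) identifier found in code."""
    # finditer is used instead of findall so that the whole match is returned
    # rather than the contents of the capture groups.
    return [match.group(0) for match in SPACED_NAME.finditer(code)]


line = "while (*destination address++ = *source address++);"
print(spaced_names(line))
# prints ['while', 'destination address', 'source address']
```

Note that the pattern cannot tell a keyword from the identifier that follows it: "return the variable name" would come back as one name, which is presumably why the property example writes return(the variable name) with parentheses.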

For those interested in the effect of the various wordstyles, the simplest test of all is ItWorks. That it is clear to all as to why is of secondary interest. If the figures I am hearing are correct, 90 percent of programming is in the form of maintenance programming, so most often you are in Rome, and will probably do as they do.

But if you want another approach in a UserStyle_ManuallyPlaced_AutomaticProcessed, I believe the following examples may be Useful if not Usable and Used:

Using_CompoundWords_SeparatedByUnderscores:

 * HumanEvents_WhileInProcess  
 * VariableNamingStyles_Underscored_or_CapitalizedAndLowerCase 
 * Example_ThisIsNotHardToRead_ContainsFewSymbols
 * WordSeparation_UsingHyphens

--DonaldNoyes

Huh???

There's one good reason for keeping punctuation out of names: punctuation is used for operators. When reading code it takes extra attention to separate the operators from non-operators in names. That's all the justification I need for using camel case for everything. -- MarcThibault

Almost everyone has an opinion on this subject. I'll join in.

CamelCase is much easier for me to type than underscore_spellings because my fingers type an error about half the time when I stretch for _. This distraction affects my bug rate in two ways. First, I may not catch a misspelling. Second, when I do catch a misspelling, I am distracted from thinking about the algorithm I'm coding.

I like underscore_spellings because they are easy to read. Therefore, I need an editor with an optional feature to convert abbreviations/misspellings to desired spellings, for example us into underscore_spelling. This feature should not show preference to any type of spelling, for example cc might convert to CamelCase. Whatever the user of the editor configures, and it would be best if the editor remembered a configuration per program as well as one per person. The per-program configuration would be nice if comments at the end of the program configured the abbreviation/misspelling changes to correct spellings.

Moreover, compilers that enforce any kind of spelling rule annoy me. The compiler should not be a spelling enforcer. Programmers, their managers, and program managers should be able to select spelling standards.

Finally, "when in Rome, do as the Romans do" is OK to a point. However, if the current spelling conventions are causing problems, it is time for ReFactoring. --EdwinEarlRoss
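The editor feature wished for above might look roughly like this. It is a sketch only: the two tables and the function name are invented, and a real editor would hook this into typing rather than run it over finished text:

```python
import re

# Hypothetical per-person table, plus a per-program table that overrides it.
USER_ABBREVIATIONS = {"us": "underscore_spelling", "cc": "CamelCase"}
PROGRAM_ABBREVIATIONS = {"cc": "camelCase"}


def expand_abbreviations(text, table):
    """Replace whole-word abbreviations with their configured spellings."""
    if not table:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], text)


# Later tables win, so the per-program spelling of "cc" replaces the personal one.
merged = {**USER_ABBREVIATIONS, **PROGRAM_ABBREVIATIONS}
print(expand_abbreviations("us = cc + focus", USER_ABBREVIATIONS))
print(expand_abbreviations("us = cc + focus", merged))
```

Nothing here prefers one spelling style over another; the tables decide, which is the point of the request.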

Or else perhaps _The_Course_of_Human_Events

There are many opinions here about which naming form is more readable. Unfortunately, there isn't any data.

Someone should put a test on the net with pages to read containing various name styles: CamelCase, with_underscores, Lisp-like, English-like, etc. One name style per URL page, basically the same text on all pages, about 250 words. It's probably best to present the pages in random order per person taking the tests, since all will contain basically the same 250 words.

There should be a few buttons to push on each page, like a combination lock, that if pushed in the correct sequence cause the reading speed for a page to be recorded in a database. Instructions on which button to push should be located in the text at different locations. For example, "Now is the time for all good men to come to the aid of their country. Now it is time to press three. The love of my life...."

A different button combination for each page that contains a different naming style. It's not much security, but a little.

Each reader pushes a button to start a timer, begins reading, ... finishes reading, and presses a button to stop the timer.

The test would collect the reading speeds and publish the data. It isn't a controlled study, but like c2.com/wiki, people will be honest, mostly.

I would do it if I could. Always worked mainframes and minis. Don't have the right skills to do this one.

Perhaps there is an easier way to collect the data, anyone else have any ideas? -- EdwinEarlRoss

PS: I made an emacs program to display three different versions of the same text, English, CamelCase, with underscores, and again English. It gives the reading time for each one. For me CamelCase is not as quick to read as English, which surprised me, and with underscore is about as quick to read as English. -- EdwinEarlRoss

This makes sense to me; spacing is significant to delineate words, and most words you encounter in reading are lowercase.

The serious psychologist's way of testing this would be to calibrate reading speed against memorability (e.g. read a list of phrases, then do a quiz on whether certain phrases appeared or not.) This is actually the kind of thing AmazonTurk would be perfect for collecting a large sample of data on.

See: MeaningfulName, CapitalizationRules

CategoryRant