Misunderstanding Hungarian Notation

Hungarian (as CharlesSimonyi has complained) is more than just type checking - it helps you distinguish the intent of a variable. For example, if a "pe" is a picture element, and you have an array of them, then "npe" is the number of picture elements in the array, and "ipe" is your current index into the array. These two variables are both ints (and they really have to be the same type, since they have to be compared), yet the Hungarian lets the programmer do "semantics checks" beyond mere type checks.

Then again, there was a function that took a pointer to an index of a picture element, so it was called "pipe", and you start to gag on the whole notation (I'm not making this up).

-- BillTrost

Types or intents - it still doesn't work. npe may be Number of Picture Elements but how do you know it's not Next Pointer to Edge for instance? Why not use numPicElems?

-- JonHanson?

Or for that matter numberOfPictureElements? Or even, in Algol68 (AlgolLanguage), number of picture elements?! JasonGrossman

Which, using the InlineTemp? refactoring, might become pictureElements.size()!

HungarianNotation is easy to misunderstand. For example, some folks have the charity to grant it the ability to express "intent". Raw HN was strictly type-based because the old online help systems could actually look up the "official" HN prefix for a variable. This stood in place of a real source browser database.

Type is orthogonal to intent. Don't try to mix them. Type is a vehicle for expressing intent. If you find that your types are orthogonal to your intent, then your type taxonomy is broken. -- AndersMunch

That misses his point. The point is that there are many ways of using types in order to express your intention, and any single use of types might suit multiple intentions, so I disagree that this flexibility means anything is broken. They're just different issues. The point here is that names should reflect intent; having them reflect type, instead, is too indirect. It mixes up types and intent.

''' This "missunderstanding" is somewhat very similar to story by KirkBailey in ZenConcepts.

Use warts on your variables as abbreviations for the more common intents in programming. Use 'str' for string, 'p' for pointer, 'm_' for member. These add a lot of meaning & self-documentation. Use 'str' to declare you intend the variable to hold text, whether the raw type of that variable is 'LPCTSTR', '::std::string', 'char const []', or whatever.

-- PCP

Why use "str" when you can use "string" or something more precise like "name"? Why use "num" when you can use "height"?

Hungarian Notation is rad kewl, d00d. I like the secret DecoderRing aspect of it all! It turns programming into a game! Solve the puzzle of the types, and win a prize! I kid you not, I once came across the following variable name:

 prgrgpszDictionaryBase

What the hell is that? Why it's a pointer to a two dimensional array of pointers to null-terminated strings, or at least it was in the dialect of HungarianNotation that one company I worked for used. See how your understanding of the type of this variable has been enhanced?

I have these cool keys on my keyboard that scroll the page up and down. If I come to a variable of some unknown type, I press these magical keys, and there I see the type declaration. I don't need to memorize both C's syntax for type declaration and another notation. One notation works for me with the awesome power of page up and page down.

If you write smaller functions you don't need to page up at all. Perhaps HungarianNotation dates from the days that people wrote 1000 line functions so you couldn't remember where you left the variable declaration.

I don't know if the official CharlesSimonyi flavor of Hungarian Notation would ever construct something so painful and stupid as my example. But there are plenty of people who promote their own private mutations of HungarianNotation.

: It does, it does! Ten years after the fact, and I can still read that variable name! Isn't it amazing the garbage the human mind can retain? -- BillTrost

Needless to say if you are working in a typeless language (or a language where types are more fluid), HungarianNotation makes absolutely no sense.

-- JohnPassaniti

Actually, I would argue the opposite. HungarianNotation is not needed in typed languages; the compiler does much of the checking for you. Where it is helpful is where the compiler or interpretter does behind the scenes type conversion for you. I mention this due to recent Y2K scars from VB and Oracle date to string and string to date conversions. However, in most cases, a good variable name will implicitly identify the data type. There is no real advantage to additional decoration which detract from the readability of the name. -- WayneMack

Hmm. Seconded. I think hungarian notation is loathsome, but you need something in JavaScript to help distingush between objects that are meant to be class template and ones meant to be instances, or objjects owned by this window and objects borrowed from elsewhere. -- PaulMurray

I think that HungarianNotation was probably a pretty rational reaction to C. The unfortunate thing about C is that it is very easy to develop a program where you alias representation in a bad way.. a program which is silently wrong. You just have some data which has been corrupted.. lying in wait like a timebomb. Statically typed languages get you past that hurdle. Interestingly, dynamically typed OO languages do too. If you type mismatch in a dynamic language, a message send will fail. That could be bad at run-time, but at least it is obvious. It isn't as nasty a demon. -- MichaelFeathers

If I'm dealing with a "Height" then is it a double, an integer, a structure holding a number of feet and a number of inches or a class that provides a whole host of Height operations? iHeightOfJohnInInches, dHeightOfJohnInMeters etc. go some way to solving that. Furthermore I'd rather choose selectively from the rules presented rather than slavishly following any rule that forces me to use a name like "prgrgpszDictionaryBase". My choice is to use a "slimmed down" version of HN to prefix variable names with information that makes sense to me and the readers of my code. Why come up with all the possible bad results from following rules slavishly - is that your idea of something that's "rad kewl, d00d"? -- JohnHarding

Why not just look at the declaration of the variable? Isn't that where you got the information when creating your personal decoration for the variable name? Why retype it?

If it's a Height, it ought to behave like one. Why does it matter if it's an int, a double, or a class? It's a Height - you said so yourself. What am I missing?

It matters because many languages distinguish between these types, either syntactically or semantically or both. For example, what is "Height / 7"? If Height is a double, you'll get exactly one-seventh of Height (ok, almost exactly). If Height is an int, you'll get approximately one-seventh of Height. If Height is a string ("5 feet 6 inches"), you'll probably get a syntax error. If Height is an object, then what you get depends on whether and how the class supports the division operator.

Sure, you can always look up the declaration of Height. Set a bookmark, page up to the top of the file where the class members are - dang, whoever wrote this didn't follow CodingConventions, it's somewhere in the middle of the file - find the declaration, return to your bookmark. No, not the bookmark you set when you were looking up "Width". Now suppose it's declared in a different class. What was that filename again? What directory was it in? Bring up the file browser, find the file, search for "Height". Let's see: WindowHeight... HeightPixels?... HeightWeightTable?... HeightenContrast?... it must be here somewhere.

When you're working with hundreds of variable names, this very quickly gets tiresome, especially if it's someone else's code, or even your own from months or years ago. And even if it only takes you two seconds, that's still about 1.99 seconds more than it would take to see the "i" in "iHeight" and remember that it's an int.

Except that, unfortunately, some other maintainer came along and needed to change it to floating point, but couldn't be bothered doing a global search and replace on the name, so it's still iHeight, with of course no complaint from the compiler, and now you are mislead into believing it's an int, when it's not.

It's an AntiPattern to do language extensions that are not enforced by software; if names are to reflect type (assuming your argument, for the moment, that this would be a good thing), then you need compiler errors to flag this kind of botch.

Also, if you're using any half-decent editor (such as Vim ;), you can run ctags on your source, and then jumping to a class declaration becomes a single key-press operation. Use a tool appropriate to the job.

I agree that Hungarian can help with those issues, but there are other tacks you can take which give you other benefits as well. For instance, in a well-structured system there are only two places you need to look to find the type of an object: one to seven lines above where you are (for method parameters and local variables) and in the class declaration. On the other issue: encoding units in variable names can make things clear, but the QuantityPattern is often a safer choice. It is a shame that we lost the MarsClimateOrbiter? over units mismatch. -- MichaelFeathers

Hungarian notation is for lazy programmers, because it allows them to not to give meaningful names to variables. Examples are b for a flag, sz for string, p for pointer etc. If you prohibit its usage, programmers will begin to think more about variable names. -- SavasAlparslan

Yes it gives the illusion that thought has gone into the name, also it makes a meaningful but wart free name look too simple -- AndyMorris

Is the problem with the intention of Hungarian Notation, or with the syntax? ColorForth type of editing could be used to decorate names with CSS attributes at display time in order to illustrate the properties of what the names represent. For example, when the mouse hovers over DictionaryBase?, the mouse cursor will be swapped for an icon of an arrow pointing at a 2x2 grid. In the top left cell is an icon of an arrow to a series of elipses with a trailing slash, followed by (0..3, 0..5). This shows that DictionaryBase? is a pointer to a two-dimensional array of pointers to null terminated strings. Of course the editor would be configurable (by swapping in a new XSL stylesheet) to have a variety of display possibilities for this type of display, including prepending Hungarian abbreviations to the variable name. When content and presentation are properly separated, no developer need ever see content in a presentation form they do not like.

I think the problem is that HungarianNotation conflates the variable name with its properties. Your idea would resolve this problem, since the name would be unaffected by the type/size or the variable. -- PeteHardie

I just bought a new carFord. My strFirstname is Dave. Hungarian notation in many cases is repetitive and redundant.

or the object-oriented version:

 I.history.recent.buy(new carFord)
 I.name.first = Dave
 notation.Hungarian.properties.add(repetitive)
 notation.Hungarian.properties.add(redundant)

The counterexample, of course, would be that I just bought a new Yamaha. What do you mean, how fast is it? Argh... no, not a motorcycleYamaha, a keyboardYamaha.

The countercounterexample is that you don't name variables "Yamaha" or "Ford". You name them "motorcycle", "keyboard" or "car". In HN they would become "strMotorcycleManufacturer", "strKeyboardModel", etc. So you are both wrong and do not give any new insight to this discussion (also "strFirstname" should have been "strDave" in order for you to be consistent). And "strMotorcycleManufacturer" is also a bad variable name. The correct way is to have a "strName" member variable in a Manufacturer object in which case the str prefix becomes redundant (the obvious type for a name is string). So if the type of a variable is not obvious then the context of the variable needs refactoring. This leads me to the conclusion that HN is a MedicineWorseThanSickness?.

I agree that the biggest problem with Hungarian Notation is the issue of burying representation and meaning in the name of a variable. Variable names should not have to change when their type changes. One of the funniest arguments against Hungarian notation is in the DelphiLanguage style guide which states:

Delphi is created in California, so we discourage the use of notation, except where required in header translations:

-- TobyFarley

I think that misundestanding Hungarian Notation is a symptom of sanity. Why did such a fool thing becomes so popular?

When MicroSoft scattered it throughout the MicrosoftWindowsApi API?

In his article Making Wrong Code Look Wrong, http://www.joelonsoftware.com/articles/Wrong.html, JoelSpolsky (a former Excel developer) differentiates between Apps Hungarian (the way it was meant by CharlesSimonyi and was used in the application development group at Microsoft) and Systems Hungarian (the way it was propagated by the Windows documentation writers, as propagated in the books by CharlesPetzold?). According to Spolsky Apps Hungarian was hinting at a higher semantic level, to indicate e.g. relative coordinates instead of absolute coordinates by the use of proper prefixes, while Systems Hungarian degenerated to plain type prefixing.

-- MarcVanWoerkom?

Yes! This is very critical. The current state of the c2 Wiki is in confusion about HungarianNotation! JoelSpolsky did a great job of clarifying the issue. The thing that HungarianNotation became is what is commonly hated. This is because it is (was) a buzzword propagated into complex schemes with some hope of raising productivity or reducing errors. Putting the type of variable in the variable name is idiotic, at best. Noting that a certain variable is used for something in particular by Apps (true) HungarianNotation is exactly the same as using a desriptive variable name! winMaxRows, Max_Win_Rows or something similar == rx_Max. It's a little more compact, and I prefer whole words in my variable names, but it was a DamnGoodIdea? noetheless!

-- PhilipMaynard?

I'll grant that if JoelSpolsky is correct, HungarianNotation had a valid place in the useful toolset. But now, with compilers and interpreters that can hande identifiers longer than 8 characters, why use HN when you can just name the variable clearly without the abbreviations?

I really dislike Joel's solution in that article. The proper solution is to have strings maintain their own taint, and act accordingly. You can even make the whole process silent, where tainted strings are automatically untained before displayed or concatenated. Hungarian notation is like a trap laid down in your code, making future refactorings unbeliably difficult to perform (in the worst case). If you change a variable from one type to a liskov-substituitable type, you may potentially have to change not just ever declaration, but every use of that variable in your entire system. Joel loves to solve technical problems with social solutions, despite the obvious risks. I am not sure why his opinions are so popular. -- DaveFayram

As I read Joel's proposed use of HungarianNotation, he convinced me that what he really wanted was a WholeValue representing TaintedString?. First use case based on the description of the article, it would have methods for returning an untainted version of itself, a version that is both untainted and escaped where necessary, and of course the original exact string. In his example code, TaintedString?.untaint() would replace Write utString. In a language with manifest typing, then create an UntaintedString? and code the UI output to require that type. -- StevenNewton

That was my thought, too. Joel really wants a more featureful String class

That works great, so long as you control every API into which you need to pass your more featureful String class and every API that might return one. The RealWorld is a messier place, full of some_legacy_function(str.c_str());.

He'd probably use one, but since it's C code you don't get classes (nor a C class) in the first place. Ruby does provide tainting/untainting mechanisms on it's string class, but you just can't use that in C, and can't test a simple array of chars for taintness.

Joel does that all the time. He's totally convinced of his own amazing ability as a software engineer, so if he can't find an immediate technical solution, he figures it must be impossible (See his thoughts on interviews and pointers, it's impossible to do doubly-indirect points in your head? Maybe for Joel). This Hungarian notation mention here is the perfect example. A richer string class, or even just two methods (myStr.safe() and myStr.raw()) would have sufficed to make it perfectly safe, just as obvious to a human eye, and far more error resistant. -- DaveFayram

Or even a single string type with an added phantom type parameter. So you could have (String Safe) and (String Tainted), both would be isomorphic and still different. At runtime they'd be the same again. The same could be done for the classic example, numbers and indices. Just declare a Number and an Index type, make them isomorphic to int, but still distinct. Tack on phantom parameters to make Numbers of different things different, too. No warts on names needed, the compiler checks if things add up. Amazing things can be done if you don't restrict yourself to the dated type system of C, but despite his occasional dropping of the words Haskell or Lisp, Joel really only knows C and VB. How sad, he's a talented writer and could write some really profound stuff if he'd only look beyond C.

In defense of Joel: It would be wonderful if you were a programmer who wasn't restricted to something like the dated type system of C (or a braindead environment like PHP). Some of us are stuck using languages like this--and I have come to realize that using some sort of naming convention, like cleanString or taintedString, is very useful. Heck, I do a lot of programming in Python, and I find it convenient when, say, I loop over an array, to give the index a name that describes both the thing being looped over, and the condition it's in, before I transform it into a condition that I want it to become!

And I second the notion that you don't always get to create a string class: you sometimes have to work with the datatypes that a given library will accept--which can be a pain sometimes, you know? --Alpheus