Static Typing Repels Elephants

Remember the old joke about the guy selling elephant repellant?

Salesman: "Want to buy some elephant repellant?"
Customer: "But there aren't any elephants around here!"
Salesman: "See how well it works?"

Seriously, StaticTyping folks: there aren't any elephants around here. These "type errors" you're worried about just aren't a problem.

Seriously, DynamicTyping folks: go read more about types. See TypefulProgramming for example. A weekend crash-course in Haskell, Ocaml or other modern functional language wouldn't hurt either.

Those languages do a better job of keeping static typing from getting in the way. But any feature of a programming language getting in the way of the developer is a pretty significant cost. It's better to not pay that cost when type errors aren't a problem.

In some applications (and most things written by corporate IT departments do not fall into this category), a dynamic type failure (i.e. DoesNotUnderstand, bad_cast, or the like, for which the only appropriate program response is to halt) is an error just as much as a traditional "type error" (wherein the program invokes UndefinedBehavior when trying to do something inappropriate for some object), and just as unacceptable. While a graceful type exception is preferable in any case to UndefinedBehavior - it is useful in some cases to have a stronger guarantee than that provided by UnitTests that "type errors" do not occur at all. To do so requires StaticTyping of some sort (as well as programmer discipline to not circumvent the type system using casts and other dirty tricks).

In other words, some countries do have elephants, and they need to be repelled.

Can the author please provide an example of code written using TestDrivenDevelopment that would let typing errors through? I have a mild doubt that this situation exists, however I'm honestly curious as to what people have in mind here. [ed. I meant to say that I doubt this situation does exist]

While gross type errors like trying to pass in a string where a number is expected are likely to be caught by all but the most lackadaisical suites of UnitTests (excluding, of course, polymorphic functions that can handle both a number and a string), I can envision quite a few scenarios that may not be caught by UnitTests:

Of course, the operative word is may. You can always add explicit UnitTests (or assertions/contracts/whatever) to ensure that the arguments to your function satisfy all the invariants that you expect. As the number of tests approaches infinity, the probability of a type error being discovered at runtime approaches zero.

Of course, a static type declaration (or TypeInference in a language which supports it) gives you all of this in a rather compact form.

With regards to the above comment about "getting in the programmer's way"; I must point out that similar arguments have been used to justify all sorts of things nowadays ConsideredHarmful. C hackers have long bleated about strongly typed languages "getting in the programmer's way"; back when I was in school the BondageAndDisciplineLanguage that everyone despised (as it was taught at the university) was PascalLanguage. Sometimes, getting in the programmer's way is a good thing. I'm not equating DynamicTyping with, say, GotoStatement?s or PointerArithmetic; just pointing out that this isn't a very good argument.

-- ScottJohnson

Would it be fair to say that these examples only apply to external APIs? i.e., can you see any reason why dynamic typing would be harmful/suboptimal in a self-contained business system, written using TestDrivenDevelopment?

Again, depends on your requirements for robustness. Many systems can tolerate the occasional DoesNotUnderstand without killing anyone/costing lots of money. At any rate, a key part of UnitTests are testing the APIs of subsystems--so the above applies to "closed" universes as well as open ones. I hope you're not only running tests at the perimeter of your application (black-box system testing); much is discovered by inspecting the parts. (I also hope your testing methodology employs assertions/contracts as well as unit tests...it can be argued that typechecking, or its analog in a dynamic typed program, is a better thing to implement in an assertion included as part of the module, rather than as a unit test separate from it. The purpose of unit testing, after all, is to show that a module behaves correctly when given correct inputs; typechecking is an example of rejecting invalid input. Unit tests can be written, of course, to show that "good" behavior (DoesNotUnderstand, or some fallback behavior) and not bad behavior (divergence, UndefinedBehavior) occurs when invalid input is provided--however, I think that this is better expressed in a module precondition rather than an external test case. -- ScottJohnson

Why would I only be testing at the perimeter if I'm using TestDrivenDevelopment? Production code literally does not exist before a test demands that it does. With this in mind it's hard for me to imagine an example, in a closed system, of any possible dynamic typing problems.

Resolved: you should save yourself the cost of typing and maintaining all the extra stuff you have to deal with in a statically typed language and just use a dynamic language, if you're doing TestDrivenDevelopment and building a closed system.

Regardless...at some point your system will be turned over to the users, and they may find ways to surprise you. No suite of unit tests will ever consider all possible uses of the program, after all--a user may (ab)use a program in infinitely many ways (for most programs of any magnitude); and you'll only have a finite number of UnitTests in your suite.

At any rate - this is a tradeoff to consider. Use of dynamic languages is certainly appropriate in many circumstances. Likewise, for static ones. And for many of us, static typing solves real problems that are far more relevant than mysterious disappearing elephants.

Resolved: HorsesForCourses -- ScottJohnson

Hmmm, ok, well normally my users are interacting through some sort of user interface, and not calling methods directly and calling their managers if they get type Object back instead of expected type String. Guess it depends on what sorts of users you have.


Seriously, DynamicTyping folks: go read more about types. See TypefulProgramming for example. A weekend crash-course in Haskell, Ocaml or other modern functional language wouldn't hurt either.

Hey, take a break from your reading and go program in a dynamically typed language. We've been programming with dynamic typing for decades without type errors being a problem. What is a book about types going to prove? That we've just been lucky so far? Don't get me wrong, I'm all about reading and I think any programmer benefits from reading about any unfamiliar language, especially one as well thought-out as Haskell or O'Caml. But it's insulting to tell successful programmers that they're doing it wrong because your type theory says that what they're doing doesn't work.

How about you try programming in a serious typeful language? Try Haskell, Clean or ML; do not try C, C++, C# or Java. This would put some sense into the debate, which is mainly raging because of false generalizations (Smalltalk is better than C ==> dynamic typing is better than static typing). It works the other way, too (Haskell is better than Awk ==> static typing is better than dynamic typing). Broaden you horizon, understand the other side better, then reflect on which arguments are insulting to whom. For the record: after programming in Perl and in Haskell, I find static typing useful. YMMV.


It is sometimes thought that less unit testing is required if you use a statically typed language. I looked at a project written in Python to see which test cases would become redundant if I ported it to C++. I didn't find such a case. Does this mean that I don't do enough unit testing, or does it mean that unit tests also get rid of elephants?

Probably not enough unit tests. Why not post the code and the tests and let's see?

I find that the types of things you "test" for in static-typing are implicitly included in your UnitTests: You may not be explicitly testing method interfaces, but if you pass in the wrong arguments then something else in the test will probably break. -- francis

Probably isn't good enough. -- AnonymousDonor

In my experience, I've never had a bug slip into production code because of an incorrectly type method parameter. Other bugs, yes, but not because of typing issues. You pay a high cost for that static-typing: I've been able to discover solutions in dynamically typed languages that were impossible to discover in statically typed languages because the refactor cost was so much higher. In practice I've found the cost isn't worth it.

As regards code coverage, I think I agree with what WayneConrad said over on TestEverythingThatCouldPossiblyBreak:

If 80% coverage (to pick a number out of the air) is pretty easy to do and has clear benefits, then that's the coverage I get. If 100% coverage is harder than I perceive the benefit to be, then I don't do it. In my mind, there's a curve of cost vs benefit, and that curve has a sharp knee. I'm looking for that knee when I test.

Testing is pragmatic. -- francis

I'll remember that next time my car brakes fail, I get a lethal dose of radiation, or my core switch goes down for 2 hours.

What does that mean? Car brakes failing clearly falls in the Value(benefit) > Value(cost) realm for almost any realistic program testing cost. This true for X-rays, and likely also true for core switches. If no rational defense of the above comments is posted, please delete this and the comments.

I guess you have a different definition of rational. You start by saying type errors aren't a problem and then invoke the 80/20 rule. They are very different propositions. I would suggest you drop unit tests in favor of system tests because you'll get the same 80% or better that you seek and you will save a lot of time. -- AnonymousDonor

I'm pretty sure the Therac-25 had been tested a lot. Now tell the patients who suffered radiation burns about some knee in some curve. Good luck.

Mr. Anonymous: You're talking to more than one person, just so you know. -- francis


Good engineering requires more than invoking the 80/20 rule. An implicit and dangerous assumption in the 80/20 rule is that all failures have about the same cost. When WayneConrad looks at the cost vs benefit curve, he needs to look at something like the expected cost of the failure (the cost of the failure multiplied by its expected likelihood) when he's looking for knees. Car brake failures and lethal radiation dosages are cases where the very high cost offsets the perceived low likelihood. I don't know what a "core switch" is.

Meanwhile, so long as the "static typing" we're discussing occurs in a language that supports a cast operator, unit testing is surely needed. Most of us who have coded in both C/C++/Java and Smalltalk know that "Segmentation Fault" is the static type system's way of spelling "DoesNotUnderstand".

Not quite. A compiler error is the static type system's way of spelling "DoesNotUnderstand". A segmentation fault is one way that a type-unsafe program spells "DoesNotUnderstand"; unfortunately there are many other ways that it is spelled in a type-unsafe program--again, see UndefinedBehavior.

In CeeLanguage, use of unsafe casts (the language does little to distinguish between unsafe casts and safe ones) is almost necessary to write any non-trivial OO program - it is the only way to implement subsumption (other than ignoring compiler warnings) in C. CeePlusPlus is somewhat better, in that StaticCast and DynamicCast are not type-unsafe. There are still lots of other ways of breaking the type system in C++, though... PointerArithmetic, ObjectSlicing, HeapCorruption?, DereferencingNull? all can do nasty things.

JavaLanguage is TypeSafe, as far as I know - any typing error will either be caught by the compiler, or result in an exception being thrown. Actually Java has a hole in its static type system because JavaArraysBreakTypeSafety. This is a failure of the static type system; however it is caught by the dynamic type system. Mistreat an array in the manner described in JavaArraysBreakTypeSafety, and you get an exception thrown - yet another different way of spelling DoesNotUnderstand. Array abuse in Java does not result in death, destruction, or UndefinedBehavior.

At any rate - in a typesafe, statically typed language, the incidence of DoesNotUnderstand is much lower than in a fully dynamically-typed language. Whether or not this benefit is worth the cost of having to manually add type annotations everywhere (as in C/C++ and Java) or use a language with TypeInference (not found in any of the major production languages, nor in the languages with a large set of SmugWeenies?) as opposed to the greater flexibility of coding in a dynamic language like LispLanguage, SmalltalkLanguage, or PythonLanguage, is an engineering tradeoff that needs to be considered.

Anyone who tells you that you needn't ever bother with either statically or dynamically typed languages (or that one or the other is only appropriate/necessary for oddball/fringe applications) is doing you a disservice. -- ScottJohnson

In the shops where I've worked, a significant portion of the runtime segmentation faults result from errant casts. The role of the static type system has been to hide the developer's error inside a cast. Unit tests are the most effective way I know to discover and untangle such bugs.


The argument that you need more unit tests in a dynamically typed language is just silly. I'd wager that the people who are arguing that point don't have significant experience doing TestDrivenDevelopment. I say this as a developer who works primarily in C# and Java. I'm not a dynamic-typing advocate: I have no intention of learning Smalltalk, for example, unless someone pays me to.

It's like this: If you're testing a method, you're also testing its signature. It's that simple. In most cases, asking for more is academic perfectionism. The remaining cases are too few, and too easy to resolve, to increase costs significantly.

I'll repeat my challenge from UnitTestsRequirePerfectDevelopers: look at the code in CommentingChallengeResponse. Pretend the language is dynamically typed. Describe how you could change a type without a test failing.

-- JimLittle

I'll repeat as well. If you have perfect unit tests then everything is perfect. How about an environment where people are far from perfect or the cost of being wrong is dramatic? Personally I have lots of experience with dynamic languages. I also have lots of large project experience. You can't just take lessons from one place. -- AnonymousDonor


Anyone remember the lost MarsOrbiter? It crashed because some programmers confused centimeters and inches. These things can easily be prevented by static typechecking, you just need to encode the units in the types. But testing such a thing seems cumbersome in comparison. How can you reliably write a test that catches values that are off by a factor of about two? Leveraging the type system for such a check creates even more related opportunities: no chance to accidentally add a length and volume, you can even keep height and length apart, making it impossible to confuse the order of parameters, and much much more.

Was the lost Mars probe written in a dynamically-typed language?

The ArianeFive rocket was also lost due to a type conversion error. A critical number was assigned to a variable with less precision. The compiler warned of the problems, but the warning was overridden with an explicit cast.

I don't know know what language the Mars probe was written in, but whatever type system had been available, it obviously wasn't used. Note: I don't blame dynamic typing for the loss of the probe, it's just that static typing provides a tool to prevent such a thing. The tool still has to be used. ArianeFive is a different thing. The warning was overridden because a programmer proved that no overflow was possible for ArianeFour?. This proof needed to be repeated for ArianeFive, but this was never done. The compiler still warned that there was a problem, and reproducing this warning in a weakly typed environment is hard or impossible.


I'm a little confused about the relationship of static typing, unit tests, and production (run-time) bugs.

  1. I thought the reason of static typing was to be able to have non-special computer languages. Unless you have an "everything is an object/list/string" language, don't you need static typing so your variables work? Without static typing, how does a variable know what goes in itself? In other words, if you don't have an all-whatever language, don't you get static typing whether you want it or not? I think you're pretty much correct. Static vs dynamic typing arguments usually refer to languages where every variable is a reference to an object or a function.

  2. I don't know the reason for unit tests, but I know their benefits to me and why I use them. My unit tests don't guarantee working code. Instead, they guarantee I won't make the code worse then it currently is when I make a change six months from now, after I have forgotten all the details of my sloppy coding practices and weird design decisions. They lock me in to the current condition of the code, so I won't slip back. Those are called regression tests.

In order to make sure the run time code always worked, I would have to have a special requirement from the customer as to how much he/she wanted the run time code not to break. Even then, I don't know what I would do. Since I don't usually write code that has special requirements to be bug free, I don't know the special tricks that you have to do to achieve it. I would imagine there are many "low-production-bug code" or "no-production-bug code" best-practices/rules-of-thumb/patterns. -- StanSilver


Sure, if your tests provide 100% code coverage, static typing is redundant. However, I've never had the time or discipline to achieve 100% coverage (which may be because static typing slows me down).

Another factor affecting programming efficiency is how quickly you detect problems. EclipseIde displays many Java programming errors in real time, letting me fix them without interrupting my flow or returning later on after testing. Few problems remain when I start testing. Rapid problem detection may outweigh the extra keystrokes that static typing requires.

I'm a very productive Java coder, and I'm not sure whether I'd be even more productive in Python with TestDrivenDevelopment. The latter approach would lead to higher code quality, though I rarely find bugs in my production code.


In attempt to add some clarity to this discussion, let me define two types of static typing, language defined primitive data types and programmer defined classes. [Note: I am speaking from purely a C/C++ perspective, feel free to adapt this to other languages.]

In practice, I have never seen any difficulties due to static typing of classes within an executable program. When multiple executables share class definitions, then static typing does present a configuration problem. In this case, one is usually restricted to add methods to a class only, one may not delete methods or change the calling signature of methods.

Multiple executables isn't usually the problem; it's executables which used DynamicLinkedLibraries? that are the problem. Very often in CeePlusPlus, changing the definition of a class in a seemingly trivial and harmless way (such as adding a debug function) where the class is defined in a DLL (or .so file for Unix users), will often require that all programs using the library - in particular, those that derive from one or more base classes defined in the library - be recompiled. Otherwise, the program will fail, often in StrangeAndMysteriousWays?. This is known as the FragileBinaryInterfaceProblem; and it's a particularly nasty problem with CeePlusPlus. It's also one (of many) reasons that lots of folks feel C++ is unfit for the enterprise; as enterprise software is often deployed and updated in exactly such a piecemeal fashion

Dynamic typing is one solution to the FragileBinaryInterfaceProblem - all member lookups happen at runtime, rather than using the optimization employed by C++ of static offsets into an object's memory space or VeeTable. However, Dynamic typing is not necessary; Java is an example of a statically-typed language which doesn't suffer (much) from the FragileBinaryInterfaceProblem. If you look at JavaBytecode? (or a textual representation of bytecode), you'll see why - all members of an object in Java are looked up by (mangled) name at runtime. A Java object is kinda like a map (AssociativeArray) of name/object pairs; moving things around in the array doesn't break it. C++, on the other hand, with its fixed offsets determined by the compiler and hardcoded into the binary (at least on most C++ implementations), is very sensitive to the exact ordering of its members.

In the arena I play in (embedded systems); this is much less of a problem. Our product has a single program inside, which is delivered as an atomic thing to the customer. For this reason, CeePlusPlus is an acceptable solution for us. However, it has lots of problems in the enterprise; enterprise deployments of CeePlusPlus usually use some distributed object technology (Corba, COM/DCOM/ActiveX, etc.) to get around the limitations of the language.

[The above is a good clarification. C++ does not define communications between executables; rather this is provided through added on libraries or features. Within a single compiled executable static typing does not pose a problem, unless one considers extended compile times a problem. Microsoft problems with both pre-COM DLLs and COM DLLs are well known. Pre-COM DLLs did not have classes per se, but the API set is probably close enough for discussion purposes. Methods could be added, but the previous method entry codes needed to be preserved. COM went even further in this regard providing written guidelines to ensure both forward and backward compatibility, but Microsoft did not follow its own guidelines. The original point I was trying to make was that static typing of programmer defined types has not caused me any problems and provide some usefulness within a single program. I went a little bit over board when talking about cross-program communications.]

Primitive data types present a different issue. In C/C++, the primitive data types do not represent logical constructs, rather they are compiler directives concerning the storage size and arithmetic operations to be performed on the data elements. For example, use of "int" is wide spread and provides almost no value in detecting calling errors in methods. If a method takes a set of 5 "int" types as its calling parameters, the compiler cannot detect any misorderings. In C++ and especially C, "int" is really a typeless type; the caller and callee must both interpret an "int" as being the same logical (programmer defined) type.

In general, I have found that class types add clarity to variable declarations and allow the compiler to detect things such as misordering of parameters. The use of the primitive types does not provide this value, and the programmer needs to rely on naming conventions and ProgrammerTests to identify and verify parameter correctness.

WayneMack


There is a repeated argument above that even if the number of errors detected by static typing is small, it's still worth having because some of those errors may be important. What about the other edge of this blade, though? If code must often be made longer and more complicated to be compatible with static typing, and then the amount of code with potential errors in it goes up.

If there's a way to have our cake and eat it too, that's great (perhaps O'Caml is just the ticket), but outside of that realm, there is simply a trade-off between static and dynamically typed languages. To me, one of the few pretty neat things about VbClassic is the ability to pragmatically mix statically/dynamically typed code. Use static the 80% of the time when it doesn't get in the way, and use dynamic where it saves a lot of hassle.

StrawMan. Haskell code is remarkably dense and certainly not complicated. O'Caml is rumored to be quite similar, though I can't speak from experience.


You cannot prove everything, so it is better to be prepared to see what cannot be proved. Strong testing > Strong typing. The halting problem is one example.


An example of what static typing can help catch? XSS attacks, for one.

Take Joel's article Making Wrong Code Look Wrong: http://www.joelonsoftware.com/articles/Wrong.html

He goes through all this effort to develop a naming convention to help make it clearer when he's using a "safe" or "unsafe" strings. This relies on every developer following it perfectly every time.

All this is only needed because he's in, apparantly, a language without a useful static type system. If you make safe_string and unsafe_string different classes, then all you need to audit are the easily-found locations where the protection is explicitly bypassed (with some function of long and ugly name), rather than every single use of string.

I'll assert that this would give me far greater confidance just than a unit test can, because I know the compiler has proven it regardless of input.

-- ScottMcMurray


see BugFreeSoftware


Category HolyWar


EditText of this page (last edited April 13, 2009) or FindPage with title or text search