Null Is Benign

Why does NullPointerException exist?


Note: the following discussion is meant for runtime-enabled languages like JavaLanguage and CeeSharp: languages that have a concept of object references that is distinct from the notion of a memory address. See also NullObject, BottomPropagation, ExceptionalValue, and NotaNumber.


The basis of this discussion is the observation that (a) NullPointerException (or NullReferenceException) is quite common; and (b) most of the time it is "fixed" by writing code to check that offendingObject != null before whatever triggered the NPE.

This common "fix" basically says: in the case of null, do nothing. In other words, automagically implement the NullObject pattern. Calling a method on a null reference would either do nothing or return null, false, 0 or 0.0, according to the return type of the method, in order to ensure that NullIsBenign. Instead of firing NPEs, dereferencing a null reference could just as easily do nothing.

Claim: doing so results in the following:

It might seem radical, but a simple solution to NPE is to replace NPE with a default "behavior" for all/any methods that are invoked on null object references: do nothing.

There are 2 conditions under which one encounters null in code. The first one, as mentioned above, is when one checks for null and then skips performing some block of code - this is done for one reason only: to avoid NullPointerException, e.g.,

Object foo = someList.get(0);
if (foo != null) { // to avoid NullPointerException
foo.doSomething();
}
The second one is generally there to take specific action precisely because a value is null - for example to complain (to the user perhaps) that this value isn't set and it needs to be set (e.g., a required field on a form). There are 2 ways to code for this second case: with conditionals and by catching NullPointerExceptions, with the former being generally (but not always) preferred.

Less brittle code

It's all too common to deploy code and have it tested/stressed in some new way that results in an NPE. And, in my experience, the "fix" is generally to bracket the offending code to have it not execute when/if the offending object ref is null. Sometimes, it means implementing a default value in such cases. Conceptually in those cases nothing was wrong with the code; rather there was an implicit assumption: only do this (only dereference) if this instance isn't null. If it is null: do nothing.

The proposal is to add a compiler command line switch to change default behavior. In the same way that CeeSharp has its /checked+ and /checked- switches for changing the way overflow conditions are handled there could be /relax or a -dontPanic switch to indicate: use default null behavior rather than throwing NPEs. And, in a similar way to one of the uses of /checked+ and /checked- one could not use this switch when developing and then use it for deployment.

The current approach causes an "explosion" of the code. The new approach would be a more relaxed one: worst case: nothing happens. The analogy that comes to mind is having your car not start vs. having your car explode. One might not be too useful, but at least it's not destructive. The latter approach gives you no choice: guaranteed spectacular failure.

This seems to me to be advocacy of DefensiveProgramming vs OffensiveProgramming, but sometimes Offensive is better. If the failure is as early and obvious as possible, it takes less time to track down the problem. If the code does nothing, a symptom may be found at a later point in the code execution such that the original cause is more distant and difficult to locate.

[I encountered an NPE last week trying to re-use a component from a web page that most often exists inside of another, parent component. In this case I was using the component outside of the parent and so the parent was null. The component has default state and checks its parent for overrides. If there's no parent there's nothing to do. The code had only been tested with a parent set, so using it in a new scenario caused the usual: NPE. If, instead, there were non-ops and default values instead of NPE this (and lots of other folks' code) would be more robust (less brittle) and would more gracefully adjust to new uses by not requiring the overly common checks for null values.]

The NPE here is the programmer's fault, not the language's. This particular NPE reveals that the component wasn't properly tested for the scenario you're using it in: never a good omen. It's certainly not a component I'd want to use in a production environment. Take it as a warning that the code may have other creeping problems as well--a code smell, if you will. NPEs are not always evil. It's better that the program send up a red flag if the runtime sees a possible problem than silently fail and possibly cause more errors down the road. It's almost like telling a developer to turn off all their compiler warnings. I can't say I'd want my program to crash in all instances, but silent failure is not an option either. Compilers have warnings; why can't runtime environments?

A grammar that doesn't impose its (artificial) rules to limit a program's semantics

Under the current situation the language's grammar, by considering the de-referencing of null object references to be an exception, is imposing an artificial rule onto the logic/semantics of programs written in that language. To wit: NPE basically amounts to saying that ALL FIELDS ARE REQUIRED. That is, the current default behavior

Here's the caveat: you can't use NullPointerException to handle program flow: i.e., you must explicitly check for values that, from a program logic/semantic point-of-view, aren't supposed to be null (e.g., required fields). This is, after all, a proposal for how these (and similar) languages could happily exist WITHOUT NullPointerException, so it's contradictory to then write code where the logic requires NPE.

Further, if you know some code doesn't use NPE in this secondary program/logic way, one could conceive of a runtime switch which substitutes default behavior for NPE. This is generally more tricky 'cause unless it's your code you don't know how it might internally be using NPE to check for required fields.

Enforcement of strong typing rules

This leaves open some issues: how to handle return types and how to handle access to instance variables.

Since we have a runtime (Java, C#, etc.) we have metadata and know the return type for an invoked method. If the method returns an object reference the answer is simple: return null. So,

Object reference = null;
Object copied = reference.clone();
results in copied == null.

For the case of return types that are not referenced objects (primitives and value types) this is less clear-cut. One option is to un-box null as the already existing default value for the destination type. In Java, this default value is the initialization value of a type when no value is specified (in C# it is illegal to access a member of an uninitialized value type). So we could have:

/* Object */ anObject; // assume this exists
Object reference = null;
boolean isEqual = reference.equals(anObject);
is always false. Similarly, for:
String nullStr = null;
int sz = nullStr.length();
sz is 0 (zero). NOTE: this does NOT break strong typing. There is no type conversion happening here. The following, for example, would not compile as it violates the type constraints:
boolean fail = nullStr.length();
The type on the left is boolean and on the right is int. Simply stated: null is NOT A TYPE. Null is a typeless value. This is already the case in both Java and C# (and, for that matter in C++) as it can be assigned to a reference to any object and yet it is not an instance of Object. It can't be for if it were it could never be assigned to the reference to any sub-types of Object as the type cast rules explicitly forbid casting down the inheritance tree.

Similarly we can return default values for instance variables: null for object ref's, 0 for int, false for boolean, etc.
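
The default-by-return-type rule can be approximated in today's Java, for interface types only, with a dynamic proxy. This is a rough sketch rather than the proposed language change itself: the Widget interface and the BenignNull class are hypothetical names made up for the illustration, and a real null reference would of course still throw.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Sketch: a "benign null" stand-in whose methods do nothing and return the type's default value. */
final class BenignNull {

    /** Hypothetical interface, used only for this illustration. */
    interface Widget {
        String name();      // object reference: default is null
        int size();         // int: default is 0
        boolean isReady();  // boolean: default is false
        double weight();    // double: default is 0.0
        void refresh();     // void: nothing happens
    }

    /** The value Java gives an uninitialized field of the given type. */
    static Object defaultFor(Class<?> type) {
        if (!type.isPrimitive() || type == void.class) return null;
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0.0f;
        return 0.0d; // double is the only primitive left
    }

    /** Builds a proxy that ignores every call and returns the default for the method's return type. */
    static <T> T of(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> defaultFor(method.getReturnType());
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        Widget w = BenignNull.of(Widget.class);
        w.refresh(); // no-op
        System.out.println(w.name() + " " + w.size() + " " + w.isReady() + " " + w.weight());
        // prints: null 0 false 0.0
    }
}
```

Note that the compiler still enforces the declared return types; only the values are defaulted, which is the point made above about strong typing.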

As for setting of instance (member) variables, this is a non-op. Just as:

Object foo = null;
foo.setBar("bar is a string ...");
should do nothing - there's no object foo after all - so too:
foo.bar = "this goes nowhere.";
should do nothing.

The mind accustomed to NPE balks. But what about the case where it makes no sense to have foo == null, where that would violate the program's logic? Precisely: the PROGRAM's logic, NOT the language's grammar. Why should the language either take upon itself a program's logic (it can't and remain a useful language) or (just as bad) impose its grammar onto the logic of a program? IF (and I mean IF) foo must not be null for the purposes of the logic of the program THEN it is the programmer's responsibility to assure (assert) that foo is not null at whatever point the logic of the program dictates. There is no intrinsic language-based reason to have NPE. In fact I've never once encountered in my own code, nor anyone else's, nor heard a reason where NPE is justified on grounds of language grammar - people justify it in terms of program logic. But that's not the language's business! I've yet to see a grammatical need for NPE. [PLEASE INFORM ME IF YOU'VE FOUND ONE.]

This seems to be confusing language grammar with language semantics.

The only issue is the code-base problem: as some coders rely on catching NPE to enforce required fields (object references) if one removes NPE that code might not perform as intended - hence the suggestion of using a compiler rather than a runtime flag to get rid of NPE.

Less code

In all of the above having default (non) behavior for null results in fewer lines of code. This makes sense: if nothing happens to null object references there's no need to write the code to ensure that nothing happens (is executed) in the case of a null object reference. Which currently, after all, is the main reason folks write such code... we see it ALL the time:

  if (foo != null) { foo.doSomething(); } 
such code is a work-around for the imposition of the language's grammar into the program's semantics. Of course there's the implicit:
if (foo == null) { /* do nothing */ }.
Conceptually foo is null, null is nothing and can't do anything so the proposed default (non-) behavior is logically consistent.

Various Details

One detail: Note that in the following:

Object anObject = null;
Object reference = null;
boolean isEqual = reference.equals(anObject);
The value of isEqual is still false. In some sense this means that the equals() method is not reflexive over null references. Another way to pose this is that not all null references are equal. Since a null reference is not typed and can be the value of any object reference, there are some grounds for this seemingly odd situation. Note that under current Java and C# rules, the proposed default null behavior is equivalent to:
Object anObject = null;
Object reference = null;
boolean isEqual = false;
try {
isEqual = reference.equals(anObject);
} catch (NullPointerException npe) {
isEqual = false;
}
so this side-effect of making methods invoked on null non-ops with default return values really just follows the current status. Of course one is free in one's own code to define null equality differently by explicitly checking for null; the proposed default behavior doesn't change this.

Practically speaking, with the proposed rules if one wants different behavior one needs to write:

isEqual = (reference == anObject) || reference.equals(anObject);
which is reflexive over null.
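
For comparison, here is how the same null-reflexive equality looks in today's Java, where the expression above would still throw if reference were null and anObject were not. This is only a sketch: sameOrEqual is a made-up helper name that adds the guard by hand, while java.util.Objects.equals is the standard library method (Java 7 and later) that does the same job.

```java
import java.util.Objects;

final class NullSafeEquals {

    /** The expression from the text, with the null guard that current Java requires. */
    static boolean sameOrEqual(Object a, Object b) {
        return (a == b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        System.out.println(sameOrEqual(null, null));    // true
        System.out.println(sameOrEqual(null, "x"));     // false
        System.out.println(Objects.equals(null, null)); // true
        System.out.println(Objects.equals("x", "x"));   // true
    }
}
```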

The "do nothing" rule is logically equivalent to the common idiom:

Object reference; // assume this exists - but it might be null
Object copied = null;
if (reference != null) {
copied = reference.clone();
}
Except, of course, it requires less code. It is also preferable to the other common idiom:

Object reference; // assume this exists - but it might be null
Object copied;
try {
copied = reference.clone();
} catch (NullPointerException npe) {
copied = null;
}
Yet all three have the same result (if the "do nothing" rule is supported for dereferencing null object refs).


In a Java-like language, this proposal does not, and cannot always "just do nothing". Consider the following pseudocode:

  Something foo = happens_to_be_null;
  if (foo.isFrotzed()) {
    bar();
  } else {
    baz();
  }
The proposal would have foo.isFrotzed() implicitly return false, causing baz() to be executed. Is this what we want? Who knows? One of the design principles of the JavaLanguage (and where it differs in principle from CeePlusPlus) is that conversions should be explicit, rather than implicit, to ensure that they conform to the programmer's expectation. In keeping with that design principle, it would be more consistent to explicitly use the NullObject Pattern.
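
An explicit NullObject for this case might look like the following sketch. Something, NONE and dispatch are hypothetical names invented for the illustration; the point is only that the author of the type, not the language, decides what an absent Something answers.

```java
final class FrotzDemo {

    /** Hypothetical type from the pseudocode above. */
    interface Something {
        boolean isFrotzed();
    }

    /** Explicit null object: "no Something" is deliberately defined as not frotzed. */
    static final Something NONE = () -> false;

    /** Replaces a possibly-null reference with the explicit null object. */
    static Something orNone(Something s) {
        return (s != null) ? s : NONE;
    }

    /** Returns the name of the branch that would run. */
    static String dispatch(Something foo) {
        return orNone(foo).isFrotzed() ? "bar" : "baz";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(null));       // baz
        System.out.println(dispatch(() -> true)); // bar
    }
}
```

If a different type needed "absent" to mean true, its own null object could say so, which an implicit language-wide default cannot.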

(OTOH, in an EverythingIsAnObject language like SmalltalkLanguage, the equivalent code

  foo isFrotzed ifTrue: [bar run] ifFalse: [baz run]
[or something like that] would indeed do nothing - violating the programmer's expectation that either the ifTrue or the ifFalse will be executed.)

[Smalltalk doesn't have a NullPointerException, so you wouldn't apply this to Smalltalk. Smalltalk doesn't have a null - it has nil, which is an object with its own class that you can implement methods on if you really want to. In Smalltalk, if foo was nil, "foo isFrotzed" would return false under most idiomatic Smalltalk implementations of isFrotzed, which include an Object implementation of isFrotzed to return false. If this implementation were not in place, you'd get a "doesNotUnderstand" message sent to nil, rather than an exception. Basically, this page applies to Java and languages that handle null the way Java does.]

This "doesNotUnderstand" is essentially Smalltalk's equivalent of NullPointerException. The proposal is equally applicable to Smalltalk; it's just not a good idea.

Consider also:

  SomethingElse xyzzy = new SomethingElse();
  Something foo = happens_to_be_null;
  foo.setup(xyzzy); // implicitly ignored

  // oops, what state is xyzzy in here?
  xyzzy.betterHaveBeenSetup();
A truly awful idea. Give me an exception that I can catch in general outer-level error recovery code, rather than blithely continuing from wherever a mistake was made.

Or, even worse:

  InterestCalculator interest_calc = get_calculator(my_account);
  my_account.balance *= interest_calc.get_interest_rate();

If that was my bank account, I'd want the code to fail with an NPE if interest_calc came through as null, rather than multiply my bank balance by zero...
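
The difference between the two behaviours can be made concrete with a small sketch. The proposed rule is only simulated here by an explicit null test, and InterestCalculator, applyBenign and applyStrict are hypothetical names for the illustration.

```java
final class InterestDemo {

    /** Hypothetical calculator type from the example above. */
    interface InterestCalculator {
        double getInterestRate();
    }

    /** Simulates the proposal: a null calculator yields the default double, 0.0. */
    static double applyBenign(double balance, InterestCalculator calc) {
        double rate = (calc == null) ? 0.0 : calc.getInterestRate();
        return balance * rate;
    }

    /** Current Java behaviour: dereferencing a null calculator throws NullPointerException. */
    static double applyStrict(double balance, InterestCalculator calc) {
        return balance * calc.getInterestRate();
    }

    public static void main(String[] args) {
        System.out.println(applyBenign(1000.0, null)); // 0.0, the balance is silently wiped out
        try {
            applyStrict(1000.0, null);
        } catch (NullPointerException e) {
            System.out.println("failed loudly; balance untouched");
        }
    }
}
```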


Philosophy

Philosophically, null (or NULL) is the surrounding space (as in null and void). It has no intrinsic meaning without a consumer. That is to say, its meaning is contextual and applied by the "perceiver", system context or programmer. Hence its (apparent) multiplicity of meaning. I think of it as something akin to wave-particle duality, but, being a metaphor of space it affords a multitude of values. [In short, while I would differ with the implicit etymology of the comment on WhatIsNull that equates null with the Chinese character "mu", I agree with that semantically - which is the context of the original comment on WhatIsNull.]

Generally speaking I find null harmless. (Space is non-threatening.) In terms of a pedantic form of strong typing there are problems. Notably while a reference is still valid (in most languages) even though its value (not its type) is null, in those languages it isn't valid to dereference a null-valued object reference. As null is a value it can be assigned to object references. If we were to treat null as a type, however, to be (perhaps pedantically) consistent with the language, we end up needing to define it as a sub-type (sub-class) of all other types for assignment rules to still hold (this is done in some languages).

Beyond the general and the philosophical I'd like to focus on the practical. For this I'll focus on the Java language - though I think it applies to the languages that are based on or inspired by Java, notably C#. My claim - and I'd like to be enlightened on this - is that NullPointerException is indeed a hang-over from C++ (and no, not for its name) and should be done away with. That holds regardless of how we view our friend null. Whether one chooses to build collections that can contain null and thus must implement a distinct contains(key) method, or whether one decides that no keys in one's collection(s) can be associated with null values and so a null return for some get(key) method is sufficient, is, to me, mostly a question of what conventions a library designer has settled upon.

What I find generally annoying and not useful is NullPointerException. Just this week I was writing a bit of testing/debugging code and wrote:

if (httpResponse.headerForKey("content-type").equals("text/html")) {
//  dump some info here...
}

None too surprisingly there are cases where (even) the "content-type" header hadn't been set and my test blew up with an unhelpful NullPointerException. So I rewrote the code in the way we're all too familiar with:

String ctHeader = null;
if (httpResponse != null) ctHeader = httpResponse.headerForKey("content-type");
if (ctHeader != null && ctHeader.equals("text/html")) {
//  dump some info here...
}

This of course took care of the "problem". (I added the httpResponse!=null for good measure - but isn't that what we end up doing?)

This way of doing things doesn't make sense to me. I've heard some of the case for strong typing with regards to null, but I find it a very pedantic application of strong typing rather than a useful one.

Simply put, invoking a method on null (or sending null a message) could just give a null result (for methods that don't have a void return). That is to say: invoking a method on null does nothing and that need not be a problem. For methods that have return types that are subclasses of Object this would work fine.

This leads to one issue: what about methods that don't return objects but return primitive types (an issue that only arises in languages that have non-object primitive types - and since I'm concerned with Java, this applies)? This can be handled by having standard implicit type-conversion rules for null (or null-valued object references). The basis for this type-conversion is the notion of how all types that aren't explicitly assigned values are initialized. That is the type's null value. For instance, int == 0; boolean == false; etc.

If null is good enough to be a value for ANY object reference why not take it all the way: let it be a valid value for ANY type. Basically null is the "unassigned" value for a reference. All other types have such default (unassigned) values. Would it be entirely unreasonable to let null play that role for all types? What this implies is that null in its multiplicity is "perceived" as "false" when viewed through the lens of boolean, as zero (0) when viewed through int, etc. This frees us from a machine-induced C++ hang-over: the NullPointerException. To wit: we're not risking accessing an invalid memory address (as in C or C++), rather we're simply invoking a method on null, which, conceptually, needn't be a problem.

Strong-typing issues:

int x;
if ( x = 3 ) { }
is valid in C (though new compilers will produce a warning: didn't you mean: "if ( ( x = 3) != 0 )..." ), but not in Java. And that's generally a good thing. The requirement that only boolean types be used in the evaluation of conditionals saves one from a variety of reasonably common programming errors. As a result an expression like:
if ( aString.equals(anotherString) ) { ... }
is only valid if the equals() method returns boolean. Note that there's no inherent requirement on the value of the aString reference. If aString were null and the language had the well defined rule that invoking a method on a null object returns the default (null) value for that method's return type, and that the boolean "view" of null is false, then
aString = null;
boolean isEqual = aString.equals(anotherString);
The value of isEqual will always be false regardless of anotherString's value.

As an aside, this leads to the interesting result that:

((String)null).equals((String)null) == false; // !
while
((String)null == (String)null) == true; // of course
That can be viewed as either an effect of the rule that applies if you (try to) invoke a method on a null-valued reference or as another example of the multiplicity of meaning that null allows. Regardless, the result is deterministic and therefore well-behaved.

Beyond that aside, the rationale for this whole inquiry into the utility of NullPointerException is based on my experience that that particular exception isn't useful. Rather it's a hindrance to simple, clear code. In my code more than 90% of the time (closer to 99% I'd say), the default (null) behavior is desired: to wit, do nothing. All sorts of checks are added to ensure a reference isn't null before doing a trivial method invocation simply to avoid the possibility of a NullPointerException. Nulls are not exceptional: they are beyond commonplace, they are default values for object references. Note that various types of refactoring can help, so that instead of:

if (person != null) {
  if (person.credentials() != null) {
    if (person.credentials().get(0) != null) {
      if (person.credentials().get(0).password() != null) {
        if (person.credentials().get(0).password().equals(enteredPassword)) {
          // authorize access with non-encrypted credentials...
        }
      }
    }
  }
}
One can have:
if (person != null && person.getPassword(0) != null ) {
if (person.getPassword(0).equals(enteredPassword)){ 
// authorize access with non-encrypted credentials...
}
}
This does help the code somewhat, though one still has to do the checks in the getPassword(int) convenience method.

Still it doesn't compare in simplicity, elegance and efficiency to:

if (person.credentials().get(0).password().equals(enteredPassword)){ 
// authorize access with non-encrypted credentials...
} 
or, refactored with a convenience method
if (person.getPassword(0).equals(enteredPassword)){ 
// authorize access with non-encrypted credentials...
}
this is all we wanted in the first place.
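
For what it's worth, current Java (8 and later) can get close to this one-line chain with java.util.Optional, whose map() step quietly yields "empty" whenever a link in the chain is null. The sketch below is not the proposed language rule, just the nearest library equivalent; Person, Credential and matches are hypothetical stand-ins for the types used above.

```java
import java.util.List;
import java.util.Optional;

final class PasswordCheck {

    /** Hypothetical credential type; password may be null. */
    static final class Credential {
        private final String password;
        Credential(String password) { this.password = password; }
        String password() { return password; }
    }

    /** Hypothetical person type; the credentials list may be null or empty. */
    static final class Person {
        private final List<Credential> credentials;
        Person(List<Credential> credentials) { this.credentials = credentials; }
        List<Credential> credentials() { return credentials; }
    }

    /** True only if every link in the chain is present and the password matches. */
    static boolean matches(Person person, String enteredPassword) {
        return Optional.ofNullable(person)
                .map(Person::credentials)          // null list becomes empty
                .filter(list -> !list.isEmpty())   // avoid IndexOutOfBoundsException on get(0)
                .map(list -> list.get(0))          // null element becomes empty
                .map(Credential::password)         // null password becomes empty
                .map(pw -> pw.equals(enteredPassword))
                .orElse(false);                    // any missing link means "no match"
    }

    public static void main(String[] args) {
        Person alice = new Person(java.util.Arrays.asList(new Credential("s3cret")));
        System.out.println(matches(alice, "s3cret"));           // true
        System.out.println(matches(null, "s3cret"));            // false
        System.out.println(matches(new Person(null), "s3cret")); // false
    }
}
```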

Code like this, and the time spent debugging random NullPointerExceptions, is part of the experience of everyone who's written Java (or C#, or VB, or ...) code. My question is: WHY? Why do things this way? I'm quite convinced already that getting rid of this exception is both easy and makes for a better language. But I'd like to be enlightened as to why this perhaps ain't so.

I should admit to a bias. I've heard the arguments for stringent type-checking. I've not bought them in this case: with proper rules, invoking methods on a null reference and casting null to primitive types is type safe. As for the bias: I woke up to O-O in 1992 on a NeXT box. The language was Objective-C and it had null behavior similar to what is described here.

The other argument I've heard for NullPointerException is the software reliability argument. This to me is potentially more credible (I care about that, vs. the strong typing argument I view as merely one approach to achieving high quality, high reliability software and an approach that is not always the best). But I've not yet encountered a clear instance where having NullPointerException delivers on that goal. Indeed, just the opposite: instead of defaulting in a graceful way to expected behavior, one forgotten value check can cause an entire application's call stack to unwind to the first exception handler. It is both irritating and disruptive of what would otherwise (in 90+% of the times I've seen) have been proper (albeit default) program execution.

On the issue of quality, neither approach is any replacement for the need for thorough testing. One approach, however, gives default behavior that is correct almost always and saves quite a few lines of code that are doing nothing other than checking for null values because Java (and related languages) don't have any default behavior for nulls.

Here's my challenge: show me a piece of extant code with null-value checking that wouldn't work just fine with default (non-op) null behavior (code where reverse-default behavior - e.g., "true" instead of "false" is desired - only counts for partial credit). Further, show me some code where you'd actually need to write MORE logic if there were null behavior. I haven't found any such sections in my own code...

How about "object.updateSomething(value)"? Throwing a null pointer exception is the correct response, because the update failed. No-op for null-value only works for comparisons; if you are trying to actually do something with the object, a silent no-op is the last thing you want.

Actually I think that trying to update a null and having nothing happen is precisely what one wants. The method "dissolves into space." This, to me, is no problem. If, however, you wanted to ensure that indeed there was an object, then by all means do the check and either throw a suitable exception or take some other appropriate action. But it is a mistake to assume that this needs an exception. You could similarly have:

newUser.dispatchEmail()
well if there ain't no new user, there's no need to do anything. I don't see in this case (or the good example you provide) why a NullPointerException is either the way to go or what it buys you. A simpler case might be the common form/field validation. If some fields are required, certainly the logic should check for those fields. But, on the contrary, to throw exceptions if one tries to query against (or invoke any other method on) a field that was left null is backwards to my thinking.

BTW, your comparisons above can be written like this instead:

if ("text/html".equals(httpResponse.headerForKey("content-type"))) { ... }

Thank you, I appreciate that useful tip, in some sense the example was gratuitous, but at times I've overlooked this elegant solution.

which takes care of the no-header case properly. And if your httpResponse is null, you got a bigger problem which should not be treated simply by returning a false. Similar to your credential example, if either person or his credential is null, you got a problem that should be handled separately.

Not necessarily. If the default is return nothing, or in the case of credentials: disallow access, you're fine. Clearly in most situations, that's not sufficient, but then neither is letting the NPE be thrown uncaught. The NPE approach is, at best, inefficient. Worse (to me) it requires much more code.

The other point here is that a language, while containing rules for a grammar, cannot in itself determine the meanings (semantics) of phrases. Yes, the phrases must be constructed to accord with the grammar. However, when a grammar implicitly imputes an extrinsic and arbitrary meaning to a phrase that weakens rather than strengthens the language. It is, simply put, trying to solve something that's not solvable, nor helped by the constraints (training wheels?) required to impose the constraints. The openness of the meaning of a phrase was shown by Goedel and again by Turing. After some consideration, the above reason for having (desiring?) a NPE for an: "object.updateSomething(value)" method imputes meaning to the updateSomething "phrase" when the real meaning will be entirely determined by the method itself and can't be determined by the language a priori. Simply put: what the "correct" behavior is for any instance of object as well as for the case when object is a null reference can only be determined by the method itself. So, once again NPE is the wrong thing. Consider:

String requiredField = null;
String optionalField = null;

// get some (optional) user input for these fields..

requiredField.doInsert(); // need to check and not allow this if requiredField is null
optionalField.doInsert(); // just fine to do nothing

In other words, the context of when "object.updateSomething(...)" might need to fail may well depend not only on the updateSomething() method, but on the particular instance of object and when/where/how it is used. The correct behavior in the above is to check requiredField before invoking doInsert() and take appropriate action if it is null. Appropriate action might be: (a) interact with the user and request a re-entry for that field (b) log some info to a file (c) throw an exception: presumably to be caught/handled elsewhere. I see no reason at all that an exception MUST be thrown (certainly not for the optionalField - it's optional after all). I see no way that the language can "know" a-priori that an NPE should be thrown.
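
In present-day Java that division of labour might be sketched as follows. FormInsert and submit are hypothetical names, the list stands in for whatever doInsert() would really write to, and IllegalArgumentException is just one of the "appropriate actions" listed above.

```java
import java.util.ArrayList;
import java.util.List;

final class FormInsert {

    /**
     * Inserts the fields that are present and returns what was inserted.
     * The program's logic, not the language, decides that requiredField must be set.
     */
    static List<String> submit(String requiredField, String optionalField) {
        if (requiredField == null) {
            // explicit check for the field the program logic requires
            throw new IllegalArgumentException("requiredField must be set");
        }
        List<String> inserted = new ArrayList<>();
        inserted.add(requiredField);
        if (optionalField != null) { // under the proposal this guard would be implicit
            inserted.add(optionalField);
        }
        return inserted;
    }

    public static void main(String[] args) {
        System.out.println(submit("Smith", null)); // [Smith]
        try {
            submit(null, "Q.");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());    // requiredField must be set
        }
    }
}
```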


The above proposal is simply flawed, because it makes no sense in mathematics to equate null with 0. In mathematics, 0 is too important a number to be considered as good as any logical programming mistake that left a variable uninitialized.

Actually it makes perfect sense to equate null with 0. Null is synonymous with "none" which, numerically, is zero. [Indeed the definitions given for null as a noun (rather than as an adjective) on dictionary.com are: "1. Zero; nothing. 2. An instrument reading of zero." Also, in mathematics null is used for the empty set and the size of that set is zero.] But, that aside (for I feel it is important, but of secondary importance) the problem I have is with the assumption (the Java language's assumption, most programmers' assumptions, etc.) that an uninitialized variable must be a programming mistake. It need not be!!! That's the gist of my whole point. It is certainly not a mistake if that variable is, say, an optional (not required) field such as, "middleName" in an address form. Nor is it a mistake if you then say: "person.middleName().length()" and get a result of 0 (zero) when there is no middleName. And it is only the programmer who can say whether middleName is an optional field, not the language [unless the language provides a way to specify "non-null object reference" ... but that's a tangent].

It makes perfect non-sense. Null is in fact a marker of absence of information; if programmers want 0 they can put 0. If middle name is not important you can mark the default as being an empty string. There's no use of null in mathematics. You cannot equate the absence of information with the information that something is 0; this simply doesn't fly.

Then why do you see it as OK to equate a null string reference with an empty string? They certainly aren't the same in most languages.

Further, using zero as the numerical value for null makes perfect sense in most (yes, not all) contexts. Consider the most basic use of numbers: doing sums. If you sum a null it shouldn't count for anything toward the sum. That is what the value zero means and so in that context it makes sense. It makes sense because zero is the additive identity. As zero is not the identity element for multiplication, encountering a null anywhere in a multiplication gives a null (zero) result. And with division there's the possibility of divide-by-zero error.

And doing cos(x) on a null x returns 1, just as cos(2*pi) does.

Not having NPE is still cleaner and faster even in this case. Basically if you have a place where null has meaning, or needs to explicitly NOT have meaning in the primitive types, you add an explicit check for null. Then rather than having to ALWAYS check for null (as one does now) you only check when and where it is relevant. Consider the above where x is taken from some array (that can contain nulls) that is being enumerated. If x were an object with a toDouble() method you would currently have to check every object for null or risk NPE. To avoid having null treated as zero in this case you have to check for null and skip those entries or do whatever else is needed. IN OTHER WORDS, in this "worst case" scenario there is no more code or overhead than in the everyday scenario of all current Java (and other) programs.
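
The two situations can be sketched side by side in current Java, with the null-as-zero rule simulated by hand. NullAwareMath and its method names are hypothetical, and Double[] stands in for the array of objects with a toDouble() method.

```java
final class NullAwareMath {

    /** Summation: treating null as 0.0 is harmless, since zero is the additive identity. */
    static double sumTreatingNullAsZero(Double[] xs) {
        double total = 0.0;
        for (Double x : xs) {
            total += (x == null) ? 0.0 : x; // simulates the proposed default
        }
        return total;
    }

    /** Cosine: null-as-zero would contribute cos(0) = 1, so nulls are skipped explicitly. */
    static double sumOfCosinesSkippingNulls(Double[] xs) {
        double total = 0.0;
        for (Double x : xs) {
            if (x == null) continue; // the explicit check, needed only where null matters
            total += Math.cos(x);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTreatingNullAsZero(new Double[] {1.0, null, 2.0})); // 3.0
        System.out.println(sumOfCosinesSkippingNulls(new Double[] {null, 0.0}));  // 1.0
    }
}
```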

The MUCH more common situation is containment. One object contains another. If the container is null and you dereference you get NPE. WHY!? Why not just return null! It is sooo simple and it makes code soooo smooth. Yes, for some numeric computations you need an explicit check for null before converting to a base type, but hey, guess what, in ALL current conversions from an object to a base/primitive type you are required to check for NULL or risk NPE.

Sure, it's a side-effect of the fundamental problem of un-boxing. In the world of object references null makes sense: it says "this is a non-reference" or as you accurately put it: "this lacks value". There's simply no such notion in the world of value/primitive types. So when moving to the world of value types and primitive types one has this problem. The way Java and C# solve this problem is to throw NPE - and declare that it is illegal/meaningless for a null reference to move across this boundary. However it is perfectly valid to instead define a default for moving across this boundary. From a pedantic view of typing it gets the strong-typing folks in a tizzy. From the point of view of solid programming it actually works just fine - that is, certainly no worse than the language that throws NPE in a deployed system.
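To make the boundary concrete, here is a small sketch of both behaviors in today's Java. UnboxingBoundary and its method names are made up for illustration. The first method shows what the language does now; the second shows the kind of default being proposed, written by hand.

```java
// Hypothetical illustration class; the names are not from any library.
class UnboxingBoundary {
    // Standard Java: auto-unboxing a null Integer throws NullPointerException.
    static boolean unboxingThrows(Integer boxed) {
        try {
            int ignored = boxed; // compiles to an implicit boxed.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // The alternative argued for above: a defined default when null crosses the boundary.
    static int toIntOrZero(Integer boxed) {
        if (boxed == null) {
            return 0;
        }
        return boxed;
    }
}
```

The proposal amounts to making the second method the built-in behavior, so the explicit check is written only where null must mean something else.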

From http://mathworld.wolfram.com/Zero.html: "The integer denoted 0 which, when used as a counting number, means that no objects are present. It is the only integer (and, in fact, the only real number) which is neither negative nor positive."

That's a triviality. It doesn't support your argument that 0 should logically be equivalent to null.

I never said (or intended to say) that null is or should be logically equivalent to zero (or vice-versa). They are different domains. Null does not exist in the set of integers. The argument "against null" is strictly formal in this sense. But it misses an underlying reality/truth: we are inside of a machine here! The rules we make up/impose need to be logically consistent, they do not need to adhere to some abstract formalism purely for formalism's sake. I believe in practical programming. The computer should move in the direction of the user/programmer not the other way around. I'm not proposing to dispel the logical barrier between references and values - that barrier was imposed and the only way to "remove" it is to either treat everything as references (very inefficient and you wind up with 3-fold logic to boot) or treat everything as machine addresses (ok for trivial hardware-specific code but that's about it). What I'm saying, rather, is that at a certain point the strict formality of the definitions of value types vs reference types gets in the way of utility. There's a simple work-around: allow a single point of intersection between all types: to wit, NULL.

In a formal sense one can say that any value chosen is arbitrary (all definitions are (somewhat) arbitrary and yet they are the basis of what underlies all mathematics) - nevertheless, there are "natural" values for null. These are the very same values that the JVM (for Java - doesn't work in C#) defines for the various primitive types.
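For reference, these are the defaults Java gives to fields that are never assigned. The class JvmDefaults and its field names are made up for the example.

```java
// Hypothetical class: every field is left unassigned on purpose.
class JvmDefaults {
    int count;      // defaults to 0
    long big;       // defaults to 0L
    double ratio;   // defaults to 0.0
    boolean flag;   // defaults to false
    char letter;    // defaults to the NUL character (numeric value 0)
    String name;    // reference types default to null
}
```

So a freshly constructed JvmDefaults already reads as 0, 0L, 0.0, false, NUL and null without a single initializer. The proposal reuses the same values as the result of calling a primitive-returning method on a null reference.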

There's another way to pose this question: Why is NullPointerException thrown so often?

I think that's a good, basic question. I am interested in someone who has studied code to see why this happens. There are some reasons I've heard for it that I don't buy, namely: Some folks say "because programmers are stupid and write bad code" - I don't agree with that. Others have said: "because the languages are bad for allowing null in the first place" - but null is a fact of data sets in the real world. My own opinion - based on my own code and so I'd love to see others' experiences - is that the mind assumes a default no behavior. One thinks: if there's nothing there, do nothing... and of course the language wants you to be explicit about that and first check that there is indeed nothing there. Why do you think NPE is thrown so frequently?


If you equate null with 0 you might as well equate it with absolutely anything. Consider that 0 is in a system of coordinates obtained through a translation of axes. It can be (1,1), or (2.5, 3.5). It is totally unsafe to equate null with 0, because programmers will then lose the possibility of an "in your face" blow in case of a logical error. If null is transformed into 0, then logical errors will propagate far from the context where the mistakes are made.

Understanding that any variable's value might be contextualized (e.g., by a coordinate system) does add a degree of arbitrariness to any choice for a default value. Since any value is as good as any other, then zero is certainly no worse a value than any other. The thing is that a computer IS (has) a default context and zero means zero in that default context. Therefore zero is a better default value than others.

But it is no better, and it is much worse than being what really null should be: a marker that some structure is uninitialized.

Basically this is a trade-off. There is no way to guarantee lack of logical errors. My premise is that NPE doesn't guarantee it either AND it makes the most common situations - which by themselves are inherently benign - into explosive exceptions. It's a question of approach. Do you trust the programmer enough to give a benefit of the doubt (e.g., maybe it IS OK that middleName is null) or do you impose a stringency that makes code more brittle? It's this difference that leads me to propose a compiler flag. Some programmers actually like NPE. And in some cases like the one you mentioned, it does help one track down logical errors closer to their origin. However it doesn't guarantee that and it most often (in my experience) considers something to be an error when instead - and to the betterment of software quality - it could just as easily consider the situation benign. Hence the title of this page: NullIsBenign.

No, it is not benign. There should be no default values mandated by language, there can be default values specified by user, and those will be context specific. Claiming that 0 is the best default value is arbitrary. There's absolutely nothing magic at all about 0. NULL is definitely not a default value, it is the lack of any value.

The fact that we cannot eliminate all logical errors is the classic RedHerring argument. It doesn't mean that we should do no effort to eliminate any error. Indeed, logical errors related to null can be easily eliminated by simple, logical, elegant and effective language design. See ObjectiveCaml, HaskellLanguage, NiceLanguage.

It's not meant to be a RedHerring, but the truth. I'm not arguing against trying - that's why I say the /relax or -dontPanic flag should be used for deployment, not development. ObjectiveCaml et al. haven't solved the problem. In a way they've shifted it. They've added types that can't be null.

You're factually incorrect. Don't project your misunderstanding on such languages. What they prevent is having uninitialized structures. With one exception, NiceLanguage, that forces you to test for null. In the same

I shouldn't have used Caml, but instead Nice. Nice with its option types: String vs ?String does exactly what I said above.

While this is helpful in a way (a built-in assert utility) it still leaves NPE in place for the types that one (now explicitly) says can be null - as soon as one dereferences such a null-allowed reference there's NPE. How does that make code more logically consistent? NPE is, after all, a runtime check and unless we can guarantee all possible pathways have been checked (which we should TRY to do, but we should also recall Turing: it can't conclusively be done) we're left with deployable code that might suddenly blow up, rather than calmly continuing.

Again, factually wrong. Checking for null can be done strictly at compile time, and purely and simply in many functional languages you don't get the chance to create havoc (uninitialized structures).

WHAT!? As in "what type of systems are you building"? How, if reading from a relational db, say, can you possibly know a priori which fields of which records will be null? You can say which ones MIGHT be null, but that's not the same thing. You're talking about closed, self-contained systems here. I'm talking about real-world programs, not a university research project. I don't mean to descend this way, but let's be a bit real.

I am interested in Functional Programming. I don't see it as a viable alternative to "imperative programming languages" but then, I fully admit, I've only a slight acquaintance with ML and its ilk.


At that point it depends on where you're using it. If you're in a financial app and you want to say: the correct result or NO RESULT (i.e., NPE) then, certainly, use NPE. If, however, you're controlling a launch or flight of the SpaceShuttle I'd say: please try to make things work if you can: that is, treat null as benign. In all honesty I've built systems that have both approaches to null and simply from experience I'm saying that getting rid of NPE has made systems more robust (less brittle) and has not introduced new logical errors - on the contrary, the default values do a fine job so often that it is the exception to find a place where they come as a problem (even in financial apps - and yes, I've built a few).

Maybe it's your experience that needs revising. There are better ways of doing things.

To turn the tables a bit: the fact that we can find an example where (hopefully during testing) NPE would be thrown and we'd modify our code IN SUCH A WAY THAT NOW IT SHOULD WORK WITHOUT EVER THROWING NPE is a strange argument to make for a deployment system. Neither approach touches the core issue: if you (you the programmer, NOT the language) require a variable to be set (not NULL) then your code had better check for it or suffer the consequences. Those consequences are variously: NPE or a non-op (and possible default value substitution if invoking a method that returns a primitive/value type). For me, I'll take the non-op: do nothing.

If you equate NULL with 0, first of all, NULL is logically the absence of information, second cos(0) is 1 and add that in a series and all of a sudden your blissful ignorance of NULL has transformed a non-op into an op.

(1/first), see below. (2/second) Taking cos(x) doesn't make x into an op. You've mixed operators/operations with parameters. But there's an interesting underlying point. If you have

foo.doSomething();
where foo is null, it makes sense to speak of a non-op, however as soon as you have
someValueType = foo.getValueType();
I am NOT proposing a non-op, I'm proposing returning a default value, so this is definitely an op. A non-op would leave nothing on the stack and someValueType's value wouldn't change. That's an interesting idea, but it would lead to inconsistent results. I'm confident of the consistency of the rules that I've mentioned here. I've not thought through, and so am not confident of, rules that would leave the left-hand side unchanged.
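The two rules can be approximated in today's Java with a small helper. This is only a sketch: BenignCall and its method names are made up, and the real proposal would live in the runtime rather than in library code.

```java
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

// Hypothetical helper that imitates the two proposed rules by hand.
class BenignCall {
    // Rule 1: a void call on a null reference is a true non-op.
    static <T> void run(T target, Consumer<T> action) {
        if (target != null) {
            action.accept(target);
        }
    }

    // Rule 2: a value-returning call on a null reference is NOT a non-op;
    // it yields the default for the return type (0 for int here).
    static <T> int getInt(T target, ToIntFunction<T> getter) {
        if (target == null) {
            return 0;
        }
        return getter.applyAsInt(target);
    }
}
```

With this, BenignCall.getInt(someString, String::length) gives 0 for a null someString and the real length otherwise, so the left-hand side is always assigned.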

(1/first) Note on the above: of course there's no inherent way to represent null over the integers (or the reals), just as there's NO WAY to represent 1.4 over the ints - yet most languages convert between these 2 groups just fine.

Nitty-gritty details. The best way to implement non-NPE behavior is in the runtime (at the virtual machine level). Note, the following is specific to the JavaVirtualMachine. The JVM spec lists 26 operations that throw NPE (if the index is accurate). Of those, 23 would be supplemented by versions that would not throw NPE. [The 3 that wouldn't be supplemented are athrow, monitorenter and monitorexit.] 16 of the 23 involve loading/storing the primitive types from/in arrays; 2 involve the same with object references; 2 more: getfield & putfield; 2 for method invocation; and 1 for the length of the array. If Java 3 is in the works perhaps there will be consideration of adding these 23 operations that don't throw NPE. The non-NPE versions of the operations are trivial to implement. And the compiler changes for a -dontPanic switch (or nonNPE { } code blocks) would be trivial too (map the 23 ops to 23 different ones).

As a debugging tool having NPE during development and testing is fine. It has places where it can help and therefore, if it helps, use it. But during deployment it is a dubious exception. Borrowing another page from C# one could add nullIsBenign and nullIsNotBenign sections in the code itself (a la the checked/unchecked sections in C#). This would keep the development/testing environment identical to deployment and only turn on (or off) NPE type null de-referencing enforcement if/when needed.

Every day, I write code to check for null just to avoid NPE. A null that is otherwise benign. Code like:

if (someString != null && someString.length() > 0) {
// string manipulation logic goes here.
}
Yet it's very rare that I find code where considering null harmless causes me any troubles. Just the opposite - if I care that an object isn't null I write code to perform the appropriate checks. But most of the time I'm checking not because it makes sense logically but I have to because of the way Java and C# treat dereferencing a null. It feels like the language is arbitrarily imposing this on the code.

And again, your comparison with Java and C# is irrelevant. By all standards those are obsolete language designs. If you want a good solution for a Java-like language have a look at NiceLanguage's treatment of NULL. It forces you to either prove that a variable is not null or check before dereferencing, but with strictly the minimum number of checks necessary, and all of it is verified fool-proof at compile time.

PerlLanguage does almost exactly what you suggest here with its undef (with a warning for primitive types; it's still an error for objects). I found I had to force it to an error so that it failed rather than produce incomplete results that caused errors later. In a code generator, if you replace the variable name with an empty string it doesn't compile anymore. What's benign in one case is not going to be in another. I'd say an explicit shortcutting operator (say .? instead of .) would solve your coding elegance issue without risking undetected errors.
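Java has no such operator, but an explicit, opt-in short-circuit can be approximated with java.util.Optional. The sketch below reuses the middleName example from earlier on this page. MiddleNamePerson and SafeNavigation are made-up names.

```java
import java.util.Optional;

// Hypothetical record-like class for the middleName example.
class MiddleNamePerson {
    private final String middleName; // optional field, may be null

    MiddleNamePerson(String middleName) {
        this.middleName = middleName;
    }

    String middleName() {
        return middleName;
    }
}

class SafeNavigation {
    // Each map() step is skipped once a null shows up, much like person.?middleName().?length()
    // would be with a shortcutting operator; orElse(0) names the default explicitly.
    static int middleNameLength(MiddleNamePerson person) {
        return Optional.ofNullable(person)
                .map(MiddleNamePerson::middleName)
                .map(String::length)
                .orElse(0);
    }
}
```

The difference from the blanket proposal is that the programmer asks for the benign behavior at this one call site, and every plain dereference elsewhere still fails loudly.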


If a system has a reasonable exception system - like CommonLisp or SmalltalkLanguage - then NullPointerException works quite well. By "reasonable exception system" I mean the ability to do at least the following:

This behavior allows the ExceptionHandler to do something truly useful, like gracefully recover from the sorts of circumstances that cause a NullPointerException to be thrown. In those environments, the performance impact is also not objectionable.

In my view, the major problem with null handling in Java is the exception handling mechanism, not the NullPointerException itself. The other exceptions are just as bad - they are simply less frequent.


Note: Special emphasis placed on the Java context for this discussion because NULL is used in many languages for quite legitimate purposes.


The construct:

if( x != NULL ) {
// do something 
}

almost always fixes the symptom rather than the disease. If it doesn't make sense for that object to be null at that point, there must be an error somewhere else in your code, and you should fix that error. If it does make sense for x to be null, you will typically want to take some special action on that basis. In correct code, then, the proposal to treat null as a no-op will generally be useless; in incorrect code, it will hide errors. Sometimes, brittle code is good code, and this is one of those cases (if you have a programming error, your program should crash, rather than continue and make subtle mistakes).

Also note that an exception, even a RuntimeException for which there is no hope of continuing the original operation, does not necessarily cause a crash. In a program with robust error handling, it just aborts the current user action, or in the case of a transaction-oriented system like a server or database, it rolls back the current transaction. Above the OP says: "The analogy that comes to mind is having your car not start vs. having your car explode." If any exception can cause a program to do something analogous to "exploding", then the program's error handling is badly broken, and NPEs are not the cause (or even a significant contributing factor).
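A minimal sketch of that kind of error handling, assuming a hypothetical ActionRunner that wraps each user action. A RuntimeException, NPE included, aborts only the action that raised it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dispatcher: one try/catch per user action, not per dereference.
class ActionRunner {
    final List<String> log = new ArrayList<>();

    // Runs one action; returns false if it was aborted by a RuntimeException.
    boolean runAction(String name, Runnable action) {
        try {
            action.run();
            log.add(name + ": ok");
            return true;
        } catch (RuntimeException e) {
            // Abort just this action (a server would roll back the transaction here).
            log.add(name + ": aborted (" + e.getClass().getSimpleName() + ")");
            return false;
        }
    }
}
```

An action that dereferences null is logged as aborted and the runner carries on with the next one, which is the "car does not start" outcome rather than the "car explodes" one.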


It's hard to find good default behavior for every possible null object

if( ! collection.isEmpty() ) {  // if collection is null, isEmpty() returns false, so this would be true
 // get something from the collection (BUT collection is null, so get() returns null, and now nulls spread all over)
}

or

if( ! user.isAnonymous() ) {  //same as above, will be true
 //authorize operation. Wow null user can gain authorized access!!!
}
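The second case can be reproduced in ordinary Java by simulating the proposed rule by hand. BenignUserCheck and its nested User interface are made up for the illustration.

```java
// Hypothetical simulation of "a boolean method on a null reference returns false".
class BenignUserCheck {
    interface User {
        boolean isAnonymous();
    }

    // What user.isAnonymous() would evaluate to under the benign-null rule.
    static boolean benignIsAnonymous(User user) {
        if (user == null) {
            return false; // default value for boolean
        }
        return user.isAnonymous();
    }

    // The guard from the example above: if( ! user.isAnonymous() ) { authorize }
    static boolean wouldAuthorize(User user) {
        return !benignIsAnonymous(user);
    }
}
```

Here wouldAuthorize(null) comes out true, so a missing user passes a check that a real anonymous user fails.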

When the return value is a primitive, there is no null value available to indicate an undefined result, so an exception is usually thrown in this case.

Return value must have a unique value to indicate an 'exceptional' circumstance. Can also pass a ref to a bool or other type to indicate various response actions [i.e., many times just return a 2nd value (success/failure)]. -- gl
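In Java one way to return that second success/failure value alongside a primitive is java.util.OptionalInt. The sketch below assumes a made-up CountParser. It is one possible shape, not the only one.

```java
import java.util.OptionalInt;

// Hypothetical parser: the result carries both the int and a "was it defined" flag.
class CountParser {
    static OptionalInt parseCount(String text) {
        if (text == null) {
            return OptionalInt.empty(); // undefined input, no magic number needed
        }
        try {
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // not a number: also reported as "no value"
        }
    }
}
```

The caller has to ask isPresent() (or supply its own default with orElse) before using the number, so 0 stays available as a real result.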

Actually, you can solve half of the above problem, by having the not operator explicitly throw an exception, instead of converting null to true. Then your only problem is the else case, where you have: if true ...; else (false or null).


NullPointerException is indeed a hang-over from C++ and should be done away with

The problem is not some association with C++, but is based on at least two issues. One is the representation of an unknown, particularly with computer language primitives. Two concerns statefulness associated with library calls.

Most languages do not have a way to represent an unknown value. In fact, there can be several types of "unknown" value. For example, in a system that collects the U.S. Social Security Number, one can have a range of valid numbers, an "unknown" indicating that the Social Security Number has not yet been entered, and an "unknown" indicating that the person does not have a Social Security Number.
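A sketch of the Social Security Number example with the kinds of "unknown" kept distinct. SsnField and its Status names are made up, and a real system might need more states.

```java
// Hypothetical value type: one known state and two different "unknown" states.
class SsnField {
    enum Status { KNOWN, NOT_YET_ENTERED, DOES_NOT_HAVE_ONE }

    final Status status;
    final String number; // only meaningful when status is KNOWN

    private SsnField(Status status, String number) {
        this.status = status;
        this.number = number;
    }

    static SsnField known(String number) {
        return new SsnField(Status.KNOWN, number);
    }

    static SsnField notYetEntered() {
        return new SsnField(Status.NOT_YET_ENTERED, null);
    }

    static SsnField doesNotHaveOne() {
        return new SsnField(Status.DOES_NOT_HAVE_ONE, null);
    }

    // Only one of the two unknowns should make the form ask again.
    boolean needsPrompt() {
        return status == Status.NOT_YET_ENTERED;
    }
}
```

A single null cannot tell these two unknowns apart, which is why neither NPE nor a benign default settles the question on its own.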

Many object oriented library classes are stateful and require a specific sequence of steps. One cannot read from a file until the file has been identified and opened. In Java, one must pass a java.io.File into a java.io.FileReader before reading the file (although I believe Java throws a java.io.IOException error instead, but the meaning is really the same as a Null Pointer exception).

A Null Pointer Exception (as well as divide by 0 errors) is a way to handle an unknown value.


As I see it, NullPointerExceptionIsBenign. Whenever a NullPointerException appears, I've found a bug in my code and I quickly fix it.

Also, there are two ways that a NullPointerException can occur:

  1. Null is a valid value for a variable, but the code can't handle it.
  2. Null is not a valid value for a variable, but the variable is null anyway.

In my experience, 2) is much more common than 1). The NullPointerException gives me the information I need to fix my code. And the fix generally involves initializing the variable correctly, not simply checking for null.
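For case 2, one common way to get the failure at the place where the bad value is introduced is to check at construction time with java.util.Objects.requireNonNull. OwnedAccount is a made-up class for the sketch.

```java
import java.util.Objects;

// Hypothetical class where a null owner is never a valid state.
class OwnedAccount {
    private final String owner;

    OwnedAccount(String owner) {
        // Fails here, at the point of the mistake, instead of at some later dereference.
        this.owner = Objects.requireNonNull(owner, "owner must not be null");
    }

    int ownerNameLength() {
        return owner.length(); // cannot throw NPE: owner was checked on the way in
    }
}
```

The NPE still exists, but its stack trace now points at the caller that passed the null rather than at whatever code happened to touch the field first.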


Some people, including AdamBerger, fear that the BenignNull will just result in worse problems, such as infinite loops. In the following code snippet, if sb is a BenignNull, and sb.length() always returns 0, the code will enter an infinite loop rather than fail with a NullPointerException.

StringBuffer myContent;
String padToLength(int length) {
StringBuffer sb = myContent.clone();
while(sb.length() < length) sb.append(" ");
return sb.toString();
}

Throwing a NullPointerException may FailFast, if it occurs. Sometimes, however, the NPE is thrown far from the offending code which left a value null. The BenignNull certainly decreases the likelihood of FailFast in broken code.


A robust system would handle the previous example by passing nulls upwards, as does SQL. sb.length() would be null, the comparison would be null, the while would be passed over therefore without being run (null != true) and sb.toString() would return null as well.
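The SQL-style behavior can be imitated in Java with boxed types standing in for "value or null". This is a sketch with made-up names (NullPropagation and its helpers), using StringBuilder in place of StringBuffer.

```java
// Hypothetical helpers that propagate null upwards instead of defaulting to 0 or false.
class NullPropagation {
    // length of null is null, not 0
    static Integer length(StringBuilder sb) {
        if (sb == null) {
            return null;
        }
        return sb.length();
    }

    // a comparison involving null is null (unknown), as in SQL
    static Boolean lessThan(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a < b;
    }

    // a loop or if only runs when the condition is definitely true
    static boolean isTrue(Boolean condition) {
        return Boolean.TRUE.equals(condition);
    }

    static String padToLength(StringBuilder content, int length) {
        StringBuilder sb = (content == null) ? null : new StringBuilder(content);
        while (isTrue(lessThan(length(sb), length))) {
            sb.append(' '); // never reached when sb is null
        }
        return (sb == null) ? null : sb.toString();
    }
}
```

With null content the loop body is skipped and null comes back out, so there is no infinite loop. The null has not gone away though; the caller now has to decide what to do with it.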


Java/ESC uses DesignByContract annotations to help it eliminate the possibility of NullPointerExceptions. This is very similar to the NiceLanguage technique. It uses a theorem prover to detect bugs. I've been using it and it is very thorough (but a little quirky). It can also prevent IndexOutOfBoundsException and general misuse of specs for an API.


Null is undefined, uninitialized, not yet known. That is a part of data's state. It should be innate in all data, not a programming exception in C, not a separate attribute in SQL. -- PeterLynch


Discussion moved to JavaExceptionSystemLacksFunctionality

RenameMe to BenignNull, as this page is talking about such a strategy of treating nulls as benign objects, not claiming that null is currently benign.


AugustZeroFive


CategoryNull


Last edited April 23, 2011