Zero Means Null

Name: ZeroMeansNull

Type: Design

Category: AntiPattern

Problem: Implementing an optional field.

Forces:

Laziness or optimism.
Ignorance of nulls, e.g. "they're too inconvenient to deal with".
Using a storage implementation that doesn't support them.

Supposed solution: Nobody will ever set this to all zeros, so let that represent "omitted."

Resulting context:

When it really needs to be zero, it can't be.
When you need to convert from ints to some form of real, you can't reliably detect zero (RealNumbersAreNotEqual).
You can forget to compare with zero in some cases, and let it through as a nonsense value. You're more likely to get an exception (e.g. NullPointerException in Java) if you use a proper null.

Example:

Designer thinks: "Nobody will ever be at latitude zero, longitude zero."
User reports: "I hope you can help me! I'm trying to draw grid lines on the globe and for some reason one is always missing...."

-- DavidFlater (BeenThereDoneThat) et al.

Variations:

A tax form that treated failure to enter a number as if the number 0 hade been entered.

Some programmers use NegativeOne, 999, or an empty string to represent an unset, meaningless, or inapplicable value.

For text fields, some programmers will use special literal strings like "N/A", "NOT SET", "XXX", "(null)", etc. to indicate nullness. (I have BeenThereDoneThat and discovered, to my surprise that 'Null' is actually quite a common last name. Check the phone book.)

I am not sure the use of "N/A" or "Not Set" falls into this category. Often times, these two values are precisely what the user wants, and it is the programmer who uses insists on using a NULL or empty string instead.

Enumerated types often include a "null" value as one of the valid values. This is not as much of an anti-pattern, but can still lead to confusion and errors.

9/9/9999 as a Null date.

Zero being used to store positive infinity (no limit). When a finite target was entered and the process started and run to completion. In some unusual situation resulted in an alarm following completion, the user would fix the problem and press start not realising the process had finnished. This would result in the process continuing until some one pressed stop.

For the actual null object, see NullObject.

I like 02/31/99 as null date.

So, what are the alternatives?

You can create an additional Boolean variable to indicate whether a value is null. You can use a pointer to alloc'ed data, and set the pointer to NULL if no such data exists.

And obviously you can use a class that wraps the pointer or boolean solution.

You could use a sane language with a proper type system:

HaskellLanguage:

  data Maybe a = Just a | Nothing

ObjectiveCaml:

  type 'a option = Some of 'a | None

SmlLanguage:

  type 'a option = SOME of 'a | NONE

CommonLisp:

  (declare (type (or fixnum null) foo))

DylanLanguage:

  foo :: false-or(<integer>)

CsharpLanguage:

  Foo? foo;  // syntactic sugar for Nullable<Foo>

And the issue doesn't even arise with DynamicTyping, because you can just add in a value of a completely different type. I frequently use "false" to mean "argument not supplied" in PHP, because it doesn't allow computed default arguments. A null/unset value means I'm propagating a null variable from upstream (like a monad!), a 'false' means "compute the default for the function", and anything else is that value. -- JonathanTang

PerlLanguage: use "undef" to indicate null. Both "undef" and 0 evaluate to false in a boolean context. You can tell 0 from "undef" using the "defined" function:

 if (defined $x) { } # $x is not undef, but might be 0
 if ($x) { }         # $x is not undef, or 0, or some other false value
 $x = undef;         # empty out $x
 undef $x;           # "undef" can be a verb, too

Note that there are subtleties to it depending on what you're checking. See "perldoc -f defined" and "perldoc -f exists".

RubyLanguage: you can use "nil". "Nil" and "false" evaluate to false in a boolean context, everything else (even 0) evaluates to true.

PythonLanguage: you can use "None". "None" evaluates to false in a boolean context, but can be explicitly tested for ("is None" or "is not None").

And, of course, beyond language there is the matter of storage. I've worked with database engines that used " / / " to represent an empty date, and the usual 0 (no number) and <spaces> (no string). The only detectable truly unassigned values were dates. There was no way to retrieve a record and differentiate between "has received no votes" and "we have not counted his votes yet" without establishing additional flag fields to indicate "yes, I've counted the votes, and he has none". It was having to solve these design problems that finally brought home the value of "NULL" or "NIL" as worthwhile representations.

Certainly they can be a highly pragmatic way to solve missing-value problems, however those who hate the NULL approach would just do further expansion of the schema so that, if there was no value, there was no row, either, and thus no need to represent "missing value", it just wouldn't be there. This is always possible, albeit not always convenient. See ChrisDate.

This is an anti-pattern, but is sometimes useful. In many languages, if you search for a substring in a string, the result is -1 if the substring is not found. I haven't found that to be too much of a problem.

It's true that this is generally not a problem, but it can be if someday somebody decides that an index of -1 is valid (to mean the index of the last character in the string, or to mean the entire string, for example).

Actually, this is quite a bit of a problem. I have seen far too much code where the return type is ignored.

Example: (From Unix) time_t mktime() converts a year, month, etc. time spec into seconds since 1970-01-01 00:00 UTC. Somebody decided to return (time_t)-1 if the time spec is invalid, presumably before somebody else extended the time functions to handle times before 1970. So now, if the programmer ignores the return, you mysteriously get lots of things happening at 1969-12-31 23:59:59; but if the programmer checks the return, any time series that happens to touch on 1969-12-31 23:59:59 is *cursed*. The man page goes on to say that you can check errno to see if there was really an error, but I've found losing platforms that don't even bother to set errno. -- DavidFlater

I don't think that's a good example. time_t is meant for representing times within the lifetime of the computer system, not the whole range of human history. Negative values are not supported; in fact, time_t may even be unsigned. -- BenHutchings

You have converted a newbie. I am now changing my website user authentication script in php so NULL passes when they don't enter anything, instead of the default "UNSET." -- AnonymousDonor

UNSET in PHP is actually the same as NULL. This code:

    <?php
    if(is_null($x)) echo '$x is null<br>';
    if(!isset($x)) echo '$x is unset<br>';
    echo 'Setting $x to 1<br>';
    $x = 1;
    if(is_null($x)) echo '$x is null<br>';
    if(!isset($x)) echo '$x is unset<br>';
    echo 'Setting $x to NULL<br>';
    $x = NULL;
    if(is_null($x)) echo '$x is null<br>';
    if(!isset($x)) echo '$x is unset<br>';
    ?>

prints this output:

  $x is null
  $x is unset
  Setting $x to 1
  Setting $x to NULL
  $x is null
  $x is unset

See also Appendix L of the PHP Manual (Type Comparisons) and Chapter 6 (Null). -- JonathanTang

Besides which, the point isn't that there is something evil about the letters N U L L; the points argued on this page apply to ALL exceptional values -- any value that is actually a status code represented via type punning as if it were an actual value.

So if PHP happened to have different representations for NULL versus UNSET, nonetheless if you decided that it was philosophically wise to always avoid using NULL, the same decision should apply to UNSET.

Mathematically all of this is usually in regard to the BottomType (which usually has only a single possible value, but can have multiple values that all behave the same as the single-valued Bottom), but as applied to programming, the primary issue is one of InBandSignal versus OutOfBand signalling of errors.

In-band error signalling tends to be more convenient, so it is widely used, but OutOfBand signalling tends to be cleaner.

I don't really know about PHP, but in Ruby at least, nil is of a different class from a number; it cannot possibly be a legitimate value.

"undef" in perl is often used the same way, but if you use "undef" where a number or string is expected, it defaults to the value of 0 or "" respectively, so it is possible to confuse "undef" with a legitimate value.

So, you get the convenience of InBandSignalling, and the cleanliness of OutOfBand signalling, like this perl code:

 while (defined ($c = getc(FILEHANDLE))) {
    print $c;
 }

Note: getc returns undef at the end of the file, or on an error.

No. This is all just standard InBandSignalling stuff, period. There is no "cleanliness of OutOfBand signalling" here. You're dragging in the issue that these languages have typed values rather than typed variables, that's all.

People who are opposed to use of NULL (famously, ChrisDate) would see no difference between what you're talking about, versus any of the other NULL schemes.

In C, 'getc()' returns an 'int', rather than a 'char', so that the EOF InBandSignal can be used. (But this doesn't help with 'getw()', where data (two byte UniCode characters) can be the 'int' size on some older CPUs.) It is still OK, because the FFFF code point is reserved.

---

One objection raised by critics like ChrisDate is the issue of whether NULL=NULL is treated as TRUE, FALSE, NULL or an exception. (Strictly speaking this is not the same as ZeroMeansNull issue but is related to the general issue of representation of NULL). This varies between languages and even within languages (I know at least one language that treats NULL=NULL as NULL for basic types, but equality if comparing objects or empty collections/sets/arrays).

At least as far as SQL is concerned, NULL=NULL would remove some (but only some) of the relational theory objections. It's certainly one of the biggest pitfalls most developers fall into with SQL, but can be largely avoided by good database design patterns.

If something has a property that's not a number or a string, why overload the value with that property?

isMissing, isNotCalculated, NaN, doesNotCompute, etc. are all perfectly good ways to say something about an associated variable. You can even have more than one such property without getting ingenious about exception values.

--MarcThibault

We were debugging like madness until we found out that the customer actually WAS born on 1970-01-01.