What is this thing called "NULL"?!?
Null or Nil Characteristics:
- Etymology: Latin nullus "not any, none," from ne- "not, no" (see un-) + illus "any," dim. of unus "one."
- Semantically: Very difficult. In most cases, NULL or nil means "no value" or "not applicable". Sometimes (like, when passed to a routine as an argument) it means "no explicit value". In SQL, it often means "not known". Sometimes it might mean "whatever". Given this overloading of meanings, it is a little problematic - you sometimes encounter problems related to confusion about the meaning of NULL. Even so, it is a very useful concept - but heavy use of NULLs is often a sign that you should be using lists of values or tagged unions instead.
- Semantically: Very simple. It is the same as the Japanese character "mu" [Chinese "wu"; see MuAnswer]. It means "no thing". In logic it means "neither yes nor no" or "dumb question". You call that simple? The character has at least four different readings in different languages. In addition, it has a native reading in JapaneseLanguage of "nai" which means "does not exist" or "there is no". Depending on the context, this means at least six different things. Hell, that makes the computer nil concept seem pretty well-defined... Yes, I call "mu" simple.
- NilInLisp?: The LispLanguage defined (and perhaps invented) the use of "NIL" as a special object that is both a Symbol and a List - it is the only such object. NIL is the same as (). () is the empty list. NIL is a constant that has NIL as its value. Lists in Lisp are built from ConsCells. Thus, if there is some list (1 2 3), the "FIRST" function answers the first element, which can be either an Atom or a list (in this case the number 1). The "REST" function answers "the rest of the list" (in this case, "(2 3)"). Every list has an implied "NIL" at the end, so that the "REST" of the list "(3)" is NIL. This use of NIL to terminate a list is reminiscent of the use of a NUL character to terminate a char array in CeeLanguage (which came later). It is also reminiscent of the use of TheNilObject in SmalltalkLanguage (not surprising, given DanBobrow?'s influence on the SmalltalkLanguage team). The type of NIL is NULL. NULL is also a predicate that returns T when its argument is NIL. NIL is also (NOT T). It is also the only object that is (NOT T). So NIL serves as the value for logical false. NIL is also the empty datatype.
- NilInScheme?: SchemeLanguage doesn't have a value named nil, but nil is generally taken to be the empty list (as in other Lisps). However, unlike CommonLisp, the booleans in Scheme are #t and #f (instead of t and nil). The empty list is written as '(). Scheme advocates claim that this clarifies case expressions.
- This is a strange position to take, given that the Lisp community has happily, and without any problems whatsoever, been punning on the multiple meanings of NIL for forty years. Perhaps the real problem lies elsewhere? -- AlainPicard
- It bugged enough Lispers that SchemeLanguage finally completely split the boolean concept and list concept. [rest of argument about different treatment of nil in CommonLisp and Scheme moved to IsSchemeLisp]
- NullPointer: Pointers in some languages, like C, C++, and Pascal, can be given the value "NULL," which indicates that this pointer does not currently point to a valid location in memory. Dereferencing a NullPointer causes UndefinedBehavior. [Often NullPointer is declared as 0 (cast to the appropriate type), which causes some machines to generate a hardware interrupt when it is dereferenced. Usually the runtime (when configured to do so) catches this and throws an exception, often through a software interrupt. In older C implementations the results are undetermined - in other words, it blows up in your face.] NULL is commonly used to terminate lists and other complex structures, or to indicate that the given object is absent or unknown.
- NulCharacter?: The NUL character, with code point 0 in most CodedCharacterSet?s (ASCII, ISO-Latin-x, Unicode, etc.). This is not a pointer.
- NulCodeUnit?: In some CharacterEncodingFormat?s, in general more than one CodeUnit? is used to represent a character. Nevertheless, the zero CodeUnit? is almost always reserved for terminating a string, and cannot occur in any character other than NulCharacter? (this also applies to CharacterEncodingFormat?s in which CodeUnit?s are more than 8 bits). It can therefore be used to terminate string representations (e.g. CharStar strings in C).
- NullString?: The string containing no characters; EmptyString?. (This is typical usage for C, where some people call an empty "C style" string ("") a "NullString?". Physically, it is a pointer to a character array that starts with a NulCharacter?, '\0'.)
Is this standard terminology? In my C days, I thought of {char *s = NULL;} as a NullString?, and {char *s = "";} as an EmptyString?.
Yes, this is pretty standard.
- TheNilObject: The SmalltalkLanguage uses "nil" - a reference to a unique object (SingletonPattern) of the class UndefinedObject; used to represent the absence of an object.
- None: PythonLanguage uses "None" in much the same way that SmalltalkLanguage uses TheNilObject.
- Void: In the EiffelLanguage, object references are initialized to Void. Void is a constant reference to a singleton object of a class that is derived from all other classes in the system (the BottomType in type theory). Therefore any reference can be assigned the value Void or tested for (in)equality to Void without breaking or complicating the type system. The Void reference does not implement any methods (Eiffel allows a derived class to undefine base class methods or hide methods from client objects).
- NullObject: A design pattern (CategoryPattern) for providing an object with "do nothing" functionality.
- RelationalNullValue?: The "NULL" value, as it's used in relational databases. The rules for using nulls appropriately are quite subtle, e.g. two different nulls are never equal. See NullsAndRelationalModel.
- VariantNull?: The NULL type (value?) of MicroSoft's ComVariant type. (Variants can hold values of several selected types, including integers, strings, and arrays.) In the case of a NULL ComVariant, the vt member of the structure will be set to VT_NULL and will contain no valid data in the union member. Thus a VariantNull? is primarily a type, not a value.
- Nothing: In VisualBasic - a NullValue. (Technically the variant type is 'vtObject' and the value is null. Is not the same thing as VariantNull?.)
- Actually, I think it's closer to VB's NullPointer equivalent. The keyword Null in VB is a VariantNull?.
- VariantEmpty?: Used for "clean" newly initialized Variant variables that "haven't yet been given a type or value." 'Empty' keyword in VisualBasic. 'VarType?(uninitializedVariant) = vbEmpty'
- Missing: In VisualBasic, 'Missing' is used for optional Variant parameters that have no default value. Missing is a particular value representing an error.
- UnDef?: In PerlLanguage, "for functions that can be used in either a scalar or list context, nonabortive failure is generally indicated in a scalar context by returning the undefined value, and in a list context by returning the null list", according to the perlfunc documentation. The undefine function - undef() - undefines a scalar variable, an array element or an entire array.
- Nothing: In HaskellLanguage the other possible value of the 'Maybe' datatype. If you type something as 'Maybe String' it can contain either 'Just textValue' or 'Nothing' where Just and Nothing are type discriminants to be used in typecase expressions (or pattern-matching). There's an equivalent type in OcamlLanguage, called option, instead of Maybe, and the cases are called None and Some instead of Nothing and Just. You can have several levels of 'maybeness' without losing type information, that is if you have a 'HashTable Int (Maybe String)' (a hash-table with Int keys and nullable String values) the lookup function has a correct type signature (via type inference): 'lookup :: HashTable keyType valueType -> Maybe valueType', in this case the result is 'Maybe (Maybe String)'.
- StatisticsMissing?: In statistical programs such as Splus/R and SPSS there is a 'missing' value for each type of data, representing data that is unknown. (almost) Any operation applied to this value returns 'missing'. This has the effect that you need not take missings into account when transforming the data.
- FailuresInIcon?: When an operation in Icon "fails" (but in an expected way, such as looking up an element in an array and finding it's not there) it returns a "condition". These can be used to cause backtracking (the feature d'être of IconLanguage); but they are returned in contexts where Null might be used instead. Note that failures are different from exceptions.
A few other things that act like NULL in some contexts.
- end():
- Not really NULL, but similar. CeePlusPlus StandardTemplateLibrary containers all implement a method called end() which returns an ExternalIterator which points to one past the end of the container. (For vector<T>, this might actually be a pointer off the end of an array; for other containers it might be something else). Each container (and each instance of a container) has its own unique "end" iterator; but all of them have behavior similar to NULL. Such an iterator is not dereferenceable; trying to apply unary *, -> or [] usually (hopefully) throws an exception. One difference between end() iterators and NULL is that some pointer arithmetic on end() is meaningful; end()-- is a valid iterator (for bidirectional and random-access containers at least).
- Uhh... I'm fairly certain that no iterator operation is allowed to throw - EVER. Could you imagine trying to make something exception-safe if you couldn't even count on being able to move around iterators? Also, that checking would impose overhead, which is unacceptable. In any case, proper use of STL algorithms will keep you from illegitimately using a past-the-end iterator, in the same way that [] is safe on built-in arrays if your for loop through the indexes is done right. (Note that none of what I just said applied to a debugging checked-STL, which everyone should at least be using if NDEBUG is undefined.)
- [Dereferencing an end iterator has undefined behavior, which means that it certainly can throw an exception, or perhaps cause the Yellowstone supervolcano to erupt. End iterators may be somewhat like NULL in that similar sorts of operations are defined or undefined on them; however, they're much more like past-the-end pointers when you consider the role some of them can play in arithmetic expressions.]
- EOF:
- End-of-file macro in C/C++. Usually an integer value guaranteed not to be a valid value for a char type (since char is signed in some C/C++ compilers). If passed to a function expecting a char argument (which is really an int in C, but not C++, due to integer promotion rules), might have strange and wonderful effects.
- nil:
- Represents a lack of positive or negative value in a numeric field.
- Self Reference
- It is an old trick to use a self-reference (this == this.next) to terminate lists instead of a NULL pointer. It may be an old trick, but I find it rather dubious. Not only does it risk an infinite loop rather than FailFast on a badly written recursion or loop condition, but it also destroys the uniformity of representation: if you use this trick the empty list is a special case. [More precisely, one wants the empty list (if allowed) to be a special case, but in a sensible, elegant way. It's ugly to use a trick which invites a coding error to show up as a non-ending loop rather than immediate failure.]
EDS had "OWL," a research language related to the SmalltalkLanguage.
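A minimal Java sketch of the self-reference trick, to make the objection concrete. The Node class and its methods are hypothetical names invented for this illustration, not taken from any library. Note that the representation has no way to express an empty list, and that a loop written with the usual (n != null) test would never terminate.

```java
/** Hypothetical singly linked node whose last element points to itself. */
final class Node {
    final int value;
    Node next;

    Node(int value) {
        this.value = value;
        this.next = this; // self-reference marks the end of the list
    }

    /** Prepends a value to an existing self-terminated list. */
    static Node cons(int value, Node rest) {
        Node n = new Node(value);
        n.next = rest;
        return n;
    }

    /** Correct traversal: stop when a node points to itself. */
    static int length(Node head) {
        int count = 1; // there is always at least one node
        for (Node n = head; n.next != n; n = n.next) {
            count++;
        }
        return count;
    }
}
```

A traversal that forgets the (n.next != n) test and checks for null instead spins forever on the last node, which is the non-FailFast behaviour complained about above.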
It supported several NULL-related concepts:
- Similar to SmalltalkLanguage "nil". Normally used to represent relational database NULL values.
- NotSet?: Initial value for variables that had not been initialized. Often used to trigger lazy evaluation methods to go find an appropriate value.
- Unknown: After due consideration, the system has determined that it cannot find a value. Typically, after prompting the user, and finding that they don't know, this value would propagate to all dependent values.
(I think there were two or three others too, but I no longer have access to the documentation.)
I often found relational database ThreeValuedLogic to be annoying. This system's 5-value "boolean" logic could be really confusing... Like "true and unknown ==> unknown", "true or unknown ==> true". -- JeffGrigg
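For reference, the two rules quoted above ("true and unknown ==> unknown", "true or unknown ==> true") are the ordinary three-valued (Kleene) rules that SQL also uses. Here is a small Java sketch of them; the Tri enum is a made-up name for illustration, and it models only three values, not OWL's five.

```java
/** Hypothetical three-valued boolean, following SQL-style logic. */
enum Tri {
    FALSE, UNKNOWN, TRUE;

    /** FALSE dominates; otherwise any UNKNOWN makes the result UNKNOWN. */
    Tri and(Tri other) {
        if (this == FALSE || other == FALSE) return FALSE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return TRUE;
    }

    /** TRUE dominates; otherwise any UNKNOWN makes the result UNKNOWN. */
    Tri or(Tri other) {
        if (this == TRUE || other == TRUE) return TRUE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return FALSE;
    }

    /** Negation leaves UNKNOWN unchanged. */
    Tri not() {
        switch (this) {
            case TRUE:  return FALSE;
            case FALSE: return TRUE;
            default:    return UNKNOWN;
        }
    }
}
```

The asymmetry is the confusing part: FALSE absorbs UNKNOWN under "and", while TRUE absorbs it under "or".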
Yes. The problem of trying to add semantics to null is that the list goes on forever. ChrisDate (from the database community) wrote a paper a few years ago that showed a huge number of possible meanings for null in a given context (e.g. not applicable, not known, not declared, and so on). If I remember, the essence of the article was that trying to enumerate different forms of null was basically impossible, since somebody will always come up with a new meaning. The way we went at AT&T (where I was working at the time) was to use different NullObject classes, each with different meanings. When an application needed a new meaning, we added a new NullObject class. So, there were no fixed semantics. -- AnthonyLauder
[argument about different treatment of nil in Lisp and Scheme moved to IsSchemeLisp]
Will it not always be the case that a separate return value is required to indicate "not found" in a hash lookup? PerlLanguage has "exists" as well as "defined". The instant one tries to allow storage of a value meaning "does not exist" (as opposed to merely "undefined") in the hash, you go back to needing another return value to distinguish the cases.
It is just the problem of QuotingMetaCharacters? all over again. -- MatthewAstley
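The same distinction shows up with java.util.Map, where get() returns null both for a missing key and for a key stored with a null value, and containsKey() plays the role of Perl's "exists". A rough sketch, assuming a helper class called Lookup (an invented name) that wraps the answer in two levels of Optional, much like the 'Maybe (Maybe String)' mentioned earlier on this page:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper that separates "no such key" from "key maps to null". */
final class Lookup {
    /**
     * Outer Optional: was the key present at all?
     * Inner Optional: did the key map to a real value or to null?
     */
    static <K, V> Optional<Optional<V>> find(Map<K, V> map, K key) {
        if (!map.containsKey(key)) {
            return Optional.empty();
        }
        return Optional.of(Optional.ofNullable(map.get(key)));
    }

    /** Sample data: "a" has a value, "b" is present but null, "c" is absent. */
    static Map<String, String> sample() {
        Map<String, String> m = new HashMap<>();
        m.put("a", "x");
        m.put("b", null);
        return m;
    }
}
```

Each extra level of "does not exist" that you want to store needs one more level of wrapping, which is exactly the quoting problem.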
Null String and Null String Pointer a problem
One of the problems with C/C++ type languages is that a Null String and a Null String pointer are handled inconsistently in libraries. Most take either as meaning "no value", but some require a NullString? and fall over if passed a Null pointer or reference.
With the addition of Unicode or STL strings this becomes even more confusing, where there are now three possible Null values:
- A null string pointer or reference.
- A zero length string
- A string with a single NUL character.
When calling system functions it is often necessary to convert the string to old fashioned CharStar strings, and the zero length string needs to be converted to one of the others, again inconsistently.
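The three cases are easy to tell apart once they are written down. A sketch in Java rather than C++, since Java strings carry their length and make the difference visible; StringNulls and its methods are invented names for illustration:

```java
/** Hypothetical classifier for the three "null-ish" strings listed above. */
final class StringNulls {
    static String describe(String s) {
        if (s == null) return "null reference";
        if (s.isEmpty()) return "zero-length string";
        if (s.equals("\0")) return "single NUL character";
        return "ordinary string";
    }

    /** One possible policy: treat a null reference as the empty string. */
    static String orEmpty(String s) {
        return s == null ? "" : s;
    }
}
```

A library that picks a policy like orEmpty() at its boundary, and documents it, avoids most of the inconsistency; the trouble comes from each library picking a different one silently.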
According to a recent interview with Joshua Bloch, Java 1.5's automatic unboxing will convert null to 0. Oh dear. There is some hope, though. It's not settled, and it could change to throwing an exception.
After some discussion they decided to throw an exception instead. There's a mention about it at LambdaTheUltimate.
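In released Java the behaviour is easy to check: unboxing a null Integer throws NullPointerException. A tiny sketch, with Unboxing as a made-up class name:

```java
/** Hypothetical demo of what auto-unboxing does with null. */
final class Unboxing {
    /** Returns true if unboxing the given Integer throws NullPointerException. */
    static boolean unboxThrows(Integer boxed) {
        try {
            int primitive = boxed; // auto-unboxing calls boxed.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```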
In type theory, NIL/NULL/whathaveyou is usually handled in one of several ways:
- As a special unit type; thus a pointer or reference to T which can be NULL is really UNION (T, UNIT(NULL)). A dereference of such a pointer or reference involves (implicitly) checking which case of the union is active, and performing the cast if not NULL, doing something else (throwing an exception, one hopes) if it is NULL. Java references kinda act this way, even though Java doesn't have unions (typesafe or otherwise). Use of a distinguished address (like zero) which points to something that will cause a page fault if peeked/poked is a convenient optimization of this mechanism.
or
- As the BottomType (an empty type which is the subtype of all other types). This is essentially what Eiffel does with its Void type.
Myself, I prefer the former. For one, it allows multiple NULLs with different meanings; though I'm not aware of any languages which define multiple NULLs like that. For another, many FunctionalProgrammingLanguages use the BottomType to indicate a raised exception (which I think is also braindead) or to indicate divergence (which kindasorta makes sense - "returning" a non-existent type to indicate a condition which, according to the HaltingProblem Theorem, we cannot detect...). For a third reason, it allows us to define two types of references; those which can be NULL and those which can't. (C++ has this capability; though it's easily bypassed... a functional Java variant called Nice (NiceLanguage) also provides references which are guaranteed to not be NULL). If we use BottomType for NULL, then **ALL** objects in the system might possibly be NULL (including things like ints and bools where it doesn't make sense in most cases).
-- engineer_scotty (ScottJohnson)
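A rough Java sketch of the first approach, with several distinguishable NULLs. Everything here (the Nullable class, the Reason enum and its three members) is invented for illustration; Java has no built-in tagged unions, so the two cases are simulated with anonymous subclasses and a match method that forces the caller to handle both.

```java
import java.util.function.Function;

/** Hypothetical tagged union: either a value of type T or a reason it is absent. */
abstract class Nullable<T> {
    /** The different "nulls"; more meanings can be added as needed. */
    enum Reason { NOT_APPLICABLE, UNKNOWN, NOT_SET }

    /** The only way to get at the contents: handle both cases. */
    abstract <R> R match(Function<T, R> onValue, Function<Reason, R> onAbsent);

    static <T> Nullable<T> of(final T value) {
        if (value == null) throw new NullPointerException("use absent(...)");
        return new Nullable<T>() {
            <R> R match(Function<T, R> onValue, Function<Reason, R> onAbsent) {
                return onValue.apply(value);
            }
        };
    }

    static <T> Nullable<T> absent(final Reason reason) {
        return new Nullable<T>() {
            <R> R match(Function<T, R> onValue, Function<Reason, R> onAbsent) {
                return onAbsent.apply(reason);
            }
        };
    }
}
```

A plain T then plays the part of the reference that can't be NULL, and Nullable<T> the one that can, which is the third reason given above.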
Also see NullConsideredHarmful.
What is NULL? Baby, don't hurt me...
I've found this helpful. Given the other discussions on site about Null in databases, I thought I'd contribute several sections of definitions from ISO's evolving set of Geographic Information standards.
ISO/PDTS 19103 Conceptual Schema Language (also in 19107 Spatial Schema):
NULL means that the value asked for is undefined. This Technical Specification assumes that all NULL values are equivalent. If a NULL is returned when an object has been requested, this means that no object matching the criteria specified exists. EMPTY refers to objects that can be interpreted as being sets that contain no elements. Unlike programming systems that provide strongly typed aggregates, this Technical Specification uses the mathematical tautology that there is one and only one empty set. Any object representing the empty set is equivalent to any other set that does the same. Other than being empty, such sets lack any inherent information, and so a NULL value is assumed equivalent to an EMPTY set in the appropriate context.
But ISO/CD 19126 Geographic Markup Language (an XML encoding) introduces a "convenience type" gml::NullType which is a union of several enumerated values: inapplicable, missing, template (i.e. value will be available later), unknown, withheld and anyURI (for which the example is the lovely but non-existent http://my.big.org/explanations/theDogAteIt). This allows much more clarity of meaning than merely allowing 'minOccurs=0'. We're investigating it for a large database model we're developing.
(PeterParslow?)
The problem with NULL is that it is context-sensitive and each context can change the definition. To make NULL consistent and remove context-sensitivity, you would need several types similar to NULL (uninitialized vars, pointer to nothing, empty object, empty var, etc). In Math it is even more of a problem since the possible meanings of NULL change the actual result. Does A+B+Null = A+B, or undefined, or null, or does it throw an error? This is confounding for developers and language designers, almost a HolyWar (then again everything in language design is).
- Does A + B + null = A + B? If a counter has the value of null, then the counter has not yet had anything placed in it, so its value is 0. So in that case, I would answer "yes". I can think of many situations where the assumption of 0 for null would be appropriate - for instance, for a stock quantity from a database. -- PeterLynch
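Two of the candidate answers, side by side, as a Java sketch. NullSum and its method names are invented for illustration. The first method follows SQL expression arithmetic, where any null operand makes the result null; the second follows the counter argument, where null counts as 0.

```java
/** Hypothetical comparison of two policies for adding possibly-null numbers. */
final class NullSum {
    /** SQL-expression style: any null operand makes the whole sum null. */
    static Integer propagate(Integer... terms) {
        int total = 0;
        for (Integer t : terms) {
            if (t == null) return null;
            total += t;
        }
        return total;
    }

    /** Counter style: a null operand counts as zero. */
    static int nullAsZero(Integer... terms) {
        int total = 0;
        for (Integer t : terms) {
            if (t != null) total += t;
        }
        return total;
    }
}
```

Neither is wrong; they answer different questions ("what do we know the total to be?" versus "what has been counted so far?"), which is the context-sensitivity again.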
Donald Rumsfeld has a profound understanding of null -
"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."
All of this, and nobody's mentioned NaN (in FP math)? (As of January 2007)
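For the record, NaN behaves a lot like a propagating null with one extra twist: it is not equal to itself. A short Java sketch (NaNDemo is a made-up name):

```java
/** Hypothetical demo of the two NaN properties most relevant to this page. */
final class NaNDemo {
    /** False only for NaN, since NaN == NaN is false under IEEE 754. */
    static boolean equalsItself(double x) {
        return x == x;
    }

    /** Arithmetic with a NaN operand yields NaN. */
    static double addTo(double x, double y) {
        return x + y;
    }
}
```

So the "two different nulls are never equal" rule from the relational world has a close cousin in floating point.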
Nice, schmice. You can have non-nullable references in stock Java now, with generics:
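 public final class Ref<T> {
   // Private so the null check in set() cannot be bypassed.
   private T obj;

   public Ref (T obj) {
     set(obj);
   }

   // Rejects null, so get() never returns null.
   public void set (T obj) {
     if (obj == null) throw new NullPointerException();
     this.obj = obj;
   }

   public T get () {
     return obj;
   }
 }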
For an immutable version, just make the set method private.
The downside is that the way Java generics work, the Ref<T> may be more cumbersome to use than the T. For example, if Bar derives from Foo, you can assign a Bar to a reference of type Foo, but not a Ref<Bar> to a Ref<Foo>. So you need those variables to be of type Ref<? extends Foo>, which is SyntacticSalt at its finest. (This assignability rule exists for a simple reason: otherwise you could assign a Ref<Bar> reference to a Ref<Foo> reference, use a Foo that is not a Bar in a set method invoked on the latter, and then pull a non-Bar out via the get method on the Bar reference. It does mean that you can't use the set method on a Ref<? extends Foo>, although you can assign a new Ref object with a new referent.)
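A compact sketch of the wildcard situation. Foo, Bar and VarianceDemo are placeholder names, and Ref is repeated here in package-private form only so the fragment compiles on its own.

```java
class Foo { }
class Bar extends Foo { }

/** Same idea as the Ref above, repeated so this sketch is self-contained. */
final class Ref<T> {
    private T obj;
    Ref(T obj) { set(obj); }
    void set(T obj) {
        if (obj == null) throw new NullPointerException();
        this.obj = obj;
    }
    T get() { return obj; }
}

final class VarianceDemo {
    static Foo readThroughWildcard(Ref<? extends Foo> ref) {
        // ref.set(new Foo());  // would not compile: the wildcard blocks writes
        return ref.get();       // reading as Foo is always safe
    }

    static Foo demo() {
        Ref<Bar> bars = new Ref<Bar>(new Bar());
        // Ref<Foo> foos = bars;  // would not compile: Ref<Bar> is not a Ref<Foo>
        return readThroughWildcard(bars);
    }
}
```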
WhatIsNull you ask?
- Null is what you thought your freed pointer was reset to.
- Null is what the function that -should- always return a valid pointer just gave you.
- Null is how much bureaucracy a system needs on top of "nothing".
- Null is the alpha and the omega, everything begins as null, is initialized, then reset and reused as Null.
- Null is the absence of a number. Some say Null is the absence of a zero.
- The Null that can be named is not the true Null.
- Null is the value of x before and after it had a value.
- Because Null is the value of all things we do not yet know, Null's domain may be the largest domain.
In some languages (OzLanguage being an example) one may introduce a single-assignment variable without actually assigning it. You may then utilize this unassigned variable inside a data structure. One may unify two unassigned variables - i.e. asserting they are the same, such that if one is assigned, then so will be the other. Similarly, you may assign such a variable to a structure that contains yet more unassigned variables. All this is a weakly expressive form of ConstraintProgramming. More expressive ConstraintProgramming would allow you to restrict variables as well as assign them, i.e. to say that X > Y without knowing X or Y. In a secure language, the authority to assign a variable or manipulate constraints may be separate from the ability to view the variable. OzLanguage added security via 'X=!!Y', which says that X can view Y but cannot assign Y.
I've long believed that Nulls - at least in the context of data (SqlNull) - should really be identified by this sort of 'unassigned free variable'. This would allow us to make very interesting observations, the way we do with algebraic and geometric expressions in mathematics. It would be possible, for example, to perform proper joins between tables containing these free variables. If we were using a TableBrowser, one might distribute authority to assign these 'variables', such that one could meaningfully assign to a field that was previously unknown... and update the proper record.
Of course, it would still lead to interesting issues, such as: the sum of 10+X+12 can only be reduced to 22+X, and a join between two tables limited by X>Y may return some interesting 'contingent' records.
See Also: CantHideFromNulls
CategoryNull CategoryDefinition