Fence Post Error

The number of posts needed to build a fence is always one more than the number of sections in the fence. A 16 foot fence with two eight foot sections therefore needs three posts.

When drawing an array, I always think of the indexes as being the *lines* between the cells. This is a bit like the old Macintosh standard that defined the coordinate system to be the lines between the pixels, instead of the pixels themselves. Using this metaphor, the lines look like posts, the cells (or pixels) look like fence sections.

Some examples where this applies:

  1. Arrays and indexes
  2. Strings and character indexes (the same in C)
  3. Pixels and coordinates
  4. Glyphs and character cells (when dealing with fonts)
  5. Lines and baselines (vertical page measures)
etc...

-- TomStambaugh

This is a subset of OffByOne errors.

Note that this class of error is practically eliminated with CollectionOrientedProgramming.

I believe OffByOne errors are encouraged by Pascal, which lets you set arbitrary array bounds (but see comments below), and C, which doesn't let you define index variables in the initialization of for loops.

(nb: C has let you do this since the C99 standard, most, if not all, C compilers have supported this for years.)

In C++ and Java, you can write for (int x=0; x<MAX; x++), and you can't use x in a context where it might be OffByOne.

In C++ and Java, you hardly ever iterate through an array using any other way, and because you write that little piece of code so many times, you don't get it wrong. -- (unsigned ¶)


Array Indices in Algol (admittedly a stubby section)

Algol-68 (AlgolLanguage) defined 'lwb' (or 'lob'?) and 'upb' functions (or maybe they were keyword operators) that either yielded the lower and upper bounds of subranges, or the lower and upper bounds of array dimensions. (This is from fading memory, so someone with a reliable source handy is welcome to edit in the details.) This could certainly be used as a model for an extension to Pascal. -- ClayPhipps


Array Indices in C and C++

Andrew Koenig give a really nice discussion in section 3.6 of C Traps and Pitfalls He says to "Express a range by the first element of the range and the first element beyond it." -- PierceMcMartin

Thanks! You might also look at the discussion in section 2.6 of Accelerated C++. -- AndrewKoenig

Also Dijkstra tells us WhyNumberingShouldStartAtZero.

When ordering is irrelevant, count backwards.

 sum = 0;
 for( int i = sizeof array / sizeof *array; i--; )
     sum = sum + array[i];
In fact, I have a macro I almost always define to do the sizeof magic:

 #define NUM_ELEM(x) (sizeof (x) / sizeof *(x))
-- SunirShah

Shouldn't it be:

 sum = 0;
 for (int i = (sizeof array / sizeof * array) - 1; i--; i >= 0)
     sum = sum + array[i];
I thought that arrays started at zero in C++ hence requiring to be accessed until num-elements - 1...

No, this works: i-- returns the value of i before decrementing, as opposed to --i. So, when i-- returns zero, we've already run the loop cycle where i == 0.

Even though it works, this form is unclear and gives the impression that the writer has accidentally swapped the increment and the test part of the for statement. Some other programmer might "fix the typo", and produce the infinite loop below. An easier-to-read form of the same loop would be:

 sum = 0;
 int i = sizeof(array) / sizeof(*array);
 while (i--)
     sum += array[i];


Your initial feeling that you need to count backwards from

  (sizeof array / sizeof * array) - 1
is correct. It's not that you might miss zero, it's that
  (sizeof array / sizeof * array)
points one past the end of the array. Oops. Fencepost error.

-- GarryHamilton


Be careful! Using backwards-counting loops on unsigned data types can cause infinite loops, e.g.:

 for( unsigned int i = sizeof array / sizeof *array; i >= 0; i--)
     sum = sum + array[i];
You might get lucky and the system will throw an exception, but it's not likely. Most will decrement zero to INT_MAX, and keep on loopin'. -- ChrisFay

Uhh.. if you forgot, the conditional is evaluated before the loop body on loop entry. Unless you're paranoid enough to believe the code will magically cross over zero from 1, it'll never happen. Additionally if (i == 0) on entry to the loop, the body will not be executed. The obvious larger problem is the blatant off-by-one. I see this type of idiom in lots of code (the >= 0 part, be it unsigned or signed) and it never ceases to amaze me how if any real logic were put into it, one would see that only a simple test for zero or non-zero is all that is needed (in cases where one is incrementing or decrementing by one only).

This is why "while (i--)" or "for (i = x; x--; )" work so concisely in orderless cases.

-- clayne


numberof is a popular alternative name for this macro, since it matches sizeof and offsetof

      #define numberof(A) (sizeof(A)/sizeof((A)[0]))
Unfortunately, everyone has such a macro, but everyone names it something different.


Array Indices in Pascal & Modula (WirthLanguages)

In the better statically typed languages (I believe this was first done in Pascal), you write:

  FROM i := FIRST(array) TO LAST(array) DO
and let the language take care of it. -- SteveFreeman

Neither traditional nor standard PascalLanguage (ISO 7185: 1983, 1990) has such a feature. Perhaps the original writer was remembering this feature from one of the 'larger' languages whose popularity has since declined, e.g.: PL/I or Algol-68? Please see elsewhere herein. At least one extended Pascal, designed for systems programming on the ELXSi supermini (1980s), adopted the feature more in the Algol-68 style; however, distribution of that manufacturer's computers were so limited, that most Pascal programmers would've been unlikely to encounter it. Not even TurboPascal (at least as of its version 5.0: 1989) had added such an extension. Beware that TP's LOW() and HIGH() are completely different: they're low-level primitives for extracting the halves from a 16-bit word.

The run-time libraries for (at least) 2 instances of what their collective advocates call ModernPascal, i.e.: DelphiLanguage: an outgrowth of TurboPascal, and FreePascal, both have Low() and High() that return the corresponding bounds for Ordinal, Array, or string types, and a Length() that returns the element-count or character-count for Array and string types, respectively. (FreePascal, but not Delphi, uses Lo() and Hi() as its byte-plucking legacy functions.)

Although it might seem that "OffByOne errors are encouraged by Pascal, which lets you set arbitrary array bounds" (unsigned opinion above), disciplined use of named constants allow problems to be avoided for almost all ordinal types, e.g.:

const ALob = 1970; AUpb = 2038;
type  ARange = ALob .. AUpb];
type  AType  = array [ARange] of integer;
var   A: AType; i: ARange;
...
for i := ALob to AUpb do begin ... A[i] ... end;
The (sub)range type definition above is provided for compilers that rely on named equivalence of types instead of structural equivalence. Alas, the programmer is out of luck for enumerated types, unless using a compiler extended to allow an extra const-definition part following the type-definition part, to provide the opportunity to define a const each for the first and last names in the list of an enumerated type previously defined. And even then there's the risk of bugs if the enumerated type-definition is changed in the future.

Programming in Modula-2 (ModulaLanguage)(1982) shows a HIGH(a) [Report on the P.L. Modula-2 § 10.2] corresponding to Algol-68's 'upb', but inexplicably, no LOW(a), even though the language does allow arbitrary low bounds for arrays [§ 6.4]. It seems not to be used in examples [e.g.: § 6.1: Repetitive Statements], so perhaps it was a late addition to the language. -- ClayPhipps


Array Indices in Perl

Arrays in Perl are zero-based (except if you or somebody else sets the (deprecated) $[ variable). Off-by-one errors can occur if you confuse

        scalar @array  # the number of elements in array
with
        $#array        # the last index in array
on those occasions when you need to dissect an array manually.

It should be noted that this is needed extremely rarely in Perl due to the for/foreach list looping construct. In my experience, this accounts for > 95% of all looping needs. And if I need a loop index, it's easier and less error prone to just declare $i and increment it "manually" -- JpL


Array Indices in PL/I

PL/I (PliLanguage) maintains the numeric index bounds for arrays as a fundamental aspect of the language. They're available as the result of the built-in functions LBOUND(A,D) and HBOUND(A,D), where A is the name of the array variable, and D is the ordinal number of the dimension (1-origin, as in DeadLanguageFortran 66). It also has a DIM(A,D) that gives the cardinality (e.g.: row or column count) for the dimension specified. So in PL/I, if one were determined to code the C countdown above as an explicit loop:

sum = 0;
DO i = HBOUND(array,1) TO LBOUND(array,1) BY -1;
  sum = sum + array(i);
END;
However, using a non-conflicting variable name, the desired value is probably best calculated in a single expression, by invoking the SUM built-in function:

total = SUM(array);
PL/I makes indexing errors less likely in called routines, because the bounds of array (or string) parameters are available within them. As a matter of implementation, PL/I passes a descriptor, not merely some canonical address of the array. If one does run afoul of a fencepost error with an array, the SUBSCRIPTRANGE 'condition' (i.e.: exception) will be 'raised' by default, and PL/I allows a generic handler (i.e.: an ON statement or block) to be written for it. Named exceptions can be enabled or disabled as desired throughout the program, or limited to particular scopes (e.g.: a BEGIN; ... END; block).

-- ClayPhipps


Array Indices in Smalltalk

Collections (including arrays) in Smalltalk are usually iterated using "do:", like this:

 anArray do: [:each | each doSomething].
The collection that receives the #do: takes care of its size and indexing, so that you don't have to. The contents of the block are called once for each element, with #each bound to that element.

The "for (mumble, mumble, mumble) {actions}" construct of Java, C, C++ and Pascal is seldom needed or used.


DonaldKnuth on Array Indices


EditText of this page (last edited April 29, 2012) or FindPage with title or text search