Linear Search

Linear search (aka Sequential Search) is the most fundamental and important of all algorithms. It is simple to understand and implement, yet there are more subtleties to it than most programmers realize.

The input to linear search is a sequence (e.g. an array, a collection, a string, an iterator, etc.) plus a target item. The output is true if the target item is in the sequence, and false otherwise. If the sequence has n items, then, in the worst case, all n items in the sequence must be checked against the target for equality. Under reasonable assumptions, linear search does O(n) comparisons on average. In practice, this is often too slow, and so, for example, BinarySearching or hashing or TreeSearch?ing are speedier alternatives.

Here's an implementation of linear search on an array (in Java):

  // returns true iff there is an integer i, where arr[i] == target and 0 <= i < arr.length
  boolean linearSearch(int[] arr, int target)
  {
int i = 0;
while (i < arr.length) {
  if (arr[i] == target) {
return true;
  } // if 
  ++i;
} 
return false;
  }

Linear search can be sped up through the use of a sentinel value: assuming there is a free cell at the end of arr to use, then copy target into that free cell. This lets you move the test ``i < arr.length'' out of the loop. For example:

  // PRE: arr[arr.length - 1] == target
  // POST: returns true iff there is an integer i, where arr[i] == target and 0 <= i < arr.length - 1
  boolean _fastLinearSearch(int[] arr, int target)
  {
int i = 0;
while (arr[i] != target) {  // only one comparison per iteration
  ++i;
} 
return i < arr.length;
  }

  // returns true iff there is an integer i, where arr[i] == target and 0 <= i < arr.length
  boolean fastLinearSearch(int[] arr, int target)
  {  
int n = arr.length;
if (arr[n - 1] == target) {  // is target at the end?
  return true;
} else {
  int last = arr[n - 1];  // remember the final value of the array
  arr[n - 1] = target;
  boolean result = _fastLinearSearch(arr,target);
  arr[n - 1] = last;// restore the array
  return result;
} // if
  }

In one of his interviews, AlexanderStepanov (developer of the StandardTemplateLibrary) says that linear search is one of the basic algorithms he used to test languages as candidates for implementing his view of algorithms and data structures. He claims that only C++ is able to properly implement linear search on all sensible sequences; other languages alway require you to make separate linear search algorithms for different data types. For instance, in SchemeLanguage, you must have a different linear search for lists, arrays, and strings.

URL to the relevant interview? That sounds like quite an extreme position to take, and I doubt it can be supported. Are you sure that's not a misquote?

I can infer the topic. This would have been on the general subject of iterators, and the above is confusingly phrased, but not outright wrong. Iterators are very heavily used, not so much for LinearSearch per se, but to traverse the entire collection for some reason. LinearSearch is a subset of that larger topic.

So the above should be paraphrased to talk about complete traversal of collections, rather than its current reference to "linear search", and then I'm sure that, yes, Stepanov has said similar things any number of times.

The "only C++" part is both dated and a bit of an exaggeration; he's talking about generics in C++.

The comment about SchemeLanguage is outright incorrect, though, as is obvious if one considers that Common Lisp supports generics, and therefore there's nothing preventing one writing a generics module in Scheme (and which has in fact been done, here and there). -- Doug

Rather than do a sentinel, just do it length times, and break on a match.

That misses the point. If you do that, as the code shown above does, then you have to do two tests each time around the loop: one to see whether you've reached the end, and one to see whether you've found the item you were looking for. With a sentinel, you only have to do one of those tests.

You misunderstand me. You do the loop a maximum of length times, with a break on a match. No sentinel required. Single test only e.g.:

 int target = ...;

 boolean found = false;
 int i = arr.length-1;

 while (i >= 0) {// comparison 1
if (arr[i] == target) {// comparison 2
found = true;
break;
}
i--;
 }

This code does two comparisons per iteration - it checks "i >= 0" and also "arr[i] == target". The fastLinearSearch code does only 1 comparison each time around the loop, since using a sentinel moves one comparison outside of the loop, hence speeding it up. Count the number of comparisons done in fastLinearSearch and you'll see.

I don't understand the advantage of searching the entire array. If the array values have no order, then on average only half the array will need to be searched to find the target value. On average, the simple linear search will do two tests on half the array values, for a total of N tests. The fast linear search given above will do one test per array value, again requiring N tests. They appear to do equivalent amounts of work, on average.

Moreover, if the test for equivalence to the target value is more expensive than the index test (e.g., if the array values are strings or a composite data type), the simple linear search seems to be the clear winner. Doing many expensive tests to avoid doing simple tests seems counterproductive, particularly when the code to do so is less clear.

I must not understand part of your argument. Perhaps an answer to this question will clarify things: how is always performing N (possibly expensive) tests more efficient than on average performing N/2 (possibly expensive) + N/2 (cheap) tests?

I've re-written the sample code above in a way that hopefully clarifies things; the original version of linearSearch was implemented so that it always searched the whole array, which is both inefficient and, for this example, misleading because fastLinearSearch does not search the whole array if the target is in it. Note that in linearSearch, two comparisons are done on each iteration of the loop, while in _fastLinearSearch only one comparison is done per iteration.

This sort of trick was much more common in the old days of assembly language. For example, even on a simple machine like the Imlac, the search loop would code up in two instructions ...

cmp i index 
bne .-1

CategoryAlgorithm