Needing Binary Search

In CommentingChallengeTwo, we got into a discussion of whether one would ever really need to use a BinarySearch. I assume there will be times when. Time to check the assumption. --AlistairCockburn


I was brought up thinking that there are big problems. Putting the telephone directory for Salt Lake City, with 1.5 million entries (not to mention LA or NYC). Tracking social security payments, with 400 million records. Weather systems, telecom systems encountering traveling salesman problems.

I wrote a network routing program, which worked great for the first 10 nodes. Turned out I had an O(N**2) implementation of the graph data structure, which sat inside an O(N**2) algorithm. So, my implementation got really slow all of a sudden. So I am wary of the cost complexity of the functions I use from the delivered class library. Don't I see people warning against overusing Java's Vector class, and Smalltalk's Dictionaries? Running 8 Gemstone processors should only allow you to move from N paychecks to 4*N paychecks if you are running a linear algorithm. But if you need to pay 20*N paychecks, you need a different algorithm.

Having said all that, I still find it useful to question the assumption. Looking for a convincing example of NeedingBinarySearch. --AlistairCockburn


The C++ STD library provides a binary search algorithm, so in that language it is often the simplest thing to use. -- DaveHarris

See also NotNeedingBinarySearch.


Um, perhaps I have a different perspective here. First, I don't think binary search is a complex algorithm. Second, of course there are times when you need it, or something like it. It doesn't take all that long before linear search becomes expensive.

One simple example. Something that lets you choose from a reasonably large list, e.g. all classes in the system, all method selectors, all customers, whatever. When the user types a character you want to go to the appropriate place in the list *very quickly*. If the list is sorted, you can do this with binary search very quickly and easily, and it becomes faster than linear search about the time the list hits 50 elements.

It's all very well to say that you should avoid searching, and of course you don't want to do unnecessary work, but sometimes you do need to search. -- AlanKnight

What about hash tables? Binary search tends to be O(logN), and hash tables tend to be O(1) if you throw enough memory at them. They can also be more generic in that they don't require an ordering relation. In the old days when I wrote my own data structures in C, I'd start with a linear search and switch directly to hashing if speed proved to matter. Binary search was a special technique for esoteric circumstances. -- DaveHarris

Absolutely, and I would say hash tables are more generally useful than binary search, which is why we get by fairly nicely in Smalltalk with dictionaries but no default binary search. However, there are problems where dictionaries don't work, e.g. the one I described above. Another case was where I was doing search in a large diagram. again, there wasn't an exact key lookup, but I wanted to find the components bounding the displayable region. Binary search was very fast and quite simple. -- AlanKnight


A real-world use I've had for binary searching is playing an AI robots game, where you program a robot to seek and destroy other robots. The robots find each other by using a scan(start_angle, end_angle) function, which returns the number of robots within the scanning range. The algorithm I used was to scan the whole field to find out how many opponents there are, and then use a binary-search to find the most densely populated region which is small enough to lob bombs into with a fair chance to hurting somebody. -- LukeGorrie


Simple example: in a 3D modelling package, I had an algorithm where removing a child chunk from a container took linear time dependent on how many children it already had. Thus, removing all children took N*N/2. We used this software for several years, with meshes in the order of 50,000 to 100,000 faces, with no problems. Last year we did a project involving upwards of 300,000 polygon meshes, and all of a sudden, loading and saving files took nearly 10 minutes. Change the algorithm to a binary search, and this goes down to less than a second.

I second the "binary search is not a complicated algorithm" point. If this (or a corresponding tree or hash structure) is any more awkward than a vector and linear search, you need a better container and algorithm library!


EditText of this page (last edited June 8, 2010) or FindPage with title or text search