Not Needing Binary Search

NeedingBinarySearch asks for examples where [a complex algorithm like] BinarySearch is needed. Here are examples of places where something like binary search might have seemed like a candidate solution, but in fact something simpler worked. This is not to say that binary search is evil. The purpose is just to provoke people. Er, no, to provoke people to think about simple solutions.

Do you really think binary search is a complex algorithm? I happen to think the opposite. Now if you compare it to linear search, well, almost anything is more complex than that. But I agree this discussion motivates people to look for simpler solutions whenever possible.


You have a large collection of records, sorted on your search key. The collection is so large that it is spread over many disk pages.

Sure, this is a great example but only because you have to be loading the records. Now in a similar case, when you do not need to load the records I would not even dare to do that. An example is a pool-base allocation library (for embedded systems). If you have to see if the address is part of one of many pools, I would definite use binary search to get to the pool that _may_ contain the address and then binary search again to validate that the address is indeed the start of a buffer in the pool.

An easy solution to quickly finding the record of a given key, if it exists, is to binary search the file, buffering in new pages as needed. The search will only read a few pages, then stay within one page and quickly find the desired record if it's there.

Another easy solution is to keep an internal table of the first key on each page. Search the table linearly in memory to determine the only page that could possibly contain the key you need. Read that one page, search it linearly.

This search will out-perform the disk-based binary search.

Ummm, you just suggested replacing a binary search with a BeeTree. OK, it's a special case of a B-Tree (it never has to deal with a lot of B-Tree details like page splits), but it's still more complex than a binary search. B-Tree generally kicks binary search butt in performance contests for all but a few special cases (where they are equal). Was that intentional?


Something has been bothering me and I just now figured out what it is. The assumption that has been made by many people is that the binary search O(logN) is actually faster than the built-in O(N) search. This may not in fact be the case. It has been my personal experience that if you stick to the XP practices (all of them) performance is not a problem. In one case I was writing code similar to something that was already written and took 8 hours to run. My code was going to do twice as much work so the estimate was 16 hours. Now this existing code was optimized up, down, left, and right, but it was built JangIt style. I stuck to XP as I coded the problem creating a MethodObject to simplify the complex algorithm. I ran my code in 2 hours. 8 times faster. I did absolutely no optimization. Another case was when we replaced a parsing algorithm that was hand crafted by one of our programmers to be spectacularly optimized to read in a serialized database. We threw that away and instead serialized out as a simple Topaz (GemStone) script that could then be filed in using the built-in GemStone parser. It was about 24 times as fast.

I guess the point I am trying to make is that if you find yourself searching an array with thousands and thousands of neatly sorted numbers maybe there is a simplification you could make elsewhere so that you don't need to reach to the bottom of the bag of tricks. --DonWells

If you did something simple that was 24 times faster than a supposedly optimized algorithm, then the point I would take away from that is that either the other guy had constraints that you didn't, or he wasn't very good at optimization -- and that's all.

Well, would it help if we called it a *system*, not *algorithm*? After all, the point about this is that over-aggressive optimization confuses the structure of systems and reduces their overall performance: a step that is locally productive becomes globally counter-productive when combined with too many other such steps.


As far as I know, Smalltalk doesn't have a binary search. It does, however, have a number of hashed collections, which are of course O(1). And they're used rather commonly. That tells me that faster-than-linear access is useful.

To Don's point, however, I suspect it's likely that many of the uses of binary search or hashing could be eliminated altogether by other means. (See for example <http://citeseer.ist.psu.edu/cai95using.html>.) When a program is too slow due to doing a search, the best thing to do is to eliminate the search. This saves 100% of the time, and will usually result in a simpler system.

Other than as an example, which I believe is Alistair's case, actually needing to implement a binary search seems unlikely. Alistair's point, of course, was about commenting complex code, not about whether one woulda/coulda/shoulda implemented that particular complex code. --RonJeffries


EditText of this page (last edited August 29, 2006) or FindPage with title or text search