I just had the occasion to write the following function, and wondered how the non-commenting folk out there would write or structure this so the comments wouldn't be necessary.
The question is a matter of exploring the commenting issues. Here is a delicate piece of code. What are some ways to write it and comment it? Put code only here, put your commentary on BinarySearchCommentary. -- AlistairCockburn
Alistair's opening C code version, with comments.
int indexIn (int *Array, int top, int value) { // Given an Array[0..top] of integers, sorted low to high, // Return i such that Array[i] = value, if Array contains value. // Return -1 if value is not in Array. int lower, middle, upper; if top<0 then return( -1 ); lower = 0; upper = top; if ( Array[lower] < value && value < Array[upper] ) then { // Following algorithm is a binary search. // loop invariant: expects and maintains Array[lower] < value < Array[upper] // exits loop when upper = lower + 1. // exits function when Array[middle] = value. // needs separate test for Array[lower]=value, Array[upper]=value. while ( upper > lower + 1 ) { middle = ( upper + lower ) / 2; if Array[middle] == value then return( middle ); if Array[middle] < value then lower = middle else upper = middle; }; // if you got here then never was Array[middle]=value // so Array[lower]=value or Array[upper]=value or value not in Array. if Array[lower] == value then return( lower ); if Array[upper] == value then return( upper ); return( -1 );
Alistair revised his algorithm to match Don and Tom's improved and simpler algorithm, O(log log N) faster.
int binarySearch (int *Array, int top, int target) {
// Given a low-to-high sorted Array[0..top], // Return i such that Array[i] = target. // Return -1 if target is not in Array. int lower, middle, upper; lower = 0; upper = top; // loop invariant: if there is actually an i, A[i]=target, // then upper >= i >= lower. Otherwise don't care. // exits function when Array[middle] = value. // must find value using middle, otherwise eventually upper < lower and quit. while ( upper >= lower ) { middle = ( upper + lower ) / 2; if Array[middle] = target then return( middle ); if Array[middle] < target then lower = middle + 1; else upper = middle - 1; }; // never found. return( -1 );}
A little shorter, no-frills C implementation, and one more easily adaptable to generic iterators (as long as it is possible to get the difference between two iterators, and add integers to them).
// Search the ascendingly sorted interval [start, end) for value, returning a // pointer to the found element or NULL if the value could not be found // Undefined behaviour if either start or end are invalid pointers, point // to non-integer data or point to non-sorted data ;-) int *binsearch(int *start, int *end, int value) { // Loop while the interval of possible matches is non-empty while (end > start) { int *middle=start+(end-start)/2; if (*middle == value) return middle; if (*middle > value) end=middle; // Value in [start, middle) else start=middle+1; // Value in (middle, end) } return NULL; }-- SimonBrenner
Don's single-method Smalltalk version (not their preferred version, written for comparison sake).
presortedAscendingIndexOf: anInteger ifAbsent: aBlock | lowIndex highIndex middleOfRange | lowIndex := 1. highIndex := self size. [lowIndex <= highIndex] whileTrue: [middleOfRange := lowIndex + highIndex // 2. (self at: middleOfRange) == anInteger ifTrue: [^middleOfRange]. anInteger < (self at: middleOfRange) ifTrue: [highIndex := middleOfRange - 1] ifFalse: [lowIndex := middleOfRange + 1]]. ^aBlock value-- DonWells
XP preferred version is a MethodObject. Written jointly by Don and Tom:
anArray
presortedAscendingIndexOf: anInteger ifAbsent: [-1]Now lets create some unit tests.
((( #( 1 3 5 7 9 11) presortedAscendingIndexOf: 3 ifAbsent: [-1]) == 2) & (( #( 1 3 5 7 9 11) presortedAscendingIndexOf: 2 ifAbsent: [-1]) == -1) & ((#( 1 3 5 7 9) presortedAscendingIndexOf: 9 ifAbsent: [-1]) == 5) & ((#( 1 3 5 7 9 11) presortedAscendingIndexOf: 1 ifAbsent: [-1]) == 1) & ((#( 1 3 5 7 9 11) presortedAscendingIndexOf: 5 ifAbsent: [-1]) == 3) & ((#( 1 3 5 7 9 11) presortedAscendingIndexOf: 0 ifAbsent: [-1]) == -1) & ((#( 1 3) presortedAscendingIndexOf: 2 ifAbsent: [-1]) == -1) & ((#(3) presortedAscendingIndexOf: 2 ifAbsent: [-1]) == -1) & ((#(3) presortedAscendingIndexOf: 3 ifAbsent: [-1]) == 1) & ((#() presortedAscendingIndexOf: 2 ifAbsent: [-1]) == -1))should return true
Now we can create:
Array methods
presortedAscendingIndexOf: anInteger ifAbsent: aBlock ^BinarySearch indexOf: anInteger in: self ifAbsent: aBlock Object subclass: #BinarySearch instanceVariableNames: 'array target notFoundBlock lowIndex highIndex middleOfRange ' classVariableNames: '' poolDictionaries: ''BinarySearch class methods
indexOf: anInteger in: anArray ifAbsent: aBlock ^(self new setArray: anArray target: anInteger notFoundBlock: aBlock) indexBinarySearch instance methods
setArray: anArray target: targetInteger notFoundBlock: aBlock array := anArray. target := targetInteger. notFoundBlock := aBlock. lowIndex := 1. highIndex := anArray size ^self index self doesNotIncludeTarget ifTrue: [^notFoundBlock value]. self isTargetAtMiddleOfRange ifTrue: [^middleOfRange]. self adjustRange. ^self index isTargetAtMiddleOfRange middleOfRange := lowIndex + highIndex // 2. ^(array at: middleOfRange) == target adjustRange target < (array at: middleOfRange) ifTrue: [highIndex := middleOfRange - 1] ifFalse: [lowIndex := middleOfRange + 1] doesNotIncludeTarget ^lowIndex > highIndex-- DonWells and TomKubit
XP variant on MethodObject not using recursion, from Don:
index [self doesNotIncludeTarget] whileFalse: [isTargetAtMiddleOfRange ifTrue: [^middleOfRange]. self adjustRange]. ^notFoundBlock value "or if you prefer positive logic replace the first line with-> [self couldStillIncludeTarget] whileTrue:"
StanSilver offered:
!Array methodsFor: 'accessing'!
presortedAscendingIndexOf: anInteger ifAbsent: aBlock "Return the index of anInteger in the receiver. Assume the receiver is presorted in ascending order. Use a binary search algorithm for efficiency. If anInteger is not found, return the result of evaluating aBlock. Example usage: #(5 6 7 8) presortedAscendingIndexOf: 7 ifAbsent: [-1] returns 3. #(5 6 7 8) presortedAscendingIndexOf: 9 ifAbsent: [-1] returns -1. " | lowIndex highIndex middleIndex | lowIndex := 1. highIndex := self size. self isEmpty ifTrue: [^ aBlock value]."Check if empty" (self at: lowIndex) = anInteger ifTrue: [^ lowIndex]."Check beginning" (self at: highIndex) = anInteger ifTrue: [^ highIndex]."Check end" [middleIndex := lowIndex + highIndex // 2."Look at middle index" (self at: middleIndex) = anInteger ifTrue: [^middleIndex]."Success" (highIndex - lowIndex < 2) ifTrue: [^aBlock value]."Failure" (self at: middleIndex) > anInteger"Narrow the range" ifTrue: [highIndex := middleIndex] ifFalse: [lowIndex := middleIndex]] repeat"Repeat"! !
Sometimes girls just gotta have fun. I felt sure it wasn't necessary to double-check the endpoints. I awarded myself the PropellerBeanie for the following. -- RonJeffries
Array methods
binarySearch: anObject ^self binarySearch: anObject over: (1 to: self size) binarySearch: anObject over: anInterval | center testValue | anInterval isEmpty ifTrue: [^-1]. center := anInterval center. testValue := self at: center. testValue = anObject ifTrue: [^center]. ^self binarySearch: anObject over: (anObject < testValue ifTrue: [anInterval start to: center - 1] ifFalse: [center + 1 to: anInterval stop])Interval methods
center ^start + stop // 2Or, for minimum lines but not maximum clarity:
binarySearch: anObject over: anInterval anInterval isEmpty ifTrue: [^-1]. (self at: anInterval center) = anObject ifTrue: [^anInterval center]. ^self binarySearch: anObject over: (anObject < (self at: anInterval center) ifTrue: [anInterval start to: anInterval center - 1] ifFalse: [anInterval center + 1 to: anInterval stop])Source: W.H.Burge, Recursive Programming Techniques, 1975. -- RonJeffries
and the C/C++ recursive version:
int indexln(int* array, unsigned top, int value) { return indexln(array, 0, top, value); } int indexln(int* array, unsigned lowerIndex, unsigned upperIndex, int targetValue) { if (lowerIndex > upperIndex) { return -1; } unsigned middleIndex = (lowerIndex+upperIndex) / 2; int middleValue = array[middleIndex]; if (middleValue > targetValue) { return indexln(array, lowerIndex, middleIndex-1, targetValue); } else if (middleValue < targetValue) { return indexln(array, middleIndex+1, upperIndex, targetValue); } else { return middleIndex; } }Any good compiler will perform a tail recursion optimisation, so the generated code will be iterative, not recursive. For example (cleaned up for readability only):
; TriCore C compiler v1.1 r1 SN00087771-184 (c) 2000 TASKING, Inc. ; a4 = array ; d4 = lowerIndex ; d5 = upperIndex ; d6 = targetValue ; d2 = result indexln: jge d5,d4,_3 mov d2,#-1 ret ; targetValue not found -- return -1 _3: add d2,d4,d5 sh d2,d2,#-1 ; d2 = middleIndex addsc.a a15,a4,d2,#2 ld.w d15,[a15] ; d15 = middleValue jge d6,d15,_4 addi d5,d2,#-1 ; middleValue > targetValue: repeat with lower half j indexln _4: jge d15,d6,_5 addi d4,d2,#1 ; middleValue < targetValue: repeat with upper half j indexln _5: ret ; Found -- return middleIndex!-- DaveWhipp
DaveHarris wants to rename some functions from the DonWells and TomKubit MethodObject version, so that the main routine becomes:
index self isRangeEmpty ifTrue: [^notFoundBlock value]. self isTargetAtMiddleOfRange ifTrue: [^middleOfRange]. self halveRange. ^self index
In C, the simplest form would be something like:
#include <stdlib.h> #include <assert.h> static int CompareInt?( void *pKey, void *pElement ) { int iKey = *(int *)pKey; int iElement = *(int *)pElement; assert( pKey ); assert( pElement ); if( iKey < iElement ) return -1; else if( iKey == iElement ) return 0; else return 1; } int IndexIn?( int *aiSearch, int iSize, int iTarget ) { int *pResult; pResult = (int *)bsearch( &iTarget, aiSearch, iSize, sizeof *aiSearch, CompareInt? ); if( !pResult ) return -1; return pResult - aiSearch; }Unless you had some dire need to recreate a library function??!?!
I'm guessing you probably don't even need IndexIn? if you used bsearch() properly. Why return the index into the array when you can return a pointer to the element?
If you can get rid of the function, then you don't have to comment it!
-- SunirShah
I just recently wrote a BinarySearch routine that returned both a pointer to the element (or NULL if not found) and the index in the array (either where the element was found, or where it would be if it was in the array to begin with). That makes it easier to insert the element into the array if it's not already there.
-- SeanConner
LukeGorrie threw in this ErlangLanguage version, possibly years after everyone else had moved on..
%% Find the index of the term Goal in Tuple. %% Returns: {found, Index} | not_found bsearch(Tuple, Goal) when size(Tuple) >= 1 -> bsearch(Tuple, Goal, 1, size(Tuple)). bsearch(Tuple, Goal, Start, End) -> bsearch(Tuple, Goal, Start, End, middle(Start, End)). bsearch(Tuple, Goal, Start, End, Index) -> Value = element(Index, Tuple), if Value == Goal -> {found, Index}; Start == End -> not_found; Value < Goal -> bsearch(Tuple, Goal, min(Index+1, End), End); Value > Goal -> bsearch(Tuple, Goal, Start, max(Index-1, Start)) end. middle(Start, End) -> trunc(Start + (End - Start) / 2).
;; It's better than bad, it's LOOP! (defun binary-search (vector item &optional (low 0) (high (1- (length vector)))) (loop for middle = (truncate (+ low high) 2) for middle-obj = (aref vector middle) while (<= low high) if (< middle-obj item) do (setf low (1+ middle)) else if (> middle-obj item) do (setf high (1- middle)) else do (return middle) finally (return nil)))It's funny how similar a recursive solution can look:
(defun binary-search (vector x &optional (low 0) (high (1- (length vector)))) (if (> low high) nil (let* ((middle (truncate (+ low high) 2)) (middle-obj (aref vector middle))) (cond ((< middle-obj x) (binary-search vector x (1+ middle) high)) ((> middle-obj x) (binary-search vector x low (1- middle))) (t middle)))))-- TayssirJohnGabbour
There is also a BinarySearchInJava