Better For Loop Construct

This is a performance JavaIdiom. (But we also seem to be talking about loops in C and C++ and Smalltalk and Lisp).

The common way of coding an for statement is as follows:

 for (int i=0;i<upperbound;i++) {
... do something
 }

(Even the second example that is concerned with performance in CheckDontCatch uses this idiom).

When you know upperbound is invariant, sometimes a different idiom can speed this loop up a little.

calculate upperbound once

Sometimes you might measure to see if it would be faster to

count down

When you know upperbound changes (is not invariant), perhaps it would be less confusing to use while instead of for.

calculate upperbound once

In the vast majority of cases, upperbound is an invariant value obtained though some function call such as array.length, string.length() or arraylist.size().

Most compilers are so dumb that they call that function on every iteration, just in case it has changed.

Don't know about Java, but C/C++ compilers are prohibited (in general) from optimizing out such function calls; as there is no way for the compiler to know whether or not the function has SideEffects or depends on external state (and might return different values on successive calls). [in the case of java strings and arrays, strings are immutable, and arrays are fixed-size, so the compiler can confidently hoist the upperbound calculation out of the loop (the same holds for strings and arrays in .net languages, too, although in c# I'd just use a foreach, since the compiler will produce IL exactly equivalent to the for loop)]

Most of the time,the programmer knows it is an invariant. (Sometimes a SufficientlySmartCompiler will also figure this out).

When upperbound really is invariant, the following syntax causes the invariant upperbound value to be calculated once.

 for(int i=0,iMax=upperbound;i<iMax;i++) {
... do something
 }

This format can conveniently be templated with an editor (such as MultiEdit http://www.multiedit.com) such that the construction of the preferred idiom is automatically done just by entering "fo ".

-- StevePritchard

Is the gain significant? What should be indicated is whether or not the upperbound needs to be recalculated.

The significance of the gain is a function of what is the cost of calculating the upperbound and how many times we will go through the loop. For array.length it would eliminate a dereference. For string.length() and arraylist.size() it would eliminate a function call and dereference. In other cases it could involve an expensive function call. The main point is that since there is no penalty for using this construct in simple variable cases, acquiring the habit to use it in all cases improves performance for the situations where it really matters.

-- StevePritchard

True. But setting iMax in a separate line makes it easier to add a comment as to why you're doing it. Though it needn't apply in this case, it can be handy to group critical variables together just to make it slightly easier to check their values make sense while you're verifying all is well.

Count down

(sometimes called "counting backwards")

 for( int i=upperbound; i--; ) {
... do something ...
 }

or equivalently (to make the test for zero explicit)

 for( int i=upperbound; 0 != i--; ) {
... do something ...
 }

See LinearShuffleSummary for another example.

This is a common idiom for (former?) assembly language programmers, because it makes the executable code smaller and faster in most assembly languages (what about Java bytecode?). 68000 assembly language has dedicated instructions for counting down - dbcc = "Decrement and Branch if this doesn't carry, unless (Condition Code)" - though these are relatively slow. DSP processors often have "zero-overhead looping".

Besides ReduceUnimportantInformation, this has the additional benefit of making the conditional-expression more efficient, as the test for zero is automatic with the decrement. The assembler can jz immediately after dec i.

If upperbound is some parameter value to the function or other variable that you won't need after the loop, you can use that variable name directly instead of declaring the new variable i.

A common idiom for string parsing functions that are given the length of the string as a parameter, such as memcpy() or substring(), as upperbound is typically the length of the string. This has much greater parsimony than the normal for triple-expression, and I think it's consequently more readable since I'm used to the idiom. Then again, I recall losing a mark on an exam because the TA thought I was off-by-one. -- SunirShah

I gather this idiom is indeed too confusing. I recommend sticking with for( int i = 0; i < upperbound; i++ ). -- SunirShah

There seem to be two quite separate issues - avoidance of unnecessary recalculation, and efficient loop construction.

The original is certainly not too confusing if explained with a comment.

: If it needs a comment, it's too confusing.

Most code is too confusing without comments.

: Do you remember seeing code like ... ?

 mov ax, 3; ax = 3
 dec ax ; decrement ax
 imul bx; ax *= bx

: What's wrong with commenting a for loop expression?

 // for (upperbound-1) .. 0 by -1
 while( upperbound-- ) {
... do something ...
 }

: Why not just write SelfDocumentingCode?

 for( int i = 0; i < upperbound; i++ ) {
 ... do something ...
 }

: Or one could argue that programmers who aren't familiar with (or at least unable to understand) while( upperbound-- ) are in over their heads. Depends on your perspective and your coworkers. By the way, the "SelfDocumentingCode" has a higher probability of having bugs than the parsimonious while loop. Trust me on that one; I've measured it a few times privately. -- SunirShah

Sorry, I may believe you or agree with you, but I won't trust you. When you did this measure, did you eliminate confounding variables, use random assignment to control and test groups, utilize a double blind, ???? or are you in over your head? If you did not do these things, you have not disproved any particular NULL hypothesis, but have collected quantitative anecdotal evidence that is open to a variety of interpretations. -- AlanChristiansen

It's fine if you don't believe me. It's not an important claim. -- SunirShah

I would be curious to see the measurements and analysis. Can it be provided?

Very few C/C++ programmers really understand the difference between "upperbound--" and "--upperbound." Also, it is confusing to have an implied test rather than an explicit test for zero. Finally, it is also confusing if the test variable is decremented before the code it is controlling. It is better to stick with the more common for loop that most programmers know and recognize than to throw in constructs to cause programmers to "be in over their heads." -- WayneMack

Or maybe the vast army of C/C++ programmers are really Pascal programmers in costume. Or maybe almost all programmers are Pascal programmers searching for a Pascal. Doesn't that mean we should all be using Delphi? -- SunirShah

While the first advice is good advice (hoisting invariant expressions out of the loop that the compiler is not allowed to hoist itself), this second section smells like OutsmartingTheCompiler. Advice as to whether incrementing or decrementing is faster, or pre- or post-increments, etc. is very hardware specific in general. Unless a) you are doing the inner loop of a weather simulation, b) you know the dirty details of the supercomputer you are running on, and c) profiling has demonstrated that the compiler's loop optimization is inadequate, don't do this!. It's PrematureOptimization. It makes the code less understandable to the reader. And unlike the first case, where the compiler often cannot help you - in this case the compiler usually can help you, and will do a better job with the loop than you will.

Going downwards through an array generates horrible amounts of cache misses - x86/x86-64 cache is optimized for looping forward (see intel/amd optimization manuals); therefore, downward counting is really MUCH SLOWER. When not looping through an array, though, downward counting may be a little faster, but the difference is negligible. -- pz

That sounds as though a whole article could be written on the topic of optimal string-handling on the PC... takes me back to when I knew about such matters for the DEC PDP10! (It had pointers to "characters" within a "string", where the number of bits per character was specified.)

for each - focus on making programs easier to *write*, not HelpingTheCompiler

A lot of this page seems like PrematureOptimization to me.

I think that a "BetterForLoopConstruct" would be...

  For Each oItem In oCollection
// Body of loop
  Next oItem

Or even:

 oCollection do: [ :oItem | "body goes here" ].

I agree that would be a BetterForLoopConstruct ... but how do I do that in Java? -- AnswerMe

In Java, [] would translate to an anonymous inner class with one method. [] is a function literal or block. In JavaScript, for a syntax example, it would look like this.

aList.Do(function(anItem){"body here"});

Using the word function is basically declaring an inline function with no name, more importantly, it's body has access to all variables in the scope of where it was defined, this is called LexicalScope. I think smalltalk got it right by having a special syntax for literal functions, make writing them more idiomatic.

Another way to do it in Java is to use the new (in Java 1.5.0? or before?) for-each construct:

// Returns the sum of the elements of array a int sum(int[] a) {

int result = 0;
for (int i : a)
result += i;
return result;

}

(blatantly plagiarized from http://java.sun.com/j2se/1.5.0/docs/guide/language/foreach.html)

Interestingly, the literal Smalltalk translation of the "for"-loop is

  0 to: upperbound do: [:i | i doSomething]

Note that the loop upperbound gets evaluated only once, at the beginning of the loop. This again means that for a sufficiently expensive upperbound, the Smalltalk version will outperform its "equivalent" C version. This fact can in turn be used to craft a micro-benchmark that will "prove" that Smalltalk is more efficient than C. Which teaches one something about the value of micro-benchmarks... -- StephanHouben

Suppose the upperbound isn't invariant? Suppose it shrinks and we're testing for the intersection between it and i?

If the size of the collection is changing in size as you process it, you have a very complicated problem and any defined looping method will not handle all cases. There is a limited set of changes that can be applied even with the upper bound being re-evaluated each pass.

Yes this is premature optimization. This optimization is important IFF

'do something' compiles into about 10 or fewer machine instructions and when profiled they occupy a significant fraction of the compute time and the process is compute bound.
the function call obtain upper bound is something significant like strlen()

In all these cases, my (and I believe it should be your) preferred solutions is to assign the upper bound to a variable declared on a separate line. I think we observed some time ago that cramming as much logic as you can in one line of C is obtusification.

Notes:

: even the STL's vector's V.end(); operation is not good in the conditional of a for loop IFF the loop is very short and calls no other functions, and ...

To all the people attempting to turn this into language war (as if we need another). Better loop constructs are not created by changing flavour of syntactic sugar from C++ to PERL or smalltalk, cobol or basic. Changing to Malbolge however is not a better loop construct. -- AlanChristiansen Refactoring note. I am not attached to anything I said on this page; it has been said elsewhere anyway, and I am happy to lose it in an effort to reduce the noise in here.

Array-based languages like JayLanguage and AplLanguage have the BetterForLoopConstruct because it's the only construct. -- SunirShah

Why hasn't anyone suggested the use of iterators as opposed to indices within the for construct? The advantages of iterators are that the evaluation is often directly based on the data, not an abstract size, and the resulting value can be used directly, no index table look-up. I have found this eliminates any question of off by 1 looping errors and eliminates some obscure array size calculations (for example sizeof(array)/sizeof(element)). In three years of using this approach, I haven't found a for loop that could not be written with iterators as opposed to indices. Examples may exist, but I believe they occur very rarely in practice. -- WayneMack

In Java, for loops are more efficient than iterators. Also, I've written so many for loops over collections that off by 1 errors don't arise. Finally, I sometimes need to treat certain elements differently, according to index, and with iterators I'd need an additional variable.

std::for_each, anyone? [Available to C++ programmers only, your mileage may vary, void where prohibited by law.]

and for the BearSkinsAndStoneKnives contingent:

  while(count--) {

dst++ = f(*src++);

A better loop construct?

(map nil #'do-something my-sequence)

Or even the prosaic

(loop for x across my-vector do (something))

Really, what's all the fuss? Looping should never have been confused with array dereferencing in the first place.

A quick test in C++:

#include <vector>
#include <string>
#include <cstdlib>

int sumArray(const std::vector<int> &vec)
{
int sum = 0;
for (int i = 0; i < vec.size(); i++)
sum += vec[i];
return sum;
}

int sumArrayEfficiently(const std::vector<int> &vec)
{
int sum = 0;
int upperBound = vec.size();
for (int i = 0; i < upperBound; i++)
sum += vec[i];
return sum;
}

int sumArrayWithIterator(const std::vector<int> &vec)
{
int sum = 0;
for (std::vector<int>::const_iterator i = vec.begin();
i != vec.end();
++i)
sum += *i;
return sum;
}

int sumArrayCountDown(const std::vector<int> &vec)
{
int sum = 0;
for (int i = vec.size(); i--; )
sum += vec[i];
return sum;
}


int sumArrayCountDown2(const std::vector<int> &vec)
{
int sum = 0;
for (int i = vec.size(); i-->0; )
sum += vec[i];
return sum;
}

const int iterations = 100000;

int main(int argc, char *argv[])
{
std::string flag = argv[1];
std::vector<int> vec(100);
for (int i = 0; i < vec.size(); i++)
vec[i] = rand();

if (flag == "-1")
for (int i = 0; i < iterations; i++)
sumArray(vec);
if (flag == "-2")
for (int i = 0; i < iterations; i++)
sumArrayEfficiently(vec);
if (flag == "-3")
  for (int i = 0; i < iterations; i++)
sumArrayWithIterator(vec);
if (flag == "-4")
  for (int i = 0; i < iterations; i++)
sumArrayCountDown(vec);
if (flag == "-5")
  for (int i = 0; i < iterations; i++)
sumArrayCountDown2(vec);
}

Timing results with GCC 2.95.2, compiled with -O2:

time ./test -1 # sumArray

real0m0.082s user0m0.080s sys0m0.000s

time ./test -2 # sumArrayEfficiently

real0m0.066s user0m0.070s sys0m0.000s

time ./test -3 # sumArrayWithIterator

real0m0.066s user0m0.060s sys0m0.000s

time ./test -4 # sumArrayCountDown

[Not yet tested]

time ./test -5 # sumArrayCountDown

[Not yet tested]

"pretty" code "i-->0" from CatchDontCheckRefuted added to the above test as "sumArrayCountDown2()".

Do we really need to drag C obfuscation into this discussion? Why would anyone want to strip the whitespace out of what was a clear expression in order to simulate a non-existent syntax? A writer should not make code unclear to future readers just because the writer thinks it looks cool.

Care to test also this? Surely it expresses the programmer�s intention (to have a sum of all the elements of the array) most clearly. (Of course, inside it should look very much like sumArrayWithIterator� though this one evaluates end() OnceAndOnlyOnce.)

inline int sumArrayWithAlgorithm(const std::vector<int>& vec)
{
return std::accumulate(vec.begin(), vec.end(), 0);
}

Also, this test will not show much difference between indexes and iterators, because for std::vector iterators are usually pointers; but this one will be a completely different story:

int sumListWithIndex(const std::list<int>& list)
{
int sum = 0;
for (size_t i = 0; i < list.size(); ++i)
{
sum += list[i]; // Warning! [] has linear time complexity
}
return sum;
}

int sumListWithIterator(const std::list<int>& list)
{
int sum = 0;
for (std::list<int>::const_iterator i = list.begin(), e = list.end(); i != e; ++i)
{
sum += *i;
}
return sum;
}

// almost equivalent to sumListWithIterator internally
// but uses sum = sum + *i
int sumListWithAlgorithm(const std::list<int>& list)
{
return std::accumulate(list.begin(), list.end(), 0);
}

// should be exactly equivalent to sumListWithIterator
// not sure about the syntax, though�
int sumListWithBoostLambda(const std::list<int>& list)
{
using boost::lambda::_1;

int sum = 0;
std::for_each(list.begin(), list.end(), sum += _1);
return sum;
}

I really dig the way BASIC does it (minus goofy block ending conventions):

  for x = 3 to 50 step 2 {....}

  for x = 3 to 50 step 2
....
  endfor

It is easy to remember and English-like. I am not normally a fan of English-like syntax, but the for-loop is an exception. (Step 1 is the default of no step clause). Note ColdFusionLanguage:

  <cfloop from=3 to=50 step=2 index=x>...</cfloop>

I like that Basic layout too - but there is something which annoys me about loops in general. More often than not, there is a need to exit at other than the beginning or the end of the loop. This is a common requirement across all languages. Yet the syntactic structures we use all hide the exit condition within the body of the loop, not at the same level as the loop. Like -

  for x = 3 to 50 step 2
....
If Not(Condition) leave
....
If (Condition) leave
....
  endfor

I am used to, and prefer, this sort of syntax -

for x = 3 to 50 step 2 Code .... .... Until Condition Code .... .... While Condition Code .... .... endfor

or, using RubyLanguage style for the exits,

  for x = 3 to 50 step 2
Code
....
....
  Leave If Condition
Code
....
....
  Leave Unless Condition
Code
....
....
  endfor

The main reason I prefer this is that it gives a clearer visual scan of the code. It makes loops and their exit conditions clearer. -- PeterLynch

In BASIC, a "for" usually has a matching "next", not an "endfor".

  100 for x = 1 to 10
  110 print x
  120 next x

It's kind of reworked BASIC based on TheRightWayToDoWordyBlocks.

I read something about optimization once that said to iterate over an array on a Pentium, it was (at least in certain situations - I forget which) more efficient to do this (for clarity, I'm decompiling assembly here):

  element *end = array+n;
  int i = -n;
  do {
/* Do stuff with end[i] */
  } while (++i);

I think if n might be zero it's better to test it outside of all that, and avoid the initialization.

See UseEnumerationsInsteadOfForLoops, RefactorMatchLoopToUsage

CategoryOptimization, CategoryLanguageFeature, CategoryIdiom, CategoryLoops