Grok Branching

Branches are special places in any abstract representation of code. It is necessary not to move ANY operations with side effects to the wrong side of a branch. In general with my observation of compiler optimizers the state of the commercial art interpretation of this is to move nothing across a conditional Branch or in fact any branch. Thus to a compiler a branch is a brick wall and many of its optimizations operate strictly inside the space bounded by that wall. (This observation is now quite old and speculative execution in the newer CPUs specifically does move code across branches, but I sitll suspect compilers are not good at it.)

Why would you ever want to anyway?

Super scalar processors are often capable of doing more than one thing at a time and stall far less often if the machine instructions are presented in one order rather than another.

Compiler writers know more about this than I do. I try to let the compiler do this bit for me but there do exist people who know more about specific processors than the compiler writers.

The point is however that if we maximize the length of pieces of code in which the compilers optimizer can shuffle machine instructions it does abetter job.

Branches also can cause stalls in the pipeline. Even processors that implement branch prediction can have severe problems if the branching behavior does not fit the assumptions that the processor designer made.

One way to fix this is to move the machine instruction that set the flag as far from the actual branch as possible. This gets to be quite hard as many machine instructions affect flags and it is only the ones which do not effect flags that can be shuffled in between the test and the actual branch.

What should you do? Recode your algorithm so that there are less branches that occur inside inner loops.

To see the motviation for this GrokBranching page see ArraySumInManyProgrammingLanguages it conatins the following code

template<class T>

 vector<T>::value_type ArraySum(vector<T> &array)
 {
   T retVal;   int cntr;

for(cntr = 0 ; cntr < array.size() ; cntr++) { cntr==0?retVal =array[0]:retVal += array[cntr]; } return retVal; }

Note the ? operator deep inside the inner loop.

Also note the return of an undefined value when array length is zero :(

The following code by way of contrast removes the branch.

 vector<T>::value_type ArraySum(vector<T> &array)
 {
   T retVal = 0;   int cntr;

for(cntr = 0 ; cntr < array.size() ; cntr++) { retVal += array[cntr]; } return retVal; }

If an error is acceptable on the foolish request to dot product zero length vectors.

 vector<T>::value_type ArraySum(vector<T> &array)
 {
   T retVal = array[0];   int cntr;

for(cntr = 1 ; cntr < array.size() ; cntr++) { retVal += array[cntr]; } return retVal; }


What's a branch? Do you mean goto in C++, or do you mean the implicit branches in a for(;;) loop?

The implicit branches can be, and are, optimized around by compilers. LoopInvariantAnalysis (from compiler optimization 101) is a great example.

-- EddieEdwards

A Branch is all forms of branching at the machine code level.

Yes compilers do have a specific set of optimisations that they can make across the branch that is a for loop. These optimisations typically amount to moving precomputable constants (loop invariants) out of the loop.

Having compiled and disassembled a large amount of code by a variety of compilers I believe that I am in aposition to make a judgment call. Compilers get confused by branches. If you take the branches out they often generate good assembly code (often nearly ideal).

Consider the following code.

pD[i+1] = 8;
if (pX ) { pD =pX; }
pE [i+1] = 9;

compilers have atendecy to forget everything from before the branch after the branch. Specifically the code in this branch will probably not have trashed all the registers, but compilers are unlikely to analyse that no matter which way the branch went, after the branch registr A still contains i+1;

Note 'good' compilers could do this. It is just that in my experience they dont.

PS it is probably the destination of the branch that is most critical I suspect that compilers would keep the knowledge of the precomputed values inside the if it is after the if that the compiler says 'got no idea whats in any register' start again compute anything we need from the RAM.

AlanChristiansen

The ARM chips have conditional move instructions which make brances unnecessary for simple if statements like that above. Unfortunately, the ARM compilers are worse than most, and don't make good use of this fact ;) -- EddieEdwards

Not only that but when you GrokAliasing you will find out that if the conditional store modifies memory that may alias to somewhere and so it may be illegal (not ansi compliant) for the compiler to optimise later code.


CategoryOptimization


EditText of this page (last edited February 12, 2013) or FindPage with title or text search