Collection And Loop Vs Selection Idiom

Collections and loops are better than complex if-else constructs and switch statements.

Rationale:

It's easy to lose control of a large number of connected if-else blocks as they break the one-printable page rule pretty quickly. While the one-printable page rule is largely an artificial constraint, it does have a very good spirit: reduce the size of information a programmer must parse to understand the program. Gigantic logic statements don't help. If-else statements also have big readability problems concerning the unknown case handling code: it's way at the bottom.

Switch statements are similarly weird and complicated. Especially with fall-through cases. (See DuffsDevice) Switch statements are very indented code as well, and the more indents, the more overwhelming code can become.

A much better idiom is to use a collection of valid case values and their associated resulting cases (e.g. a thunk, strings for a look up table) then iterate through it.

Not only does this reduce the case-selection logic to a short loop, but the collection in most languages is simple to extend, especially if it is an array. Plus, it is very obvious what the result of the case selection is supposed to be from the array, instead of forcing someone to read through a large amount of code to trace all the possible cases.

Arguments:

"It's slower."

Good argument. However, it shouldn't be much slower than an if-else construct as you are still doing the same number of comparisons. You will lose some on the iteration. However, the performance hit shouldn't bother you in most cases. You can consider the larger if-else construct to be an unrolling of the loop. The construction of the collection can usually be done once and only once through class-level or static storage classes, reducing that speed penalty.

switch() statements are fast, but act like gotos in very annoying ways. Plus, they are very hard to read.

"It eats memory."

Most selection lists are small anyway, plus you reduce the code footprint. It's quite possible that by extracting the actual information from the if-else construct and removing the code skeleton surrounding it, you reduce the overall memory consumption. Plus, you don't need to construct a new copy of the table every time you need it. Make it a class or instance member or a static.

Many implementations of switch() statements store a jump table of selection values to code offsets, which is why you cannot use non-integral types in C++ switches. So, you don't necessarily save anything here either.

"Arrays of function pointers are hard to read!"

So, use typedefs, which are arguably clearer anyway.

 typedef void (*pfnHANDLER)( void );
 pfnHANDLER apfnHandlers[] = {
...
 };

In Lisp, there's not much difference. Use cond. If the nesting's too deep for you, break the arguments out into a separate list.

Exceptions:

It's not necessarily to replace all selection statements with arrays and loops. Just complex, likely-to-change selection lists should use arrays and loops. Clearly, (??? There are always other solutions, sometimes better) the fictional code

 // The current element is the requested element.
 // We need to synchronize the current element before
 // doing operations.
 if( iCurrentElement == iIndex )
 {
...
 |

 // The sentinel node is the requested element.
 // Throw an out of bounds exception.
 else if( iSentinel == iIndex )
 {
...
 }

 // An uncached node is requested. Load it from disk.
 else
 {
...
 }

would be better left alone. Initializing a new array for the possibly changing values of iCurrentElement and iSentinel is probably not worth the effort. Also, it is unlikely (m)any more cases will be added. Further more, by breaking it up into blocks, it has become easier to document each case individually.

The reason this case should be left alone, then, is that the cases aren't very orthogonal. That is, they aren't separate, compartmentalized, yet similar enough that they can replace each other.

Other if-else constructs that should not be replaced with loops are ones that take ranges of values or complicated logic:

 if( Friday == eToday && I.HaveDate() )
BuyFlowers();
 else if( Saturday == eToday && I.HaveHomework() )
Procrastinate();

Furthermore, you can completely forego the loop if the selection value maps neatly and (sometimes even just approximately) densely into a range of values that can be used as an index into the associative collection. You can even just use a map (aka dictionary) to do the lookup. The loop/array is just a poor man's dictionary that is easy extend with new values. For example, here is some code that transforms a decimal digit into its Roman numeral equivalent:

 const char *RomanDigit( int iDigit )
 {
assert( iDigit > 0 && iDigit <= 9 );

// Static to avoid multiple constructions/destructions
// and to preserve the life of the string outside this
// function.
static const char *aszDigits[] = {
 "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"
};

// Dumb Romans never had a zero, so adjust for the off-by-one error.
return aszDigits[iDigit-1];
 }

This code would be hideous as a switch() statement or if/else construct.

[I'm not really convinced the above code snippet is an exception, though. Maybe a sibling.]

Examples:

A common form:

 // Returns the number of elements in a static or automatic
 // array declared in visible scope.
 #ifndef NUMELEM
 #define NUMELEM(x) (sizeof (x) / sizeof *(x))
 #endif

 // Returns pointer to the extension within kszFilename.
 // NULL on error.
 const char *GetExtension?( const char *kszFilename );

 enum FILE_TYPE
 {
UNKNOWN = -1,

GIF,
JPEG,
TARGA,
 };

 FILE_TYPE GetFiletype( const char *kszFilename )
 {
static struct
{
 const char *kszExtension;
 FILE_TYPE eFiletype;
} akszExtensions[] = {
 ".gif", GIF,

 ".jpg", JPEG,
 ".jpeg",JPEG,
 ".jpe", JPEG,

 ".tga", TARGA,
 ".targa",TARGA,
};

const char *kszExtension = GetExtension?( kszFilename );

if( NULL == kszExtension )
 return UNKNOWN;

for( int i = NUMELEM( akszExtensions ); i--; )
 if( !stricmp( kszExtension, akszExtensions[i].kszExtension ) )
return akszExtensions[i].eFiletype;

return UNKNOWN;
 }

[Some people understandably balk at seeing the loop condition as i--. I suggest that it is irrelevant because you know I intend to go through the entire array. But then again, maybe it is harder to read. It does reduce bugs in my code, as I'm used to the idiom, but your mileage may vary. More discussion of this sort of loop at BetterForLoopConstruct.

I also agree that stricmp() is very badly named. The code would have been clearer with a good method name. --SunirShah

Notice that many extensions mapped to the same file type. This is fine. This is good.

Notice how tabular the data is, making it easy to read the data and how the keys (the extensions) map to the values (the file types). Making it tabular also centralizes the data; therefore, you don't have to look everywhere for what is going on and it is easy to add new cases like PCX files. (".pcx", PCX,)

Notice how the array has no size. When it is first initialized, with the table, it will be given a size equal to the number of entries in the table.

Notice that both the enum and the array have trailing commas. In C and C++, the language allows trailing commas in both array and enum declarations in order to make it easier for machines to generate code. This it makes it easier for people to add new rows to the table. Just copy and paste, modify the actual data, not the formatting, and compile.

Notice the struct is anonymous (it is lacking a type name). There is no need to name this one-time struct.

Notice the struct is static. What? Actually, the static modifies the storage class of aszExtensions, thus making the array static. This reduces construction/destruction overhead at the expense of memory. This tradeoff can be made if you can assume this method will be called frequently.

Finally, notice the array of structs is decorated like an array of constant ASCIIZ strings. It was just a matter of choice to decorate these kind of associative-mapping arrays with the prefix of array of typeof(key). The key in this case is a const char * (ksz), and a array of these things is aksz. Choose whatever is best for you.

Another example. Say you have the following code:

 if( Monday == eDay )
 {
BuyDonuts();
DrinkCoffee();
DriveCarefully()
 }
 else if( Tuesday == eDay )
 {
HoldMeeting();
 }
 ...
 else if( Friday == eDay )
 {
do
{
 int iHour = CTime::GetCurrentTime().Hour();

 StareAtClock();
 ReadEmail();

} while( iHour < 17 ) // 5pm
 }
 else
 {
// Weekend
Party()
 }

All the cases are simple thunks (cf. WhatIsaThunk). They don't return values. [Actually, it's irrelevant if they return values or not, so long as they all have the same types of inputs and outputs.] Thus, we could conceivably write functions/methods with congruent signatures to deal with each case. The cases should've been split up anyway, typically programmers are reluctant to create a new function to deal with a case if the selection happens nowhere else. Instead, they just stuff all the work between the braces.

However, if there did exist functions for each case that each had a signature like "void Handler(void);", we could do this:

 void (*apfnHandlers[])( void ) = {
MondayHandler,
TuesdayHandler,
...
FridayHandler,
WeekendHandler,
WeekendHandler,
 };

 apfnHandlers[eDay]();

The assumption is that eDay is an enumerated type whose range is 0 to 6. (0 being Monday, 6 being Sunday)

Not only do you simplify the code that does the selection greatly, but each separate handler now becomes slightly more readable and simpler.

This technique also works in other languages and with more more complicated forms. Play with your language's syntax. Don't go overboard though! Clarity is key!

I used to try doing things like this using MFC:

 struct
 {
...
CString sFoo;
...
 } aBar[] = {
...
 };

This does not work. The struct has to be built from primitive types in order to to use the array initializer.

Actually, you can do this. All you need is to have a constructor that takes a single argument, like this:

  class A
{
private:
int value;
public:
A (int value) : value (value) {}
};
  A a[] = {0, 1, 2, 3, 4};

If your object has multiple values to initialize, you can do it with this syntax, provided that your object has ValueSemantics (and that you're not too squeemish about all those temporaries):

  class B
{
private:
int i;
float f;
public:
B (int i, float f) : i (i), f (f) {}
};
  B b[] = {B (0, 0.0), B (1, 1.0), B (2, 1.41), B (3, 1.73) };

To do this trivially in Java, it's more work (as is everything in Java), but it may still be worth it. Create a named class outside the class you are working with. Make sure the constructor has a parameter for every data member. Then, just create an array of these helper classes, initializing them through the constuctor. An example from real life:

 // The helper class
 class ElementEntry
 {
     ElementEntry( String elementName, Class elementClass, Class whiteboardClass )
     {
         this.elementName = elementName;
         this.elementClass = elementClass;
         this.whiteboardClass = whiteboardClass;
     }

     String elementName;
     Class elementClass;
     Class whiteboardClass;
 }

 // The real class
 class WhiteboardModel
 {
     // The collection
     static final ElementEntry elementTable[] = {
          new ElementEntry( "line", Line.class, WhiteboardLine.class ),
          new ElementEntry( "rect", Rect.class, WhiteboardRectangle.class ),
     };
     ...
 }

In my mind, case or if statements involving a single variable are simpler than the previous examples. Once you move the branching logic into a separate method, it's easy to read and maintain. DoTheSimplestThingThatCouldPossiblyWork.

See: DataStructureCentricViewDiscussion CollectionOrientedProgramming EmployeeTypes ControlTable

CategoryComparisons CategoryIdiom CategoryConditionalsAndDispatching CategoryLoops