Polymorphism Vs Selection Idiom

From SelfDocumentingCode:

Polymorphism is better than selection according to "strict" OOP philosophy.

Compare CollectionAndLoopVsSelectionIdiom.

Rationale:

We're object oriented programmers. Switch()es and switch-like constructs should disgust and horrify us! Well, maybe not that bad, but we know other (better?) ways.

Instead of using a metalevel structure to associate the key to a value, why not just ask the key for the value?

Keys are objects too, you know. Thus, you can simply invoke a method to return the value instead of going through all this hocus-pocus. Heck, the function pointer example above looks very much like a dispatcher for an object's function table.

key->GetValue?() is the answer!

Arguments:

"Yeesh, you mean I have to make a whole new class type for each key?"

Yeah, this sucks. This can quickly cause code bloat. However, if you already have a class type for each key, go nuts. Also, you can do it per instance, not just per class.

"I can't see all the key->value mappings anymore!"

Yes, this also sucks. The nice table is now distributed throughout the entire code base. However, if the value is something that is inherent to the key object, it would probably make more sense to just ask the key for the property.

"I want to make different sets of key->value mappings depending on the context."

Yes, this is another big problem. You see, what you're really doing with the array/loop construct is making a map of a key to a value. This looks very similar to and works very much like the map of method names to methods. This is sometimes called the dispatch table (especially in dynamic environments like Smalltalk), or the function table (the virtual function table in C++). The selection in this case is the selection of the right code fragment to execute. In dynamic environments, the method invocation algorithm is very similar to the array/loop construct except that methods are typically stored in an actual dictionary to keep the lookups close to O(1).

Thus, by using methods to do the selection for us, we are simply just using this map instead of our own. Unfortunately, there is only one method map. A given symbol can only map to one code fragment. So, when we want different kinds of mappings between key and value, we're stuck.

We can always pass in context to the method, but this is just faking it. (If the context is a Visitor, maybe not...) We might as well just do the table thing as above.

By the way, the array/loop technique allows us to add mappings to a class hierarchy we are unwilling or unable to change as well, which is a very valuable tool in flexible programming.

"It's harder to maintain."

It can be if the given values for the given keys aren't natural extensions of the key class. If the methods are special cases grafted onto the class definition, then maintenance becomes a nightmare. No one is really thinking about these methods and they become easy to ignore.

However, it is equally difficult to remind people to extend such and such table in such in such file way off in another project when adding a new class. You can force new "rows" to be added to the table by adding abstract (pure virtual) methods to the base class of all the keys.

Not only that, but you can get rows for free by allowing subclasses to masquerade as their base classes, or to do more complex, two-dimensional mappings by passing the lookup up the class hierarchy.

"It actually reduces the readability of the code."

Yes, by hiding the values behind an interface, you lose sight of what's going on. This is called information hiding and encapsulation and is usually trumpeted as a good thing. This helps protect against assumptions made about key->value mappings as now you only know that there does exist a mapping from the key to the value, but you don't know exactly what it is.

However, there are cases where this doesn't help, like the one mentioned in CollectionAndLoopVsSelectionIdiom with the extensions to filetypes. This is unlikely a good candidate for information hiding as file extensions are well-documented standards. Besides, the extensions probably don't map well to specific classes or instances.

Exceptions:

See the Arguments for a rough idea of when this technique can prove to be worse than the simpler array/loop idiom.

Examples:

A very good example is any polymorphic behaviour, like serializing to a device. This should be obvious to any good object-oriented programmer because the natural place to put the ability to serialize would be as an (abstract) method in a class hierarchy's base class that is specialized by each derived class. One would not use lookup tables and global functions to get the job done!

Another example would be a IsFileSupported?() method. We could do something like:

 enum OBJECT_TYPE
 {
    Text,
    Hypertext,
    Graphic,
    Video
 };

enum FILE_TYPE { PLAINTEXT, HYPERTEXT,

GIF, JPEG, TARGA,

MPEG, ASF, }

bool IsFileSupported?( CContent *pContent, FILE_TYPE eFileType ) { assert( pContent );

static struct { OBJECT_TYPE eObjectType; FILE_TYPE eFileType; } aeObjectTypes[] = { Text, PLAINTEXT, Hypertext, PLAINTEXT, Hypertext, HYPERTEXT, Graphic, GIF, Graphic, JPEG, Graphic, TARGA, Video, MPEG, Video, ASF, };

for( int i = NUMELEM( aeObjectType ); i--; ) if( pContent->GetType?() == aeObjectTypes[i].eObjectType && eFileType == aeObjectTypes[i].eFileType ) return true;

return false; }

but, this is ridiculous. It would completely miss the point of object oriented programming! The "object type" is built into the instance!

Also, notice how Hypertext supports plaintext as well just just Text. Say Hypertext derives from Text. It is a maintenance nightmare to represent a whole class hierarchy's worth of information in this table. Instead, recursive ascend the hierarchy through base class calls. This is what is meant by two-dimensional mappings above.

Consider this class hierarchy instead:

 class CContent
 {
 public:
     ...
     // Returns true if this object supports the given
     // file type.
     virtual bool IsFileSupported?( FILE_TYPE eFileType )
     {
         return false;
     }
     ...
 };

class CText : public CContent { public: ... // Returns true if this object supports the given // file type. virtual bool IsFileSupported?( FILE_TYPE eFileType ) { return PLAINTEXT == eFileType || CContent::IsFileSupported?(eFileType); } ... };

class CHypertext : public CText { public: ... // Returns true if this object supports the given // file type. virtual bool IsFileSupported?( FILE_TYPE eFileType ) { return HYPERTEXT == eFileType || CText::IsFileSupported?(eFileType); } ... };

class CGraphic : public CContent { public: ... // Returns true if this object supports the given // file type. virtual bool IsFileSupported?( FILE_TYPE eFileType ) { static FILE_TYPE aeSupportedTypes[] = { GIF, JPEG, TARGA, };

for( int i = NUMELEM( aeSupportedTypes ); i--; ) if( aeSupportedTypes[i] == eFileType ) return true;

return CContent::IsFileSupported?( eFileType ); } ... };

You can do the CVideo class yourself.

Notice a form of CollectionAndLoopVsSelectionIdiom used in CGraphic. ;)

By passing the call up the class hierarchy, derived classes can effectively add not only their own associations to the "table", but all their base classes' for free! This is much more maintainable.

Assuming the future change patterns are really tree-shaped, considering I find many things are often sets (or best represented as sets) instead of trees if you study them over the long-term. Multiple inheritance does not really solve this because it lacks an easy way to put complex handling logic where there is complicated or overlapping dispatching issues.

Plus, because it is a method invocation, and not a global function, it has all that wonderful object-oriented mojo going with it, like making the concept of supported files a property of the object (which it is).



(moved from InheritDontBranch? )

The basic idea is that a case statement, or a class test for that matter (Java's instanceof) can often be turned into a structure where overriding is used to execute the choice to be made. -- MarnixKlooster

"Almost always", not "often". I've been using OOP for nearly 15 years, and 90% of the case statements I've seen in that time should have been handled by message sending/virtual function calls. If statements are another matter; it is not usually not worth getting rid of them.

See ReFactor, SpecialCase


Where have you seen this before? The best example that comes to my mind is the State pattern. I really don't see using polymorphism to replace code like

if (x > 1.0 && x < 4.0)
// ...
or even
switch (c) {
case '\n':
// ...
-- BillTrost

See NatPryce's comments around FunExerciseAnswer below.


The classic example of this is event dispatch. Consider, for example, JDK 1.02. Wherein events were handled by a big conditional statement keyed on the sender. It's just plain nasty, but there's an awful lot of examples like the following

public boolean action (Event e, Onject arg) {
if(e.target == clear_button) {
....
}
else if(e.target == other_object) {
....
}
}

This amounts to, by and large, breaking message dispatch into two pieces: the part provided by the language and the part that you write yourself (the inside of an if block is the "real method" that is being called). And the essence of the pattern is this: don't encapsulate message dispatch code in an if block. Instead, use PolyMorphism, function overloading, or a command pattern or an amalgam of these various strategies

(I rather like the way Java uses inner classes as the way to route events).

Why on earth would you like it? It's a rather nasty and quite ugly hack. Simple anonymous functions with lexical scope would be far cleaner and simpler.

Why is the if structure bad? That's harder. Among the obvious reasons are that it kills compile-time checking (essentially, when you write your own message-dispatch code, you turn a CompileTimeError into a RunTimeError). It also encapsulates too much control-flow information in the object (and makes dynamic routing more difficult) and violates the principle of small methods.

But all of the above are, to some extent, aesthetic objections. Is there a killer reason not to do the above?

WilliamGrosso


I see two contexts in which branching can be replaced by polymorphism. The first is where statements such as:

Shape *shape = ....;
switch( shape->type() ) {
case SQUARE: 
drawSquare( (Square*)shape );
case CIRCLE:
drawCircle( (Circle*)shape );
...etc...

Can easily be replaced by polymorphic interfaces or the VisitorPattern.

The second context is where the program must perform different processing depending on input from external sources, such as GUI events from the operating system, messages received over the network. In this case, one can replace explicit tests and branching with polymorphism by defining an interface through which processing gets requested and storing implementations of that interface in some table indexed by possible input values.

I usually call such a structure a Demultiplexor; the pattern is widely used in the implementation of communication protocols.

For example:

//  The interface through which processing of messages gets requested.
class Message_Processor {
virtual void process( Message *m ) = 0;
};

// Implementations of the interface process different message types. class POST_Processor : public Message_Processor { ... }; class GET_Processor : public Message_Processor { ... };

// The table that maps message types to message processors. map<int,Message_Processor*> processors;

// Fill the table with implementations processors[POST] = new POST_Processor(...); processors[GET] = new GET_Processor(...);

void Receive_Message( Message *m ) { int msg_type;

// Extract the message-type from the message (*m) >> msg_type;

// Process the rest of the message depending on the message-type processors[msg_type]->process(m); }

(Error checking has been elided for clarity.)

Is this page meant to describe both of these styles? Or just one? If so, which one?

-- NatPryce

When processing incoming protocol messages, it is useful to extend the above idea by creating intelligent Message objects (not to be confused with the Message class used in the code above) in the processor and using polymorphism to do the processing. A FactoryMethod can be used to build an instance of the appropriate Message subclass based on the content of the raw message. Then the processing code is reduced to:

   Stream/String rawMessage = getNextMessage();
   Message message = Message.createMessage(rawMessage);
   message.process();

New message types/processors are easily added by subclassing without needing a table of processors or other such overhead.

Doesn't this just move the table into the factory? Somewhere we still need to decide which message subclass to create. --MichaelBanks

Not necessarily. I don't know what solution the original author was thinking of but one way of doing this is to ask each registered message class if it can handle the message. Here's an example using Visitor:

   public interface TypedMessageHandler {
      public boolean handledMessage(MessageHandler m,String msg);
   }
   public class MessageHandler {
      public void handleMessage(String m) {
          // SF and CF are implementations of TypedMessageFactory.
          if (SquareFactory.handledMessage(this,m)) return;
          if (CircleFactory.handledMessage(this,m)) return;
      }
      public void handleSquare(Square s) { ... }
      public void handleCircle(Circle c) { ... }
   }

But that's ugly because it can only handle a fixed set of message types (though knowledge of what types is concentrated in one place.). An alternative is to do something like:

   public interface TypedMessageHandler {
      public boolean handledMessage(Message m);
   }
   public class MessageHandler {
      public void handleMessage(String m) {
          for (handler in registered message handlers)
             if (handler.handledMessage(m)) return;
      }
   }
   public class SquareMessageHandler {
      public boolean handledMessage(Message m) {
          // type identification goes here
          if (not a square) { return false; }
          return handleSquare(new Square(m));
      }
      private boolean handleSquare(Square s) { ... }
   }

The point is to move type identification into the factory for that type. If you pass it something that it can't use to create an object of the correct type it won't do the processing. The second example is definitely something you'd see in webservers and the like.


The original statement I was asking about above was

New message types/processors are easily added by subclassing without needing a table of processors or other such overhead.

Your code has the lines

          for (handler in registered message handlers)
             if (handler.handledMessage(m)) return;

Isn't the list of registered handlers exactly the same as the table of processors first mentioned, with all the implied overhead?

Certainly the list of handlers has to exist somewhere. This refactoring decouples the registration of the handler from its execution, which allows us to wire things up on the fly with library functions or config files. It also introduces a seam that makes unit testing easier: we can test the message handling code independent of the message dispatching. -- AndrewCouch?


I say FactoryMethod here, but could this be considered a AbstractFactoryPattern or BuilderPattern? I'm still shady on the subtle differences between these patterns.

-- EricRizzo?


How do you implement the Demultiplexor when the msg_type is not scalar, but rather an (almost) arbitrary object? This should not happen in code that you've designed entirely yourself, but unfortunately happens when you need to interface to the world outside. The best I've come up with (in Java) to avoid cascades of if-instanceof is a Map from Types to Objects that takes inheritance into account. -- MichaelSchuerig


The "circles and squares" example above: I seem to remember reading (either in the old ARM, or in the C++ Programming Language) that BjarneStroustrup was trying to discourage this sort of code (branching on the class type of an object) by not adding RunTimeTypeInformation (RTTI) to C++. RTTI was later added, because it was better to have a single standard RTTI than to have many groups implement it themselves.

The "message processor" example above:

int msg_type;
(*m) >> msg_type;
Pardon my ignorance, but what does the second line of this really do? (Is the >> overloaded?)


Yes, it's an overloaded "extract" operator, as is used for ostreams, etc. I've added comments to the code example to make this clearer -- NatPryce


In TheDesignAndEvolutionOfCpp (ISBN 0-201-54330-3 ), there's a small yet satisfying discussion of RTTI. The basic point Stroustrup makes is that, if you're using a library, you're stuck with the return types that the library provides.

E.g. suppose the standard rendering library includes "triangle" and I create a subclass called "equilateral triangle." I can pass it in to the library routines (since it is a subclass of triangle), but I can't get it back (the library returns me triangles). Hence, the occasional need for RTTI (to see whether the library is returning me an equilateral triangle).

Of course, this is really a hack to avoid the true issue, which is: When is the right time to determine a method call's signature?

In C++, the signature of the function being called is determined at compile time. And, whenever possible, the function being called is also determined.

In Java, the signature is determined at compile time and the actual method being called is determined at run-time.

You could get rid of Stroustrup's reason for branching if you simply determined both signature and function at run-time (the role of the compiler would be "guarantee that there is some method which could be called at run-time").

Of course, this has other problems (speed being the biggie).

WilliamGrosso


Why is the if structure bad?, asks WilliamGrosso, and he finds only aesthetic objections.

I think that aesthetics are more important in software than that they are given credit for. BeautyIsOurBusiness.

KentBeck emphasizes in the introduction to SmalltalkBestPracticePatterns the importance of LotsOfLittlePieces?. That's what this pattern gives you: by introducing inheritance, pieces that were previously tied closely together are now untied.


And Beck's book is a fine one; well worth reading even for those of us who are not, and do not intend to be, Smalltalk programmers.

But the point I was making, the point I have gradually drifted towards over the past two years, is this: BeautyAintMyBusinessNoSir

WilliamGrosso


The if structure is bad because I cannot extend it without modifying it, thereby putting existing behavior at risk. Polymorphism can be extended at any time without affecting existing behavior. A second reason (more related to aesthetics, but with profound practical implications) is that the if structure generally is duplicated, sometimes many times. Introducing messages makes all that duplication go away.

Another implication is the reduced need for prose documentation. Much of my prose documentation that really was necessary was describing the cases in which a function was designed to work. Transforming it to a message made all that description completely redundant. If you have InMemory?, OnFile?, and OverNetwork?, you hardly need to write that the three cases in which this function works are in memory, on a file, and over a network.

--KentBeck

The "extend" issue seems to depend on the ChangePatterns one is comparing. There are changes, perhaps common ones, that make changing polymorphic structures more difficult. This is taken up in some of the 'see also' links below.


A fine answer. Thank you.

WilliamGrosso


Fun exercise. Try to write this class with no branches. You can add as many classes as you like.

class Silly { int function (int n) { return n < 5 ? n : 0; } };

-- MichaelFeathers

FunExerciseAnswer


One cannot easily use polymorphism to replace tests against ranges, especially if one is testing against a logically infinite range, such as the range of all integer values. However one can use polymorphism to replace tests against single values of a small, fixed set, such as message identifiers, characters read from a stream and so on.

-- NatPryce

...and I think this answers BillTrost's question near the top of this page:

One can use polymorphism on code like "switch (c) { case '\n': ..." by using FlyweightPattern+SingletonPattern; having one instance for each possible character. This would be overkill if you were looking for newlines, but could be useful if you're doing something really complex with characters, like drawing them in high resolution.

(There's nothing inherently wrong with switching/branching. But if you see the same select/case show up in 3 or more places in your program, it's time to give it a second thought.) -- JeffGrigg


As I show in my attempt at a FunExerciseAnswer, there are other options as well as polymorphism and switching/branching. I would probably state this as BranchRemoval. There's no need for either polymorphism or branching if the end result is just a "simple" arithmetic (or string etc.) function of the input, as is the case with the "Fun Exercise" above. -- FrankCarver


E.g. suppose the standard rendering library includes "triangle" and I create a subclass called "equilateral triangle." I can pass it in to the library routines (since it is a subclass of triangle), but I can't get it back (the library returns me triangles).

This is something the type system of ObjectiveCaml can solve, i.e. you have a function which you pass a type `a that is "at least" a triangle and that returns something of type `a. This is rather nifty. The main problem with this feature is illustrated by the fact that I cannot reproduce the type signature of such a method right away, but fortunately type inference means that you often don't have to. -- StephanHouben


As a dissenting view, DoTheSimplestThingThatCouldPossiblyWork suggests that you should you usually branch instead of implementing any of the DesignPatterns suggested above.

I agree. This comment should be at the top.


(moved from PolymorphDontBranch?)

 if user.preference == "polymorph"
write_page("PolymorphDontBranch?")
 else
write_page("InheritDontBranch?")

Shit, I mean:

 class user 
 {
abstract void write_page()
 }

user->PrefPage()

PolymorphPreferrer extends user { void PrefPage { write_page("PolymorphDontBranch?") } }

InheritPreferrer extends user { void PrefPage { write_page("InheritDontBranch?") } }

Sure, we can move it here. But you still haven't explained how you decided to instantiate PolymorphPreferrer over InheritPreferrer. Was it a table? I would guess it was a branch

 if preference_from_ui_form == "polymorph"
currentUser = new PolymorphPreferer()
 else
currentUser = new InheritPreferer()

you can also do it with a FactoryMethod.

  public class User
  {
// User class is singleton because singleton is good
private static User instance;

static { instance=null; }

public static synchronized User getInstance() { if (instance==null) instance=new User(); return instance; }

private User() { }

public static final int POLYMORPH_DONT_BRANCH=0; public static final int INHERIT_DONT_BRANCH=1;

public User getUser(int type) { switch (type) { case User.POLYMORPH_DONT_BRANCH: return new PolymorphDontBranchUser?(); case User.INHERIT_DONT_BRANCH: return new InheritDontBranchUser?(); default: throw IllegalArgumentException(); } }

void write_page() { throw IllegalStateException(); }

}

class PolymorphDontBranchUser? extends User { void write_page() { write_page("PolymorphDontBranch?") } }

class InheritDontBranchUser? extends User { void write_page() { write_page("InheritDontBranch?") } }

Then we have just moved the branch, not eliminated it.

It's a mistake to assume that the client doesn't know what class he's using. You don't need some generic factory that hides the class from the person trying to use it. What's important is that the algorithm working on the types is unaware of the exact type, hence the polymorphic call. Therefore polymorphism usually removes the need for any branch at all. Task Smalltalk for example, ifTrue: is a polymorphic call that can apply to the True or False class, an algorithm that uses ifTrue: doesn't care which subclass is being called, nor did the creator of either the True of False require a generic factory or series of branches to create the appropriate boolean subclass. Why? Because the client can simply directly create the appropriate subclass, which is usually the case when using large families of subclasses. In essence, the subclasses are used directly by the clients to specialize a generic framework; no need for branching.

[Ah, I am in the middle of applying this. I can say that one method which uses a branch to create an object is vastly more comfortable than 30 methods scattered about which all switch on the the same flag variable. Especially when I have to add yet another case. To everything. -- JoeWeaver]

OnceAndOnlyOnce covers that.

In a "proper" OO languages, branching should be polymorphism:

 (user prefersPolymorph) ifTrue: [
wiki writePage: "PolymorphDontBranch?".
 ]
 ifFalse: [
wiki writePage: "InheritDontBranch?".
 ].


Hey - have a little more class!

 public abstract class NoBrancher? //like "no brainer"?
 {
public static NoBrancher? getInstance(Class theClass)
{
 if (NoBrancher?.class.isAssignableFrom(theClass))
 {
return theClass.newInstance(); 
 }
 else throw new IllegalArgumentException(
(theClass.getName()+ " doesn't extend NoBrancher?");
 }
}
 }

public class PolymorphDontBranch? extends NoBrancher? { ... }

public class InheritDontBranch? extends NoBrancher? { ... }


(end move from PolymorphDontBranch?)


There are too many topics on this subject in my opinion. Perhaps they should be consolidated. However, it is such a popular topic that there is tons of material on it such that there is not room enough in a single topic unless heavy cleaning is done.


See also: CaseStatementsConsideredHarmful, SwitchStatementsSmell, ReplaceConditionalWithPolymorphism, PolymorphismLimits


CategoryConditionalsAndDispatching


CategoryPolymorphism CategoryConditionalsAndDispatching


EditText of this page (last edited April 30, 2013) or FindPage with title or text search