Switch Statements Smell

Switch Statements (AKA "Case Statements") is a canonical CodeSmell (at least, in ObjectOriented code) described in RefactoringImprovingTheDesignOfExistingCode. The alleged problem with switch statements is that of DuplicatedCode. Typically, similar switch statements are scattered throughout a program. If you add or remove a clause in one switch, you often have to find and repair the others too.

I am not sure the above is an accurate summary of Martin Fowler's words; it ignores the issue of dependencies. The following is a direct quote from RefactoringImprovingTheDesignOfExistingCode, page 256.

Polymorphism gives you many advantages. The biggest gain occurs when the same set of conditions appears in many places in the program. If you want to add a new type, you have to find and update all of the conditionals. But with subclasses, you just create a new subclass and provide the appropriate methods. Clients of the class don't need to know about the subclasses, which reduces the dependencies in your system and makes it easier to update.
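To make the quoted passage concrete, here is a minimal sketch in Python. The employee kinds, function names and pay rules are hypothetical and are not taken from the book. The first half repeats the same set of conditions in two functions; the second half moves each branch into a subclass, so the client function never looks at a type code.

```python
# Type-code style: the same set of conditions appears in two places.
def monthly_pay(kind, salary, commission=0):
    if kind == "engineer":
        return salary
    elif kind == "salesman":
        return salary + commission
    raise ValueError(f"unknown employee kind: {kind!r}")


def job_title(kind):
    if kind == "engineer":
        return "Engineer"
    elif kind == "salesman":
        return "Salesman"
    raise ValueError(f"unknown employee kind: {kind!r}")


# Polymorphic style: each subclass provides the appropriate methods.
class Employee:
    def __init__(self, salary):
        self.salary = salary

    def monthly_pay(self):
        return self.salary

    def job_title(self):
        raise NotImplementedError


class Engineer(Employee):
    def job_title(self):
        return "Engineer"


class Salesman(Employee):
    def __init__(self, salary, commission):
        super().__init__(salary)
        self.commission = commission

    def monthly_pay(self):
        return self.salary + self.commission

    def job_title(self):
        return "Salesman"


def payslip(employee):
    # Client code: it neither inspects a type code nor names a subclass.
    return f"{employee.job_title()}: {employee.monthly_pay()}"
```

Adding a manager to the first half means editing both functions; in the second half it means adding one class, and payslip stays as it is.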

The SwitchStatementsSmell usually co-occurs with EnumeratedTypes. Since JavaLostEnumeratedTypes, Java programmers who are fond of switches have had to invent replacements for EnumeratedTypesInJava. But, as MartinFowler observes, ObjectOrientedProgramming avoids them: "Most times you see a switch statement, you should consider polymorphism (InternalPolymorphism). The issue is where the polymorphism should occur." Fowler's recommendations for refactoring include ReplaceTypeCodeWithClass and ReplaceTypeCodeWithStateStrategy.

This is not to say that SwitchStatementsAreEvil, just that they aren't ObjectOriented. Please note also that not all programs that use switches exhibit the CodeSmell.

I don't see that many cases of duplicated switch/case statements in practice. If I do, then it often means something else is going on. Even if they are the same at the start, they tend to drift apart over time. Further, polymorphism tends to trade sub-type list duplication for operation list duplication. Can you objectively say that one type of duplication is better than the other?

Something else is going on, more often than not. That's why it is a CodeSmell. When you get the SwitchStatementsSmell, it's an indication that your code isn't ObjectOriented.

It could also be a case of following the rule DoTheSimplestThingThatCouldPossiblyWork.

Counter-Points

Summary of counter-claims:

It exchanges one form of duplication for another rather than eliminating it.
- The real "problem" is that textual code is generally one-dimensional in nature, but we are interweaving two dimensions of option combos. It's allegedly a limit inherent to textual code, regardless of paradigm used to write the code.
Case statements handle certain kinds of changes better (less code movement/refactoring), such as loss of mutual-exclusion of choice.
Using a database or TableOrientedProgramming is sometimes the appropriate "fix", not polymorphism. For example, store product classifications are best handled in a database with many-to-many category tables, not case statements.
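As an illustration of the last counter-claim, here is a minimal sketch of a many-to-many category table, using Python's built-in sqlite3 module and an in-memory database. The table names, products and categories are hypothetical; the point is only that a product can sit in several categories at once without any case list in code.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog with a product/category link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product_category (
            product_id INTEGER NOT NULL REFERENCES product(id),
            category_id INTEGER NOT NULL REFERENCES category(id),
            PRIMARY KEY (product_id, category_id)
        );
        INSERT INTO product VALUES (1, 'garden hose'), (2, 'patio heater');
        INSERT INTO category VALUES (1, 'garden'), (2, 'outdoor'), (3, 'heating');
        INSERT INTO product_category VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    """)
    return conn


def categories_of(conn, product_name):
    """Return the category names of one product, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM category c
        JOIN product_category pc ON pc.category_id = c.id
        JOIN product p ON p.id = pc.product_id
        WHERE p.name = ?
        ORDER BY c.name
        """,
        (product_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Reclassifying a product is then a row change in product_category rather than a code change.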

OO often simply trades one "duplicate list" for another duplicate list - a "list" of operations (methods). Why is this form of duplication seen as superior to duplicate case statements? It seems a double standard to me.

Plus, I find that duplicate case statements "degenerate" more easily than polymorphism. For example, loss of mutual exclusiveness among the choices.

Mutually-Exclusive:

  select on x {
    case 'A' {
      foo
    }
    case 'B' {
      bar
    }
  } // end case 
  // Note that modern non-BREAK-using syntax is assumed. C sucks there.

Reworked when no longer mutually exclusive:

    if something1 {
      foo
    }
    if something2 {
      bar
    }

Changing case statements often does not require moving code to different named units, but simply changing the conditionals. But in polymorphism you have to move code up or down the hierarchy and/or have to toss the hierarchy altogether because the change pattern is no longer polymorphism-shaped. Polymorphism is a strong statement about projected future change patterns. Case statements are a weaker statement, and thus can change into other code structures with less large-scale shifting or moving of code if the original assumption about change fails.

The problem isn't the lists, it is the requirement, in a non-OO environment, that they be duplicated in each entity that they dispatch for, forcing the same change to be repeated for each duplication and violating OnceAndOnlyOnce. The use of inheritance or delegation, in conjunction with this, reduces the effort needed to make incremental changes to the cases being dispatched on (such as methods). Typically, such changes can be accomplished without touching existing code - often not possible when using a switch statement - and further improves code reliability. The class hierarchy is a strong statement about the problem, and therefore is a wager about projected future change patterns. Because those statements are denoted explicitly in the class graph, the developer is able to accomplish such changes directly by changing the class graph (with minimal changes to the embedded code) - precisely because there are NOT switch statements and their equivalents embedded in the code.

Either approach violates OnceAndOnlyOnce in different ways. Polymorphism repeats the "operation block" (method declaration) multiple times, while case lists repeat the dispatching list. You can say that classes repeat operation dispatching in multiple places while case lists repeat subtype dispatching multiple places. This is because we basically have a 2 dimensional structure (operations and subtypes), essentially a "decision grid", but code is one-dimensional. Thus, we are forced to repeat one or the other because code is only one-dimensional. We are trying to project a 2D structure onto a 1D surface, and have to make some grouping sacrifices in the process. As far as which one is more resistant to change, it depends on which "dimension" the changes are in (see below), or even if the clean "decision grid" remains a valid metaphor for our application. I have an opinion about which changes are more likely, but don't want to get into a long debate about it here. I am just pointing out that there is no slam-dunk victor.

[Wait a second. A 2D-table paradigm is a third choice, that I for one personally think is sometimes handy, however the usual discussion is about how the OO approach is indeed a slam-dunk victor over the procedural switch version. In writing interpreters in C, for instance, I end up with tree representation of a program, and in multiple places must do a switch on the operator type (in one spot to do type propagation, in another to do optimization by partial evaluation, in another to do full evaluation, in another to do register assignments for compilation, in another to do debug printouts, etc). This is extremely error prone; it's easy to leave out cases. Whereas in an OO language I can make each operator a subclass, and get a compiler error if I leave out a subclass anywhere. Ideally, anyway; sometimes one needs general PredicateClasses to make this work smoothly. But the point is that it reduces errors. -- DougMerritt (a Top partial-sympathizer and OO fan simultaneously ;-) ]

See PublicationsExample for a table-driven alternative to subtype polymorphism.

It may depend on how often you add a new type. If you add a new operation that operates on the existing type list, then you can add that without touching the other modules and their case lists. Each situation is different.

[No, not at all, often it is the rare cases that are important. If you add a new type infrequently, is it acceptable for doing so to be bug prone? What if some hapless future maintenance programmer is the one who has to do the adding and then the debugging? And even for the original author who understands the code well, having an accidental bug in the debug-print code will make debugging harder, and if the bug is in evaluation of an infrequently-seen (and under-tested) operation/data type combination, then the bug can end up manifesting in the field - which in fact does happen all the time.]

[So no, it is not a matter of how frequently you have to add new types.]

I am not sure what you are saying. It is a trade-off. One is more change-friendly adding new subtypes and the other one is more change-friendly adding new operations. Each is a mirror image of the other with regard to change impact.

Either approach violates OnceAndOnlyOnce in different ways. Polymorphism repeats the "operation block" (method declaration) multiple times, while case lists repeat the dispatching list.

A polymorphic solution never repeats the "operation block". Each unique block is represented as source code once and only once. The common blocks are refactored to common methods. Each polymorphic method call repeats the "dispatching list", but that is generated by the compiler or interpreter. -- EricHodges

[Yep; this is an important point.]

[It's also worth pointing out that OnceAndOnlyOnce is a rule of thumb, not an absolute standard; I recently pointed out (on RedundantDeclaration perhaps?) that sometimes the right kind of redundancy is a good idea, if the point is that it lets the compiler find more errors for you. Compiler-detected errors versus programmer-at-runtime-detected errors is one example of something more important than OnceAndOnlyOnce.]

Compiler detection of errors may not apply as much with dynamic languages. I am approaching this from a code structure point of view, not a static typing versus dynamic typing point of view. I view it as a block-nesting trade-off with regard to code nesting.

Procedural:

  Operation 1
  ---Subtype A
  ---Subtype B
  ---Subtype C
  Operation 2
  ---Subtype A
  ---Subtype B
  ---Subtype C
  Operation 3
  ---Subtype A
  ---Subtype B
  ---Subtype C

OOP:

  Subtype A
  ---Operation 1
  ---Operation 2
  ---Operation 3
  Subtype B
  ---Operation 1
  ---Operation 2
  ---Operation 3
  Subtype C
  ---Operation 1
  ---Operation 2
  ---Operation 3

Both block groupings have repetition.

-- AnonymousDonor

This is a situation where you have the CartesianProduct of the set of operations and the set of types. Ideally, you would find an implementation which avoids the Cartesian product in some way. Barring that, figure out which block grouping is cheaper.
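The Cartesian product of operations and types can also be written down literally as a grid. The following is a rough Python sketch under that assumption; the operations, subtypes and handler bodies are made up for illustration. Each cell of the grid is one entry keyed by an (operation, subtype) pair, so neither dimension is nested inside the other.

```python
# Hypothetical operations and subtypes, for illustration only.
GRID = {
    ("draw", "circle"): lambda: "drawing a circle",
    ("draw", "square"): lambda: "drawing a square",
    ("erase", "circle"): lambda: "erasing a circle",
    ("erase", "square"): lambda: "erasing a square",
}


def dispatch(operation, subtype):
    """Look up the cell for (operation, subtype) and run its handler."""
    try:
        handler = GRID[(operation, subtype)]
    except KeyError:
        raise ValueError(f"no handler for {operation!r} on {subtype!r}") from None
    return handler()


def missing_cells(operations, subtypes):
    """Return the (operation, subtype) pairs that have no handler in the grid."""
    return [
        (op, st)
        for op in operations
        for st in subtypes
        if (op, st) not in GRID
    ]
```

A helper like missing_cells gives a cheap check that no combination has been left out, which is one of the error sources raised earlier in this discussion. It does not remove the product itself: there are still operations times subtypes cells to fill in.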

See EmployeeTypes for addressing similar such patterns on a larger scale.

But how do you make sure that the procedural code executes the correct switch before it delegates to the subtype procedure? If there's a procedure for each operation on each subtype and a procedure that dispatches to them, how will future developers know to call the dispatch procedure instead of the subtype/operation procedure? And what if you want to merge two subtypes into one? OO makes all of this easier. -- EH

{Each sub-block could be just a case statement element, I would note.}

But then you end up with monstrously long procedures for any significant number of types. Which gets back to our argument about OO methods being shorter and all of that. If you're going to dispatch on type, you might as well use some form of OO and delegate those switch statements to the language.

If the "type list" is long enough, then perhaps it should be converted to data instead of case lists. As far as whether long procedures are a maintenance problem, that is another topic. Comments on the usefulness of "types" are below.

The data conforms to types. That doesn't negate the need to dispatch operations based on types. No use of data avoids this problem.

[Mind you, mostly I was digressing about OnceAndOnlyOnce. But back to the topic, I think it should be a strong concern whether one approach gives compiler checking or not. In at least some languages, the subclassing of an abstract class approach will in fact yield a compiler error if you forget to create some of the required methods, and that is the thing that I consider most important.]
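A small sketch of the abstract-class check, in Python with hypothetical operator classes. Note the assumption: Python is not statically compiled, so the error here is raised when the incomplete subclass is instantiated, not at compile time as it would be in a language such as Java. The idea is the same, though: a subclass that forgets a required method is rejected before any of its operations run.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Every operator must supply both operations."""

    @abstractmethod
    def evaluate(self, left, right):
        ...

    @abstractmethod
    def debug_name(self):
        ...


class Add(Operator):
    def evaluate(self, left, right):
        return left + right

    def debug_name(self):
        return "add"


class Multiply(Operator):
    # debug_name is deliberately left out to show the check firing.
    def evaluate(self, left, right):
        return left * right


def try_create(cls):
    """Return an instance of cls, or None if it is still abstract."""
    try:
        return cls()
    except TypeError:
        return None
```

With a switch per operation, the forgotten debug-print case for Multiply would only show up when that branch was reached at run time.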

Wouldn't that only work if you never want to inherit? If you start out not wanting to inherit and then change your mind, then you have two places to change. We would have to tally up all the kinds of errors one can make in such endeavors. Sometimes more type checking means more things you have to fill in or undo. In other words, the calculation is getting tougher here.

[I disagree; the compiler finding bugs for me is extremely valuable, well worth some extra filling in/undoing. What I hate is when I have to over and over again tell the compiler things that it does or should already know, but that's different.]

But you could forget to remove the Abstract declaration in Java, for example, per above. It just seems to move around the kind of things you have to remember to do or undo, rather than do your thinking for you.

[If on the other hand the only consideration is whether the problem is implemented via your top diagram or your bottom diagram, then the usual OO issues aren't being raised, and I could only say that it depends on the problem.]

[I believe it is rare for there to be no other issues, though. In this case, the two diagrams that you offer implicitly assume a 2D table as the true underlying model, and as I said some comments ago above, I have no problem with taking a TopMind approach when that is a good reflection of the problem domain. Note, however, that a table oriented approach is not exclusive with an OO approach, especially if generic functions are available (I lump the topic of generic functions in with OO, although certainly not everyone does).]

In practice such things rarely keep enough regularity to use direct code-in-tables. Multiple orthogonal factors often start to come into play such that the dispatching for operation A grows apart from operation B. This is one reason I say that case statements "degenerate better". Task-based grouping is the stronger grouping in most cases, and "flat" subtypes are a temporary illusion that often don't last beyond the first few application iterations. ThereAreNoTypes, at least for domains I am familiar with. -- top

[I'm a pragmatist, so I would say it most definitely depends on the problem domain. But since you're speaking very generally, there's nothing here that says OO is inappropriate; you can do task-based grouping in OO. It's just usually considered a weaker way to do OO because of CohesionAndCoupling issues - and those issues mean it's weaker to do it that way procedurally, too. Nonetheless, if the domain really called for it, you could have an Operation master class with subclasses Edit, Delete, Move, each of which had methods for dealing with red_trucks, green_cars, and blue_motorcycles. The fact that people often say this isn't the best way to factor things in OO doesn't mean that you have to go procedural in order to do it that way.]

The original context was "duplication" and not type-checking or CohesionAndCoupling (to which I'd like to see a realistic scenario applied). But in general, converting conditionals or case statements to polymorphism is not an open-and-shut case. Whether the result is "better" depends heavily on external conditions, particularly future change patterns and frequencies. Analysis of the code alone at a given point of time will not give us enough information to determine which approach is the best course of action with regard to maintenance. As far as your "operation-based OO" is concerned, this may depend on how one defines OO. Is procedural wrapped in classes still OO? For the sake of discussion, I feel we should focus on the grouping rather than whether we call it OO or not. I would note that cohesion and coupling issues also tend to depend on estimated future change patterns. SoftwareDevelopmentIsGambling.

If I say that some code is not ObjectOriented, I am not making a value judgment about that code. Nor am I claiming the superiority of ObjectOrientation vis-a-vis other programming techniques (though others may). However, it is important to know whether your code is ObjectOriented or not. Many of the patterns, tools and techniques discussed in the literature and on this Wiki assume ObjectOrientation. They may not work as well in a non-OO context.

Okay, how about we agree that duplicate case/switch statements deserve closer analysis to see if some other construct should not be used instead. Things to consider include:

How likely both case-lists are to stay duplicate in the longer run
Frequency of additions to the lists
Frequency of new operations (methods) using the existing list
Size of the lists
Number of duplicate lists
If the options will stay mutually exclusive
If other orthogonal factors will further divide some or all lists

Possible "fixes" to consider include polymorphism or the creation of a new entity table or ControlTable.

New Types Versus New Operations

Another thing to consider is adding new subtypes versus adding new operations. It is sometimes stated that polymorphism allows one to add new subtypes without changing existing code, but that a case list would require visiting every instance of the case list to add the new type as a new item in the case list.

The flip side is adding new operations. If using regular polymorphism, then one would have to visit each class to add the new method to each one (assuming no inheritance). But a new operation could be added as a new function without disturbing the other existing case lists in other functions/modules.

Thus, there is no FreeLunch. Which one is more convenient depends (at least) on which kind of change is more probable.
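The trade-off can be sketched side by side. This is a hypothetical Python example (shapes and formulas chosen only for illustration, no inheritance used), with comments marking where each kind of change lands.

```python
# Case-list grouping: each operation is one function listing every subtype.
def area(kind, side):
    if kind == "square":
        return side * side
    if kind == "triangle":  # equilateral triangle
        return (3 ** 0.5 / 4) * side * side
    raise ValueError(kind)


def perimeter(kind, side):  # a new operation is one new function...
    if kind == "square":
        return 4 * side
    if kind == "triangle":  # ...but a new subtype means editing every function
        return 3 * side
    raise ValueError(kind)


# Polymorphic grouping: each subtype is one class listing every operation.
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

    def perimeter(self):  # a new operation means editing every class...
        return 4 * self.side


class Triangle:  # ...but a new subtype is one new class
    def __init__(self, side):
        self.side = side

    def area(self):
        return (3 ** 0.5 / 4) * self.side * self.side

    def perimeter(self):
        return 3 * self.side
```

Both halves compute the same four results; they differ only in which change (new subtype or new operation) can be made by adding code in one place.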

Re: But trades that for duplicated operation blocks (methods). In the end it is not less code.

Untrue. Only the declaration is the same; the body is different. It's called messaging.

Implementation is different in both cases. If the entire case list is identical, then it should be a single function.

I don't think you understand. The list of cases can show up in multiple contexts. The same cases occur on each, but can't be moved to a single function because the operation blocks are different. Polymorphism reduces the amount of code by forcing the compiler to write the switch statements and making it easier for the programmer to share common operation blocks via inheritance.

I am sorry, but I don't see how it is less code. May I request an example? Note that switch/case statements usually have an optional "otherwise" section that can act as a default to produce similar results as inheritance.
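For reference, here is a minimal sketch of the "otherwise" point, in Python with hypothetical region names and values. The default branch of the case list and the inherited base-class method play the same role: both supply an answer for any option that does not define its own.

```python
# Case list with an "otherwise" section acting as the default.
def shipping_days(region):
    if region == "local":
        return 1
    elif region == "national":
        return 3
    else:  # the "otherwise" section
        return 10


# Inheritance: the base-class method is the default unless overridden.
class Region:
    def shipping_days(self):
        return 10


class Local(Region):
    def shipping_days(self):
        return 1


class Overseas(Region):
    pass  # no override, so the inherited default applies
```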

You don't see how no switch statements is less code than many switch statements?

    method foo(int thingType) {
        switch (thingType) {
            case TYPE_ONE:
                oneFoo();
                break;
            case TYPE_TWO:
                twoFoo();
                break;
            case TYPE_THREE:
                threeFoo();
                break;
            case TYPE_FOUR:
                fourFoo();
                break;
        }
    }

    method oneFoo() {
    }

    method twoFoo() {
    }

    method threeFoo() {
    }

    method fourFoo() {
    }

    method bar(int thingType) {
        switch (thingType) {
            case TYPE_ONE:
                oneBar();
                break;
            case TYPE_TWO:
                twoBar();
                break;
            case TYPE_THREE:
                threeBar();
                break;
            case TYPE_FOUR:
                fourBar();
                break;
        }
    }

    method oneBar() {
    }

    method twoBar() {
    }

    method threeBar() {
    }

    method fourBar() {
    }

versus

  01  class one {
  02      method foo() {
  03      }
  04      method bar() {
  05      }
  06  }
  07  class two {
  08      method foo() {
  09      }
  10      method bar() {
  11      }
  12  }
  13  class three {
  14      method foo() {
  15      }
  16      method bar() {
  17      }
  18  }
  19  class four {
  20      method foo() {
  21      }
  22      method bar() {
  23      }
  24  }

(I numbered lines for later comparisons)

In the pseudo-language above: where 'm' is the number of methods and 't' is the number of types:

switch loc = m(4+5t)

polymorphic loc = 2t(1+m)
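The two formulas can be checked against the listings above with a few lines of Python. This is only a sketch of the arithmetic; the with_break flag is an assumption added here to model a pseudo-language whose case blocks need no break line, which drops the per-type cost from 5 lines to 4.

```python
def switch_loc(m, t, with_break=True):
    """Non-blank lines for m operations over t types, switch version.

    Each operation costs 4 lines of scaffolding (function and switch,
    open and close). Each type then costs a case line, a call line,
    an optional break line, and a 2-line helper routine.
    """
    per_type = 5 if with_break else 4
    return m * (4 + per_type * t)


def polymorphic_loc(m, t):
    """Non-blank lines for t classes, each with m two-line methods.

    Each class costs 2 lines for its own open and close, plus 2 lines
    per method.
    """
    return 2 * t * (1 + m)
```

For the listings above (m = 2 operations, t = 4 types) this gives 48 lines for the switch version and 24 for the polymorphic one, which matches a direct count of the non-blank lines.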

You don't necessarily have to have a separate routine for each case option. A similar kind of comparison done at http://www.geocities.com/tablizer/inher.htm shows no significant code size difference. Also, the "break" statement is a C baddy, and not necessary in real languages. See IsBreakStatementArchaic

You don't have to have a separate routine for each case option, unless you care about code maintenance.

I see no need to wrap every decision block in a routine. If you can show evidence that it always improves code maintenance, please do. I see no reason to have separate routines unless the blocks get large or it would remove duplication (called from multiple spots).

They aren't "decision blocks", they are method implementations. The switch statements in the above example are already too large for my taste. I wouldn't fill each case with method implementations.

The polymorphic example has them so the switch example does, too. If you don't have "break" then you have no way to let one case fall through to another and the switch loc becomes m(4+4t).

Falling through is the "old style". See IsBreakStatementArchaic for alternatives.

m(4+4t) is still more than 2t(1+m) for any reasonable values of t and m.

Well, using my coding style, they are almost dead even.

Hmm, solving the inequality m(4+4t) > 2t(1+m) holding t constant yields (4+4t) > ((2t/m)+2t). This means that as m increases, the polymorphic implementation will yield an average number of lines per method approaching only 2t, while the case-statement implementation remains at a constant 4+4t. Thus, the limit as m approaches infinity is 4+4t > 2t, which is trivially proven by simple inspection. Polymorphism wins here.
If we hold m constant instead and divide by t, we get ((4m/t)+4m) > (2+2m), which approaches 4m > 2+2m as t grows and holds for every m of 1 or more. Once again, as the number of methods increases, independently of the number of types, polymorphism results in fewer lines of code: the ratio approaches a factor of two when both m and t are large, and reaches a factor of four only in the special case t = 1 (8m versus 2+2m). Once again, polymorphism wins, hands down. This is true for any value of m and t, not just reasonable ones, since the inequality reduces to m(2+t) > t.
These calculations are for the pseudocode used above. Different relations will exist for different languages. Forth's metrics are very, very different from the above, for example. But, it must be known that ForthLanguage coders have been using neither polymorphism nor case-like constructs since at least the mid-70s; instead, that which could be identifiably labeled as "methods" are typically written to have their own jump tables, using computed offsets based on a number of predicates. Thus, in a sense, each method implements a Cartesian product-based dispatch. CommonLisp coders will choose to use CLOS and generic functions, which achieves a similar goal, but is run-time extensible.
I think the reason that top thinks his code shows parity is because representing code as a table fundamentally removes the 2D to 1D mapping that both case and polymorphism mechanisms aim to address. When expressed as a table, there is a Cartesian product involved, and it is expressed literally. --SamuelFalvo
For one, I didn't agree to the original equation for reasons given. My tests show code-size between them to be pretty even. Second, indeed if there is a fairly likely chance of a CombinatorialExplosion of types and methods, a refactoring of some or all of it to tables (or other data structure) is indeed warranted. But I couldn't recommend a solution without seeing the actual scenario. DoubleDispatchExample may suggest some ideas. Most real-world things I encounter outside of device drivers don't fit such a nice pattern when they scale anyhow. The instances tend to share bits and pieces rather than be all mutually-exclusive. Thus, set theory is in order to manage the feature assignments (FeatureBuffetModel). --top

Let me see if I understand this correctly. In my example, you would rather place all 8 of the method bodies directly inside case blocks than use polymorphism? How does this improve your code?

{It is at least not objectively worse. Remember the allegation is that it's a "smell". That smell has not been identified thus far.}

It is objectively worse. You have one method containing one switch statement that describes the details of a given method for a set of types. Why would you want to do that instead of place them in their own methods?

{That is not objectivity, that is quoting OO mantra.}

Why won't you answer my question? Why would you want to put the implementation of 4 different methods into one switch statement? What is the advantage?

{Moved reply below}

[You guys don't get it. Trying to convince the pro-case guys is like trying to persuade the vinyl purists that digital sounds better. It won't work, they don't care about your Nyquist rates and information theory, they "just know" that vinyl is better. If you think SwitchStatementsSmell, don't use them - write OO code and move on. If you think switch statements are the cat's meow, or if you code in a way that you see no difference, then keep coding just the way you have been. Move on. At the end of the day, one style will work better than the other. BillJoy, when asked at an OOPSLA panel (I think it was 88 or 89) about how we propose to "train" the "army of COBOL programmers who resist OO", said "We don't. Ignore them. They'll die." I think he was quoting Alexander Graham Bell, who said the same about people who were uncomfortable about talking into "machines".]

There is no objective evidence here that it is fewer lines of code. That is a fact. Whether we are Luddites or not is irrelevant to the topic, and frankly insulting and unprofessional. I would note that your coding style suggests that the way you think about code and subroutines does not mesh well with the procedural paradigm with regard to LOC.

Count the number of lines in the examples above. That is objective evidence. The way I think about code and subroutines meshes well with long term maintenance, regardless of paradigm.

One thing that helps with long-term maintenance is keeping simple things simple. Reworked with the stated suggestions and the removal of that damned break statement from our pseudo-language:

  01  function foo(int thingType) {
  02      switch (thingType) {
  03          case TYPE_ONE: {
  04          }
  05          case TYPE_TWO: {
  06          }
  07          case TYPE_THREE: {
  08          }
  09          case TYPE_FOUR: {
  10          }
  11      }
  12  }
  13  function bar(int thingType) {
  14      switch (thingType) {
  15          case TYPE_ONE: {
  16          }
  17          case TYPE_TWO: {
  18          }
  19          case TYPE_THREE: {
  20          }
  21          case TYPE_FOUR: {
  22          }
  23      }
  24  }

(Reworked to match style of prior example)

40 line switch statements don't count as simple.

This is not one of them. If you see one, then let's look at the code to see if there is not another approach, such as tables, that could be used. However, I have seen "well done" 40+ item case lists. Length alone does not necessarily make something messy, but then again what confuses different people varies.

Each of those switch statements would be 40 lines long if you had an average of 7 lines of code in each case block. I didn't say "items", I said "lines". As I said before, the polymorphic example gives us methods for each operation for each type. The switch example should as well.

We basically have 2 interweaving factors: operation and subtype. The biggest difference is which factor is the outer weave and which is the inner weave. They are pretty much mirror images of each other, so there is no overall advantage. Swapping which is outer and which is inner does not change the volume. A shirt turned inside-out still weighs the same.

They are mirror images except that the compiler generates the switch statement in the polymorphic examples.

No, it just looks different in code. The code size difference is not more than about 5% regardless of the size of the list.

(Moved method size discussion below)

Agreed? Perhaps you are talking about some other characteristic besides code volume.

Not agreed. Polymorphism requires less code than switch statements.

I adjusted both examples to more or less match styles, and the line count is identical here. Your allegation has not been demonstrated.

No, you removed the methods from the switch example.

They were not needed. They did not help OnceAndOnlyOnce and are generally a violation of YagNi. Listen, this is going nowhere. We should AgreeToDisagree, and move on.

I suppose you're right. We don't need methods. Let's just use goto and ignore methods, polymorphism and switch statements altogether.

Your sarcasm is exceeding your specificity.

Sorry, I think methods are needed. Lots of short, clear methods.

Well, I like right-sized clear functions.

And you're seriously claiming that a 40 line switch statement will fit in a "right-sized clear" function?

Can be, yes. Case statement blocks are a very simple code pattern.

"Goto" is a very simple code pattern. If you think a 40 line switch statement is acceptable then I can't convince you otherwise. If I see it I'll refactor it.

Goto is a technique, not a pattern. Goto statements lacked any (documented) pattern, and that is part of their downfall.

"Goto" is just as much a code pattern as case statement blocks.

Note that in practice I don't use very many case statements. Large lists generally go into tables, not code.

And you never dispatch on type? Then why are you defending switch statements here?

I never claimed that. I don't understand why you are asking that question. There seems to be some confusion here.

I asked because you said "large lists generally go into tables, not code". I assumed you meant you didn't dispatch on type, you just stored it in a table. Why did you say that? What do lists in tables have to do with dispatching on type?

The table comment was to address the situation of many case options. If you talk about "large case statements", it can mean either large implementation code (within each case block), and/or lots of case blocks (options). The table issue addresses the second interpretation.

We've only discussed 4 case blocks. The 40 line switch statement has only 4 case options (and an average of 7 lines per case block). Even if you have many case blocks, what does storing lists in tables have to do with dispatching on type?

PickTheRightToolForTheJob. Long lists often belong in a database, not in code. If you don't have a database, then put them in EssExpressions or XML at least.

{I have neither RAM nor spare CPU cycles for a DBMS, an XML parser, or even an EssExpressions parser on my Atmel ATmega328 embedded controller. Before you claim that this is a borderline or unusual case, note that there is almost certainly an order of magnitude more embedded controller code running out there than anything else. For the sake of argument let's assume a database, XML parser, or EssExpressions parser is not an option, shall we?}

It should go without saying that one's tool choice depends on the target environment. Certain run-time hardware environments certainly limit which tools can be used for software development. In extreme cases, one has to use assembler to squeeze out usage of every last precious byte or CPU cycle, preventing the use of higher-level tools/languages/code-patterns. Your case is somewhere in-between on the continuum. I'm not experienced with embedded apps to recommend general alternatives. A specific scenario/UseCase may be helpful here. Maybe OOP sub-classing is the next best solution for situations where one can't use tables or other attribute-driven approaches due to the hardware environment. I tend to approach it from first finding the best solution from a programmer time and effort standpoint, and then work backward from that when the production environment limits our tool and design choices. Better human-side abstractions often require more horse-power; that appears to be a general tradeoff in our biz. In short, ItDepends.

Note that one may be able to build a feature-mapping "table(s)" using in-language arrays/maps to "centralize" list management (for non-trivial lists), but in my experience, such "tables" are a PITA to manage/read/change/maintain as source code array assignments in the typical languages available these days. It's much easier to do with a TableBrowser or similar. -t
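For illustration, here is a minimal sketch of such an in-language "table", written in Python rather than anything that would fit on a microcontroller. The sensor names, the columns, and the read_scaled helper are all made up for the example.

```python
# Hypothetical feature table kept in source code as a plain mapping.
# Each row describes one option; the columns are invented for this sketch.
SENSOR_TABLE = {
    # name:        (pin, scale, unit)
    "temperature": (0, 0.5, "C"),
    "humidity":    (1, 0.1, "%"),
    "pressure":    (2, 2.0, "hPa"),
}


def read_scaled(name, raw_value):
    """Look up the row for `name` and apply its scale factor to a raw reading."""
    pin, scale, unit = SENSOR_TABLE[name]
    return raw_value * scale, unit
```

Adding an option means adding one row. Whether that is easier to maintain than a table edited in a TableBrowser is exactly the point under dispute.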

{How about instead of condescendingly assuming that your correspondents are not quite as brilliant as you are, and therefore (it would seem) need you to "recommend alternatives", try treating them as the fully-capable, intelligent and experienced programmers that they are, and recognise that they are more than able to discuss the alternatives with you as equals and weigh up the options for themselves. It would also be helpful for you to recognise that simple data-driven coding is now, oh, nigh unto fifty years old, so that even the most junior of programmers are well aware of it. If they're not suggesting using it, and haven't mentioned it, it's probably because they've already dismissed it as inapplicable or infeasible, and are now considering its alternatives. Not everything benefits from a bloody database, and I'm speaking as a database application developer, DBMS developer, and database proponent.}

I didn't mean to rank types of developers or domains/industries. I apologize if my text came off that way to you. No offense was intended, I assure you. Coding well under limited hardware/environment constraints is indeed a valid and important and worthy skill. I never meant to demean it. And I cannot read minds about assumptions on environments or tool requirements. It's best we explicitly address such to avoid miscommunication. That's what this wiki is about: sharing, comparing, clarifying, and classifying ideas. When is it best to use X instead of Y, etc. Often there are no simple answers: SoftwareEngineeringIsArtOfCompromise (trade-offs). There are multiple ways to skin a cat, and each way shines and sucks in different ways at different times. Different domains/industries/shop-styles press upon different constraints. I'm not putting a value judgement on such "external" constraints.

{Fair enough. Your claim that "I'm not experienced with embedded apps to recommend alternatives. [...] Maybe OOP sub-classing is a consolation prize for those who can't use tables or other attribute-driven approaches due to the hardware environment", seemed unusually condescending, particularly in its claim that "OOP subclassing is a consolation prize", and it implied that if you were experienced with embedded apps, of course you would recommend alternatives (presumably superior) to those already being considered.}

I'll consider a better way to word it...in progress...

Functions Versus Decision Blocks

Re: Why would you want to put the implementation of 4 different methods into one switch statement? What is the advantage?

How about first we look at the disadvantages. Remember, the title claims it is a "smell". Thus, the burden of evidence is on those who complain about case/IF blocks. I generally use YagNi principles on such. If a decision block is sufficient for something, then I don't make it a function block. In other words, if a less formal block does the job, then don't promote it to a more formal block. Generally there are two reasons to farm off code to a routine:

Fix OnceAndOnlyOnce issues
Divide up long code into smaller pieces for easier grokking or testing

A code segment that is only a few lines does not qualify for the second, and an issue with the first has not been identified yet. If a specific block gets to be say 15 or more lines, then I might consider applying the second rule.

First, what is a "decision block"? Second, what's the advantage? Why would you rather use a switch statement with embedded implementation than polymorphism? Is it just so you can make the claim that they use the same number of lines of code?

Other reasons were given around here. For example, they "degenerate" better when hierarchies and mutually-exclusive categories fail to match real-world changes in dispatching.

You'll have to provide an example of that. I've never had to replace polymorphism with a switch statement to match real-world changes.

No, you move *away* from them, not into them. Generally they become IF statements. Anyhow, OO proponents will probably just throw yet more layers of polymorphism at the problem to keep the code OO. It is doable, but not pleasant.

So you mean switch statements degenerate into if statements better than polymorphism? Please provide an example. I have no idea what you're talking about.

An example is given near the top. More can also be found at: http://www.geocities.com/tablizer/bank.htm#fee. In general, "task-ness" is a more stable factor than sub-type-ness; therefore if one organizes their code on task-ness, there will be less impact from changes. It is often said, "group code by the most stable factors". That is what I am doing.

I don't see an example of switch statements degenerating into if statements better than polymorphism there. I see some naive OO code, though.

If the options become non-mutually-exclusive, then one does not have to move the existing implementation code out of the existing named unit. However, in polymorphism you generally will. Changing conditionals is a smaller change cost than moving code from one named unit to another in my book. As far as your friendly little "naive" comment, if you disagree with the OO examples shown, you are welcome to show the "right way".
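A small sketch of the change being described, in Python, with made-up account kinds and fee amounts (nothing here is taken from the linked bank example). The first function assumes mutually exclusive kinds. The second shows the same fee code after the kinds stop being exclusive, where only the conditionals changed.

```python
def monthly_fee_exclusive(kind):
    # Before: exactly one kind applies, so an if/elif chain works like a switch.
    if kind == "checking":
        return 5
    elif kind == "savings":
        return 2
    elif kind == "business":
        return 10
    raise ValueError(f"unknown account kind: {kind}")


def monthly_fee_features(features):
    # After: an account may carry several features at once, so the chain
    # becomes independent ifs. The fee code stays in the same named unit.
    fee = 0
    if "checking" in features:
        fee += 5
    if "savings" in features:
        fee += 2
    if "business" in features:
        fee += 10
    return fee
```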

We aren't talking about options, we're talking about types. If the associations between methods and types change, it's best to reflect that explicitly in the source code, not buried in if clauses.

Why? Can't have everything sticking out.

A better OO approach to your example: You begin with fee calculation embedded in the account type. Later you say an account can have multiple kinds of fees. The association between type and method has changed. Introduce a Fee class and give Accounts a collection of them. StrategyPattern.
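A minimal Python sketch of that refactoring, assuming hypothetical fee rules and amounts. Account, FlatFee, LowBalanceFee, and total_fee are illustrative names, not code from the bank example.

```python
class FlatFee:
    """A fee strategy that charges a fixed amount."""

    def __init__(self, amount):
        self.amount = amount

    def charge(self, balance):
        return self.amount


class LowBalanceFee:
    """A fee strategy that charges only when the balance is under a threshold."""

    def __init__(self, threshold, amount):
        self.threshold = threshold
        self.amount = amount

    def charge(self, balance):
        return self.amount if balance < self.threshold else 0


class Account:
    """An account holds a collection of fee strategies instead of one hard-wired fee."""

    def __init__(self, balance, fees):
        self.balance = balance
        self.fees = list(fees)

    def total_fee(self):
        # Each strategy decides its own charge; the account just sums them.
        return sum(fee.charge(self.balance) for fee in self.fees)
```

An account with several kinds of fees is then just an account with a longer list, for example Account(50, [FlatFee(5), LowBalanceFee(100, 3)]).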

Are you saying you would start out with a strategy pattern? That is non-YagNi if you ask me.

No. I would refactor to a StrategyPattern when the real-world changed.

I'd start out with a switch statement. Then, once it gets too big, I'd ExtractMethod on each of the decision blocks. Then, if I ever have another switch statement with the same (or nearly the same) type fields, I'd ExtractClass into an Account/Savings/Checking hierarchy. Then, if we ever need to have on/off fields for the different fee calculations, I'd switch to a StrategyPattern. That's YagNi, and everything remains OnceAndOnlyOnce to boot. -- JonathanTang

I'd do almost exactly the same thing, though I tend to put the decision blocks into their own methods right from the start; it makes the switch far easier to grok on subsequent reads. I just don't allow methods to ever get so big as to require scrolling; there's simply no need to make things difficult. If you can't grok an entire method in a few seconds without scrolling, it's simply too big. -- RamonLeon

{Regarding "refactoring", one should reduce the impact of change, not hide it under the euphemism "refactor". If we sell out to mass code changes, then why are some complaining that case lists are not "change-friendly"? You cannot have it both ways.}

We refactor to improve the design of existing code. That reduces the impact of change. Don't fear mass code changes. Make them in tiny steps and test between each step.

I find this all a contradiction. You have to do "mass code changes" to prevent mass code changes?

You'll never prevent mass code changes. You do small, easily repeatable code changes that improve design without changing behavior (called "refactorings") to make the mass code changes easier.

I believe in reducing the amount of code that has to be changed. Many shops don't like refactoring alone because it does not add new features. It might be job security from our standpoint, but owners often don't want to pay for reshuffling, possibly due to FutureDiscounting. Anyhow, if large changes don't bother you, then why not start out with case statements and only change it to polymorphism if it is a problem?

I won't work at shops that don't like refactoring. I often do start with switch statements. They smell so I refactor them.

You are lucky to have a specific choice like that these days. Anyhow, I would like to inspect such a "bad" case list. I rarely see any that go bad in the way OO aficionados describe. Thus, I wish to photograph this Bigfoot before it becomes just legend, like all the others before it. I suspect that it is simply poor use of the procedural paradigm rather than a victory for poly, but I could be wrong. -- top

Yes, I am lucky. It isn't one switch that goes bad, it's multiple switches distributed throughout the code and randomly ignored. Polymorphism gives every operation dispatched on type a switch statement enforced by the compiler without any effort on my part.
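A rough illustration in Python, which has no compile step to do the enforcing. The closest analogue is an abstract base class, where the check happens when an object is created rather than at compile time. The customer classes and the can_instantiate helper are hypothetical.

```python
from abc import ABC, abstractmethod


class Customer(ABC):
    """Every operation dispatched on customer type is declared once here."""

    @abstractmethod
    def discount(self):
        ...

    @abstractmethod
    def greeting(self):
        ...


class RetailCustomer(Customer):
    def discount(self):
        return 0

    def greeting(self):
        return "Hello"


class WholesaleCustomer(Customer):
    def discount(self):
        return 15

    # greeting() was forgotten here, the analogue of a switch that
    # nobody updated when the new type was added.


def can_instantiate(cls):
    """Return True if `cls` implements every abstract operation."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

The forgotten operation is reported the first time anyone tries to create a WholesaleCustomer, instead of being silently skipped in one of several scattered switches.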

Well, I rarely see that kind. If you have to keep adding the same element to multiple lists, most likely the "list" should be a database table. At least that is how I tend to remedy such. Sure, there are a few cases in smaller lists where polymorphism may have reduced the total changes, but it is hard to know in advance whether future changes will favor polymorphism (new subtypes) or case lists (new operations on existing "types") with regard to code change effort expended. My experience indicates that sub-types tend to fall apart over time, as described/linked elsewhere here. I am not going to shape my code for the 20% of the time where it holds up. That is a bad lottery pick. If your experience is different, then let's just AgreeToDisagree on that point. Further, I prefer dynamic/interpreted languages, and thus don't base that many code decisions on what the compiler will flag.

It's a list of types we need to dispatch on. Putting it in a database table won't tell the CPU which method to execute. Polymorphism works for new subtypes and new operations on existing types, so there's no need to predict.

[ControlTable says it all. Basically, the idea is to put a method name in the table along with the criteria to dispatch on. Type flags become values in the database. The program selects for the appropriate method (which actually gives a fair bit more power than traditional OO; you can select based on general predicates, on multiple values - it's similar to MultiMethods), and then calls eval() or apply() on it.

Top doesn't seem to care that if a hacker got access to the database, they could wipe out the contents of the server and anything it has write privileges to. ;) -- jt]
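A minimal sketch of the ControlTable idea in Python with an in-memory SQLite table. The table layout, the customer types, and the billing functions are invented for illustration. Looking handler names up in a fixed registry instead of calling eval() on table contents is an assumption added here in response to the security concern; it is not something ControlTable prescribes.

```python
import sqlite3


def bill_retail(amount):
    return amount


def bill_wholesale(amount):
    return amount * 0.5


# Only functions listed here can be reached from the table (hypothetical registry).
HANDLERS = {"bill_retail": bill_retail, "bill_wholesale": bill_wholesale}


def make_control_table():
    """Build an in-memory table mapping a customer type to a handler name."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dispatch (customer_type TEXT PRIMARY KEY, handler TEXT)")
    db.executemany(
        "INSERT INTO dispatch VALUES (?, ?)",
        [("retail", "bill_retail"), ("wholesale", "bill_wholesale")],
    )
    return db


def create_bill(db, customer_type, amount):
    """Select the handler name for the type, then call the registered function."""
    row = db.execute(
        "SELECT handler FROM dispatch WHERE customer_type = ?", (customer_type,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no handler for customer type {customer_type!r}")
    return HANDLERS[row[0]](amount)
```

Note that the HANDLERS registry is itself a list of every function name the table may mention, so the safer variant does not make that list disappear; it only moves it.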

Is that what Top is advocating here?

[As far as I can tell, yep. He's always advocated putting code (or references to code) in tables and then using an eval()-like facility to run it. It's a nifty idea, and I love the idea of DBs like Mnesia where you can stick a FirstClassFunction in the database, but current "dynamic" languages and relational DBs don't provide nearly enough security or error checking to make this workable. -- jt]

I got the impression he was on the fence about putting code into databases. Is any of this possible with statically linked languages without writing the equivalent of a switch statement containing all of the method names that might end up in the database?

{Like I said before, in practice the association between table rows and behavior is rarely one-to-one over the long run. OO dogma artificially couples the perceived relationship between these in my opinion. I added more under ControlTable about this. As far as security, the above seems to be saying that statically-typed languages are more secure than dynamic ones. That should probably be a topic on its own so that I can let SmallTalk and Python fans battle over it instead of me. I see no reason why a compiled DLL-like executable could not be put in a table cell, except maybe that current static-type systems are file-centric WRT DLL dispatching. But file systems are just a hierarchical database, which is an arbitrary choice as far as I am concerned. -- top}

I guess that's a "no"?

{The answer is that I don't know what is *possible* in static-land. I don't give it much thought because I don't like static languages. Are you trying to get a physical address into the table at compile-time? I don't know what will satisfy your question/challenge. Selecting a row in a table does not require a case statement, I would note. Thus we could grab a DLL from a row and execute that all without a case statement. -- top}

My experience is that the longer the "list", the less likely behavior maps one-to-one with the list. Whether this pattern holds in your domain or not, I don't know. And polymorphism is *more* change points when adding a new operation because you have to visit potentially each and every class to add a new method, as already described around here.

If you don't dispatch on type stored in these longer lists, why bring them into the discussion? We're talking about switch statements versus polymorphism when dispatching on type. Aren't we?

Most things *can* be modeled as types. Whether they "should" is another matter. It is one of those TuringTarpit things.

You're missing the point. Putting a list in a database has nothing to do with this discussion. It doesn't affect the need to dispatch on type. This isn't a "TuringTarpit thing". Multiple statements that switch on the same data are hand coded polymorphism. Instead of letting the compiler and/or runtime pick the right implementation you're writing the decision logic yourself.

You were talking about long, identical case lists. I don't see those often in practice except in cases where a table should have been used instead. Without looking at actual examples of what you see that makes you dislike case lists, I cannot comment further than I already have about such lists. Further, polymorphism does not get rid of decision logic, it just makes it look different. That is why it does not reduce code. There is no free lunch here.

What does a table have to do with it? How would a table be involved in method dispatching? Please answer this question.

I doubt we could settle this without looking at an actual production example that fits your description.

Just tell me how putting a list in a table has any impact on dispatching on type.

I provided an example. It showed two different places in the code where we switched on the same type information, and how that could be replaced with polymorphism and eliminate the switch statements.

No, I mean real code.

I didn't say polymorphism got rid of decision logic. I said it made the compiler/runtime generate that logic for you. That reduces code. See my example.

Which one? Your first? It only reduced code if you spin off each case option into a separate routine, which is generally anti-YagNi IMO. We have been over this already.

The first and only one I put on this page. YAGNI is about features. Keeping methods in their own method bodies is not a feature. Keeping methods small is not a feature.

I disagree. YagNi can apply to lots of things. A routine in this case is extra indirection. One should not add extra indirection without a reason. Some have suggested that such makes the code easier for them to read (for a reason that escapes my psychology), but my judgment is to not automatically do it except for the very long segments.

Extra indirection? What does that mean? It sounds like you're prematurely optimizing your code. Methods make the code easier to read because they have names. Those names summarize the intent of the code inside.

[This may be a big part of one of the misunderstandings. Any mindset that makes one think that creating a separate routine might be bad, is going to hinder communication on this topic. Creating a new routine should be viewed as being no more of a big deal than creating a new line of code.]

It is a personal choice. It does not make the code objectively better. I respect that personal choice, but it should not be a PersonalChoiceElevatedToMoralImperative.

[It's only a personal choice in the sense that you can choose to be more productive or less productive. Short methods reduce the cognitive load when viewing a piece of code (as you have to remember less context), they give more opportunities for MeaningfulNames, they produce more opportunities for reuse, and they often let the optimizer do a better job generating code from source. Try them; it's disconcerting at first and often "feels wrong", but once you get used to them you can go much faster. -- jt]

Who said anything about morality? Method names give you a chance to convey your intent to the next poor slob. What possible reason could we have for avoiding them?

Use a comment. I am not going to flood the function/method name-space just to make comments. Besides, they are usually easier to read than CamelCase anyhow.

Comments lie. The method name space is infinitely large. If method names collide that's a good hint that one of them is superfluous. Now please explain how putting lists in databases impacts dispatching on type.

Developers who put bad comments will likely also put bad routine/method names.

Comments lie because no one has to read them. When they change the code later they forget to change the comment. The same can happen for method or variable names, but everyone who reads the code has to read those names. It's hard to use a method with a lying name.

{I find your reasoning here a bit obscure. I have seen programmers have stupid naming schemes all the time. The compiler/interpreter/IDE did not reach out and slap them.}

The IDE won't slap them, but when the other members of the team have to use the lying method name their common sense slaps them. A method that calls itself "removeFoo" isn't going to keep that name very long if it doesn't remove foo, or if it does other things as well. A comment saying the same thing can live for decades without being fixed. The canonical example of lying comments are lines like this:

    a = a + 5; // increment a by 3

{That example is misrepresentative because we are talking about comments versus names, not particular run-time behavior. Decent comments don't state the obvious anyhow. This discussion is growing too pedantic for even me. I have seen no evidence that developers are more or less likely to fix comments than method/function names. Sometimes I feel more comfortable fixing comments than code because I am not altering anything that executes. Here is something more illustrative:}

  method incrementByThree {
    a = a + 5;
  }

{For more on this, look at TreatCommentsWithSuspicion.}

That example is representative. To use a name instead of a comment, refactor the comment to this:

    a = incrementByThree(a);

I have seen ample evidence that programmers will fix bad method names. They have to or they get lost because they read the method names to figure out what the compiler will do with the code. They tend to ignore comments because the compiler ignores them.

{The compiler ignores names also. It does not understand them. All it cares is that references to it match. Whatever. If you find that naming is given more attention than comments in your shop, then go for it. Like minds tend to hire each other.}

I've never seen a compiler that ignored method names. I'm not saying the compiler "understands" them. I'm saying programmers read them and are more likely to notice when method names lie than when comments lie.

{Well, I don't notice any difference. Let's AgreeToDisagree.}

Further, sometimes the case value is self-describing.

The case value describes the type, not the behavior being dispatched on that type.

{Sometimes they correlate. I did not say "always".}

Why would the type ever correlate with the behavior? I suppose one might list a list, but that correlation is an accident of English. And it only applies to one of the types.

{You seem to see more order in the universe than I believe is really there. More on this below.}

Even if there is not a name collision, it increases the IDE search list lengths when visually looking for something.

Luckily we always have a computer to help us find things in the IDE. Luckily we have a good way to group the methods into related sets (called classes).

{If you have to visually search 30 methods per class instead of 10, it will slow things down. For example, you might have to click on Search and enter a substring whereas before you could simply eyeball it. I would also note that modules group related functions, just by a different factor than classes. Functions are usually not just floating in a big sea, you know.}

If you have 30 methods per class, consider refactoring your classes. Sometimes you need 30 methods per class. In those cases click search. It's easier than eyeballing the same code expressed as 3 methods.

[Or use the IDE's class browser or outliner. 30 methods fit in one screen quite easily in outline form. -- jt]

As far as a table example, again I would probably have to see your production code that has the problem you described.

Just tell me how putting a list in a table impacts dispatching on type. No matter where the data lives, there has to be code someplace that tells the CPU which set of instructions to execute for a given type. If you put the methods in the database and load them with a query, then I understand. If you put the type information in the database you still need logic somewhere to call the right method.

It depends on the situation and how closely the behavior matches up with the "type".

I don't understand. Explain the situation where it works. I've got a list of types in the database. Let's say it's a list of types of customers. I want to create a bill for each type. How does the list of types of customers in the database tell the CPU which code to execute to create a bill?

"Types" of customers? You should know by now that such is usually useless in practice in my opinion. In practice one should give customers "features", not sub-types. Sub-types is a nearly useless lie. Even many OO proponents agree with me WRT hierarchical taxonomies. ThereAreNoTypes. Heavy use of hierarchies and sub-types are for naive OO newbies. -- top

Yes, types of customers. Yes, I know you don't like types. I know that you know that other people do. Those people often dispatch on type.

[One powerful version of ExternalPolymorphism revolves around associating values with lambda functions in a hash table, and doing the dispatch (instead of switch) via (pseudo-code) callfunc(lookup(switch_value), parameter). The setup might be associate(switch_value, new_func( return parameter * 3))...simultaneously creating a lambda func and putting it in a hash table where it can be found via switch_value. This is very elegant, as long as you don't already think that lambdas and hashes are evil. -- dm]
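The pseudo-code above translates almost directly into Python. This is only a sketch: associate and callfunc follow the names in the pseudo-code, while the dispatch dictionary and the two example behaviors are made up.

```python
# Hypothetical dispatch table: switch values map to functions.
dispatch = {}


def associate(switch_value, func):
    """Register `func` as the behavior for `switch_value`."""
    dispatch[switch_value] = func


def callfunc(switch_value, parameter):
    """Look up the function for `switch_value` and apply it to `parameter`."""
    return dispatch[switch_value](parameter)


# Create each lambda and put it in the hash table in one step.
associate("triple", lambda parameter: parameter * 3)
associate("negate", lambda parameter: -parameter)
```

A call such as callfunc("triple", 4) then replaces the switch. An unknown switch value raises KeyError, which plays the role of a missing default clause.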

See also ControlTable.

Long Methods

If you cram the method bodies into the switch blocks, which I don't.

Your code style differs from mine. That is not a "smell"

That is a "smell". See LongMethodSmell and the associated pages. -- jt

That appears to be talking about OOP code, not functions. You seem to be saying that a 10 line function is "too large". That is a rather extreme viewpoint in my opinion. Besides, length alone does not make "bad code". If it follows a clean, easy to read pattern and does not violate OnceAndOnlyOnce, then I usually have no complaints.

All long methods are bad, not just OOP ones. The main problem, besides readability, is that long methods almost always violate OnceAndOnlyOnce, no matter how clean and readable the pattern is. If you stick to OnceAndOnlyOnce, it's almost impossible to actually have a long method. -- RamonLeon

It applies to both (see TallerThanMe, and remember that good procedural programming is a precondition to good OO programming). A 10 line function is fine, but the 10 line function you gave above doesn't do anything. A 40 line function is pushing it, and that's the length you [??? was it the other guy? the lack of signatures makes it hard to tell] gave for a 4-case, 7 lines-per-case switch statement. Any more than that and it's time to ExtractMethod immediately.

The problem is context. With a 10-line function, you can see the declaration of every variable used in a glance. With a 40-line function, you can see everything in one screen, though you'll waste time reading through code instead of glancing through it. With a 100-line function (that's about a 10-case statement with 8 lines per case), you need to scroll up or down a few pages to find what you need. That takes time, and the time wasted adds up fast.

Hopping around from method-to-method or function-to-function can take time also. At least one knows where to scroll (up) to see any declarations, unlike the distribution of named units (methods/functions). Note that the decision to put the body into separate functions is not an all-or-nothing choice. One can keep the shorter ones local, and spin off the longer ones.

It does, but a good IDE significantly minimizes the pain. Almost every mainstream Java and C++ editor features a tree view of classes and methods, and most let you instantly jump to a symbol definition. Emacs has had outline view and speed bar for ages. Even PHPEdit has a code browser. Most of these have CodeCompletion to view parameters and documentation too. See ReadingRavioli.

A "good IDE" can be used to "solve" lots of things. What keeps an IDE from being able to keep declarations in view even when you scroll, or collapse long blocks with [+]/[-] switches, not unlike an XML browser? Anyhow, pressing the UP button is quicker than reading and clicking a list/tree of names anyhow, at least for me.

And you shouldn't have to poke around the internals of a function. MeaningfulNames and good documentation often means you can just use a function and not care how it's implemented. That's the whole point of having functions in the first place! (I thought there was a TrustAbstractions? page around here somewhere, but I can't find it...)

You mean AllAbstractionsLie? :-) There's UsingGoodNamingToDetectBadCode...

I am not sure how this relates to the discussion. As programmers, it's often our job to work with actual implementations. Bloating up the function/method name space just makes for more screen clutter to distract and slow down one's eyes. I don't have FastEyes; I cannot speed-read long lists. The ideal average function size is roughly around 20 to 35 lines in my experience (roughly a screen-size in a medium font). Too extreme either way slows me down, and many other developers approximately concur. You are an outlier in my experience.

To Eric et. comp: Have you ever considered that PatternMatching and subtyping are dual? PatternMatching is highly preferable in compilers and other algorithmic exercises, subtyping and inheritance in AWT, JFC and other windowing gadgets where pattern matching will fail. But you cannot decree that one is always better than the other.

Take my GridBagLayout example where there are two types of dimension constraints, either a fixed size in pixels or a proportional factor according to which free space is distributed. If I was writing in ML large portions of my code would look like this:

  match constraints.(i) with
  | Fixed ( size ) -> 
                      (* perform the computation for fixed size *)
  | Quota ( factor ) ->
                      (* perform the computation with factor *)

But in Java I don't have this benefit, so I wrote something like:

  if (xxxIsQuota[i]) { 
     // do something with
     xxxQuota[i] 
  }
  else {
     // do something with
     xxxFixed[i]
  }

This is by OO false standards a SwitchStatementSmell?. If Eric's theory was correct he would be able to replace what is known in theory as a SUM type (either Type1 or Type2) with a hierarchy of classes: there'll be the base type DimensionConstraint? and the two derived FixedSizeConstraint? and QuotaConstraint?. Once Eric or anybody else tries to force that "OO" scheme into the algorithm, I'm very curious what responsibilities will be attached to those types and how awfully complicated the resulting code will be.

''Won't the responsibilities just be "perform the computation for fixed size" and "perform the computation with factor"? I must be missing something. -- EH''

Why don't you do just that and let's see the results? Of course, following this method we can automatically eradicate all if and switch statements out of existence. But the resulting code is not exactly nice.

From what you've shown so far, I'd do this:

  result = constraint.performComputation();

But that looks too nice. Show me the Java code and I'll try.

Give it a try: GridBagLayout. Also you may want to apply your theory consistently and replace all the enumerated types and their corresponding if/else switches: HorizontalAlignment? (HA_LEFT, HA_CENTER, HA_RIGHT), VerticalAlignment? (VA_TOP,VA_CENTER, VA_BOTTOM), and Fill ( FILL, NO_FILL). I'm still waiting.

See PolymorphicGridLayoutEx. I replaced the isQuota logic with polymorphism and I still don't understand how pattern matching and sub-typing are "dual", or how pattern matching would help in this case.
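For readers without access to that page, here is a small Python sketch of the general shape such a replacement could take. It is not the code from PolymorphicGridLayoutEx; the class names follow the ones proposed above, and the methods and the layout function are assumptions made for the example.

```python
class FixedSizeConstraint:
    """A dimension that always takes a fixed number of pixels."""

    def __init__(self, size):
        self.size = size

    def fixed_pixels(self):
        return self.size

    def quota(self):
        return 0

    def resolve(self, free_space, total_quota):
        return self.size


class QuotaConstraint:
    """A dimension that takes a share of the free space proportional to its factor."""

    def __init__(self, factor):
        self.factor = factor

    def fixed_pixels(self):
        return 0

    def quota(self):
        return self.factor

    def resolve(self, free_space, total_quota):
        return free_space * self.factor / total_quota


def layout(constraints, available):
    """Compute one size per constraint without asking which kind each one is."""
    free_space = available - sum(c.fixed_pixels() for c in constraints)
    total_quota = sum(c.quota() for c in constraints)
    return [c.resolve(free_space, total_quota) for c in constraints]
```

Each class needed three small methods to remove one if/else, so the sketch can be read as support for either side of the argument.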

The sensible conclusion is that PatternMatching and subtyping are dual (as in category theory dual) constructs and they are useful in different circumstances. And by the way, such civilized gentlemen like us should have given up the unrefined theory of OO supremacy a long time ago and accepted that functional programming and OO programming are different but equal in stature and usefulness. Once you accept it, hunting Top for his love of switch statements becomes an exercise in futility, as countless grandmaster functional programmers use PatternMatching day in and day out and they can guarantee you that it is a useful construct that cannot be replaced with subtypes conveniently. Going against switches and if/then/elses (a restricted form of pattern matching) is a lost fight. If you want to have TopMind concede a point, better find something smarter.

And the problem with sucking up wisdom from the likes of MartinFowler and others is that they never programmed seriously in ML, Haskell, or anything like that, so their overly zealous pro-OO statements risk being altered by their not understanding the whole picture. Use it at your own risk, but then do not complain when you're way off the mark.

And by the way, if TopMind considers that the reverse is true and subtyping should always be replaced by pattern matching, let him rewrite or sketch a design of JavaAwt or JavaSwing or JavaSwt or GTK or any serious UserInterface library using switches.

Switches? More likely table lookups. (http://www.geocities.com/tablizer/guitable.htm )

-- CostinCozianu

Wait, how did we get from switch statements to pattern matching? I don't doubt that pattern matching is useful, and I wish that more mainstream languages had it. But I'd much rather write

  factorial 0       = 1
  factorial (n + 1) = (n + 1) * factorial n
  reverse []     = []
  reverse (x:xs) = reverse xs ++ [x]
  fmap fun (Branch a b) = Branch (fmap fun a) (fmap fun b)
  fmap fun (Leaf a)     = Leaf (fun a)

for example, than try to figure out the corresponding switch statements. I'd much rather program in pattern clauses than subtypes, and I'd much rather program in subtypes than switch statements. The argument is not that switch statements are useless, but that they're a code smell and an indication that a better alternative usually exists.

-- JonathanTang

  factorial n = case n of
                  0 -> 1
                  _ -> n * factorial (n - 1)

  reverse aList = case aList of
                    [] -> []
                    (x:xs) -> reverse xs ++ [x]

  fmap fun aNode = case aNode of
                     (Branch a b) -> Branch (fmap fun a) (fmap fun b)
                     (Leaf a) -> Leaf (fun a)

Hate to tell you this, but pattern matching in Haskell is case evaluation. Read http://citeseer.ist.psu.edu/peytonjones92implementing.html which describes how Haskell's runtime environment and compiler cooperate. --SamuelFalvo

{The pattern-matching and decomposition aspect of Haskell's "case evaluation", in addition to support for guards, gives it some very different computational properties from the traditional enumeration-based case statements of C/C++. You can only match it with polymorphism if you can do polymorphism on arbitrary predicates... a feature I've rarely seen outside of logic languages like Prolog. The argument for 'smell' on lesser switch statements does not generalize effectively to this more advanced pattern-matching. In fact, polymorphism itself is a form of pattern-matching, adding to it only implicit heuristics for dispatch when more than one choice of polymorphic dispatch matches. If Haskell allowed function definitions to be distributed throughout the code, such that you could define 'reverse []' in one place, 'reverse (x,y)' someplace else (e.g. another file), and 'reverse (x:xs)' someplace else, it would simultaneously provide the advantages of polymorphism and pattern-matching 'switch' statements.}

Switch statement is a particular form of PatternMatching. And in languages without PatternMatching, switch statement is your pattern matching. -- Costin
Not really. The key thing about PatternMatching is the DestructuringBind aspect of it; matching on constant values is a useful adjunct to that, but is hardly the same thing. You cannot use switch to implement 4 out of the 6 examples Jonathan shows above, and in the other 2, the difference in syntactic sugar is pretty sweet. :-) -- DougMerritt
BTW for my tastes PatternMatching is one of the nicer aspects of ML-family languages, since I haven't found ReferentialTransparency to be that useful in real world programming. Yet PatternMatching isn't that different than GenericFunctions, so it could be viewed as being a natural part of some hypothetical OO language, rather than unique to functional languages. -- dm
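
For readers who want the comparison with GenericFunctions made concrete, here is a minimal Python sketch using the standard library's functools.singledispatch. The Leaf/Branch classes and the tree_map name are invented for illustration and mirror the fmap example above. Each clause is registered separately, a little like separate pattern clauses, but dispatch is on the argument's class only and there is no destructuring.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Any


@dataclass
class Leaf:
    value: Any


@dataclass
class Branch:
    left: Any
    right: Any


@singledispatch
def tree_map(node, fun):
    # Fallback clause: reached only for types with no registered clause.
    raise TypeError(f"not a tree node: {node!r}")


@tree_map.register(Leaf)
def _(node, fun):
    # Clause for Leaf: apply the function to the stored value.
    return Leaf(fun(node.value))


@tree_map.register(Branch)
def _(node, fun):
    # Clause for Branch: recurse into both subtrees.
    return Branch(tree_map(node.left, fun), tree_map(node.right, fun))
```

Because clauses are registered rather than listed in one place, a new node type can add its own clause from another module, which is roughly the "distributed definitions" idea raised a few paragraphs up.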

I'm not arguing against conditional logic or switch statements, I'm arguing in favor of replacing switch statements with polymorphism where possible. -- EricHodges

Let's see if you also argue for replacing polymorphism with switch statements "where possible".

I don't. SwitchStatementsSmell. -- EricHodges

{So ReplaceConditionalWithPolymorphism is absolute for you? If not, then where do you draw the line?}

No, ReplaceConditionalWithPolymorphism is not absolute. Switch statements are a smell, but just a smell. That smell can indicate the need for polymorphism. When I see multiple switch statements with the same case values I look for a way to refactor them to one occurrence. That's often a factory. - EH
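
As a rough illustration of "refactor them to one occurrence", the Python sketch below keeps a single type-code decision in a factory and lets the classes carry the per-type behaviour. The employee classes, the type codes, the pay figures, and the make_employee name are all hypothetical.

```python
class Employee:
    def __init__(self, base_pay):
        self.base_pay = base_pay

    def monthly_pay(self):
        return self.base_pay

    def title(self):
        return "employee"


class Manager(Employee):
    def monthly_pay(self):
        return self.base_pay + 500  # invented bonus

    def title(self):
        return "manager"


class Salesperson(Employee):
    def monthly_pay(self):
        return self.base_pay + 200  # invented commission

    def title(self):
        return "salesperson"


def make_employee(type_code, base_pay):
    # The only place in the program that branches on the type code.
    if type_code == "M":
        return Manager(base_pay)
    elif type_code == "S":
        return Salesperson(base_pay)
    else:
        return Employee(base_pay)
```

Callers then write make_employee(code, pay).monthly_pay() and never test the code themselves. The conditional has not disappeared; it has been moved to one spot.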

Is it only if they are duplicate, or if they are duplicate and one needs to add new "subtypes" often? What if they are duplicate, but one needs to add more operations that use the existing list more often than they need to add subtypes? -- top

If one needs to add new operations that could be dispatched on the same set of types, then yes, one should definitely consider replacing switch statements with polymorphism. Then you just add the operations you want and the language handles dispatching for you. It is the duplication that matters, not the need to add new subtypes, because the duplication is provided by the language when you use polymorphism. That reduces programmer oversight. -- EH

{If you look at actual keystroke counts and code typed, it is not less duplication nor effort. It seems to simply swap one kind of duplication for another, as described above.}

It probably requires the same number of keystrokes if you leave all of the different operations inside the switch statement, as discussed above. It requires less effort because you don't have to write or test the dispatching code itself. -- EH

{Unless you never pull the plug on the computer, objects have to be assigned somehow, which is where the "dispatching" takes place. It ain't no free lunch.}

But the code to determine the object's type is written once, not once for each operation. And even if you pull the plug the type may persist. -- EH

Would it change your mind if the "dispatching" were made more formal? For example:

  function foo(emp) {
    on emp type {
      case cubicleGrunt {...}
      case manager {...}
      case salesperson {...}
      ...etc...
    }
  }

(In practice, EmployeeTypes is an unrealistic example because many traits are orthogonal, but this is just an example to explore a type-centric viewpoint.)

-- top

I don't see how the dispatching has been made more "formal" in that example. Neither do I see how that could change my mind. You still have to duplicate the switch statement for each operation. I much prefer "emp.foo()". -- EH
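
For readers trying to follow the two positions, here is one possible rendering of both in Python. The employee kinds come from the example above; the two operations (vacation_days, seating) and their return values are invented. It is a sketch of the shapes being argued about, not a verdict on either.

```python
# Style 1: free functions. Each operation repeats the same type test.
def vacation_days(emp):
    if emp["kind"] == "cubicleGrunt":
        return 10
    elif emp["kind"] == "manager":
        return 20
    elif emp["kind"] == "salesperson":
        return 15
    raise ValueError(emp["kind"])


def seating(emp):
    if emp["kind"] == "cubicleGrunt":
        return "cubicle"
    elif emp["kind"] == "manager":
        return "office"
    elif emp["kind"] == "salesperson":
        return "hot desk"
    raise ValueError(emp["kind"])


# Style 2: classes. Each type repeats the same list of method shells.
class CubicleGrunt:
    def vacation_days(self):
        return 10

    def seating(self):
        return "cubicle"


class Manager:
    def vacation_days(self):
        return 20

    def seating(self):
        return "office"


class Salesperson:
    def vacation_days(self):
        return 15

    def seating(self):
        return "hot desk"
```

In the first style the list of kinds appears once per operation; in the second the list of operation names appears once per class. The call sites differ only as foo(emp) versus emp.foo().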

{Much? I don't see much caller difference between emp.foo() and foo(emp).}

"You still have to duplicate the switch statement for each operation." -- EH

{Anyhow this again gets back to trading duplicated "sub-type lists" for duplicate "method lists".}

No. There's one set of types. Each type has a set of methods. The difference is that I don't have to write the switch statement for each method. The compiler does that for me. -- EH

{You seem to be elevating the status of one type of duplication over the other for reasons that escape me. I have explained it many times already. You have to repeat the same method block over and over and I don't.}

No. I never repeat the same method block. Only unique method blocks are written. If method blocks are the same they are refactored to a common method. -- EH

{Again, we are not talking about the contents/implementation of the "blocks", but the mere duplication of the block "shells". Dup methods are just as evil as dup subtype lists. Dup is dup and blocks are blocks and spades are spades.}

You mean method declarations? You need those anyway unless you place all of your implementation code inside the same switch statement. That's not an acceptable alternative for anything but the smallest application. -- EH

{This leads back to the function-length debate above.}

Re: "let him rewrite or sketch a design of Java AWT": I generally prefer declarative UI frameworks, which are easier to make language-neutral. Thus, your request is like asking W to write a guide on getting along with the U.N. Further, I don't claim that case/switch lists are "always superior". Again, the claim being addressed is that case/switch lists "smell". -- top

Not to distract from Eric's point just above about dispatching, but do you have some pointers to declarative UI frameworks? I'm interested in learning more about the subject. -- dm {See near bottom of NonOopGuiMethodologies}

They do smell, when polymorphism is an available option. -- RamonLeon

It appears that polymorphism does NOT:

make code change-friendly (at least not in my domain)
reduce code size
fix OnceAndOnlyOnce

I need a reason other than following dogma. -- top

[Only to you top, only to you.]

I guess I am delusional, then; me and AlexanderStepanov that is. I don't see any objective evidence for poly. I see 24 lines of code and you see 40. -- top

(As a complete aside, I'd definitely call Stepanov delusional. He took what everyone simply calls "interfaces" or "type classes" or "header files" or simply "good factoring" and slapped the GenericProgramming label on it. Poof, suddenly it's something new and important. -- jt ;)

The 24 lines don't do anything. It takes 12 lines just to tell the compiler which block to execute for each type. If those blocks average 7 lines of code then the switch statements for 2 methods and 4 types are 80 lines long.

m(ethods) = 2

t(ypes) = 4

l(ines per method, average) = 7

s(witch lines) = 12

m * (s + (t * l)) = 80
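
The arithmetic above can be checked with a few lines of Python. The function and parameter names are just labels for the letters m, t, l and s used above; nothing here measures real code.

```python
def switch_style_lines(methods, types, lines_per_block, switch_lines):
    # One switch per method: its own overhead plus one block per type.
    per_method = switch_lines + types * lines_per_block
    return methods * per_method


per_method = 12 + 4 * 7                  # length of a single switch-based routine
total = switch_style_lines(2, 4, 7, 12)  # both routines together
print(per_method, total)                 # prints "40 80"
```

The 40 is the per-routine figure quoted elsewhere in this exchange; the 80 is the total for two methods.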

I am assuming the implementation (stuff inside the blocks) is generally the same for each paradigm. Thus, there would still not be a difference between them. (In similar debates, there were one or two lines difference, but that is peanuts.) Plus, I am not sure 7 is a good average.

Seven is my average. It sounds like yours may well be larger. It doesn't matter that the lines per method are the same for each paradigm. In your approach they end up in a 40 line switch statement. In my approach (and my examples for both paradigms) they end up in their own 7 line methods. I'm not going to write 40 line switch statements without any perceivable benefit.

Oh, so you are only complaining about "long routines". As far as I am concerned, blocks are blocks. Think of individual case blocks as "micro-functions" if it makes you feel more comfortable. Sticking the word "method" or "function" in front of it does not make much difference in most cases. Regardless, "long routines" are not inherently evil, only poorly written ones. One could complain that you have "long classes". Why are classes allowed to be long/large but not routines?

Long routines are inherently smelly. Long classes are smelly, too.

As a blanket statement, I disagree WRT routines. Perhaps we should take this to another (existing?) topic.

We can, but what's the point? I don't like maintaining long routines or long classes. Nothing you say will change that opinion.

[How about "cheeseburger".]

Summary of Arguments

Duplication (OnceAndOnlyOnce)

Pro-poly: Avoids duplicating list of subtypes
- Pro-case: But trades that for duplicated operation blocks (methods). In the end it is not less code.
  - Pro-poly: Depends on one's coding style
    - Pro-case: What doesn't?

Adding New Items

Pro-poly: I don't have to visit/change each case statement if I add a new subtype
Pro-case: But I don't have to visit each class to add a new operation
- Pro-poly: I don't either if I can inherit from parent, which means only the special cases need to be visited.
  - Pro-case: It is not a free lunch. See http://www.geocities.com/tablizer/inher.htm.
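
A small Python sketch of the "inherit from parent" point and its case-list counterpart. The parking_spot operation and its values are made up; the sketch only shows that both styles have a way to write the common case once and visit only the special cases.

```python
# Polymorphic style: the new operation is added once, in the parent.
class Employee:
    def parking_spot(self):
        return "general lot"


class CubicleGrunt(Employee):
    pass  # inherits the default, so this class is not visited


class Salesperson(Employee):
    pass  # inherits the default as well


class Manager(Employee):
    def parking_spot(self):
        return "reserved"  # only the special case is written out


# Case style: the default branch plays the same role.
def parking_spot(kind):
    if kind == "manager":
        return "reserved"
    return "general lot"
```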

Type-Checking

Pro-poly: Compiler warns me if I forget a method
- Pro-case: Only if you don't want the option of inheriting
- Pro-dynamic: Fans of dynamic or loose typing are not interested in this argument

Degeneration away from "subtype" model

Pro-case: Case statements can turn into IF statements without shuffling implementation code out of original named units (methods/functions).
- Pro-poly: Doesn't happen that often
  - Issue taken up in ThereAreNoTypes
Pro-Poly: If they change, refactor the code
- Pro-Case: If refactoring is allowed/accepted to "fix" stuff, then why not start with case lists?
  - Pro-Poly: ? [in progress]

Long Named Units

Pro-poly: Case-lists result in excessively long functions
- Pro-case: If they are too long, say 40+ items, then try a table-driven approach
  - Pro-poly: But even 10 is too long for me
    - Pro-case: Well, that is perhaps a personal preference. They usually don't bother me.
Pro-poly: Splitting them into named units allows you to give them a name
- Pro-case: Use comments, that way you don't flood name-space
  - Pro-poly: I don't trust comments
    - Pro-case: Bad method naming is just as possible
      - [back and forth].....

SwitchStatementsSmell is as much a LanguageSmell as a CodeSmell. If you only have single dispatch, you have to use SwitchStatements to emulate DoubleDispatch (aka MultipleDispatch, GenericFunctions, MultiMethods).

There is some truth in the idea of a LanguageSmell, but you can get the SwitchStatementsSmell even with a MultipleDispatch language. Besides, you can do DoubleDispatch without using switch statements.
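
One common way to get DoubleDispatch out of a single-dispatch language without a switch is to bounce the call off the second argument. The Python sketch below assumes a toy Rock/Paper pair invented for illustration; the first call dispatches on the receiver, the second on the other object.

```python
class Rock:
    def versus(self, other):
        # First dispatch chose Rock.versus; the second is chosen by other's class.
        return other.versus_rock(self)

    def versus_rock(self, rock):
        return "tie"

    def versus_paper(self, paper):
        return "paper wins"


class Paper:
    def versus(self, other):
        return other.versus_paper(self)

    def versus_rock(self, rock):
        return "paper wins"

    def versus_paper(self, paper):
        return "tie"
```

The price is that every class needs one versus_X method per participating type, so adding a type means visiting all the others.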

You can even get this smell in languages that don't support a SwitchStatement by using ElseIfIsSelectCase. See ElseConsideredSmelly and HugeCaseStatements. There's a nice video from googletechtalks on youtube covering this at

Switch Statements and Class Factories

Switch statements, if then else sequences, and the degenerate case of a single if then else, are necessary operations because some decisions need to be based upon run time data and cannot be made at design time. The accepted pattern is to push these decisions into class factories rather than embed them within operations, but that does not eliminate the need.

The area where I see a "smell" with switch statements, etc., is when decisions that should have been made at design time are made at run time. These are often based on "mode" parameters that may be passed down through several levels of method calls.
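
One reading of the "mode parameter" complaint, sketched in Python. The export functions, the writer classes, and the "csv"/"tsv" modes are hypothetical. In the first half the flag is threaded through every level and tested at the bottom; in the second the decision is made once, in a factory, and the lower levels no longer know a mode exists.

```python
# Before: a mode flag travels down through each call and is tested at the bottom.
def export_rows(rows, mode):
    return "\n".join(_export_row(row, mode) for row in rows)


def _export_row(row, mode):
    if mode == "csv":
        sep = ","
    elif mode == "tsv":
        sep = "\t"
    else:
        raise ValueError(mode)
    return sep.join(row)


# After: the decision is taken once, where the run-time data arrives.
class RowWriter:
    sep = ","

    def export_rows(self, rows):
        return "\n".join(self.sep.join(row) for row in rows)


class CsvWriter(RowWriter):
    sep = ","


class TsvWriter(RowWriter):
    sep = "\t"


def make_writer(mode):
    if mode == "csv":
        return CsvWriter()
    elif mode == "tsv":
        return TsvWriter()
    raise ValueError(mode)
```

If every call site passes a literal such as "csv", the choice was really known at design time, and the caller could construct CsvWriter() directly with no conditional at all.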

[NOTE: The original statement explicitly discussed mode parameters. All of the following discussion ignored this and proceeded to discuss something else.]

Would the use of a dynamic OOP language like Python or SmallTalk change your view on this?

How might a specific language make mode parameters any more palatable?

{Perhaps I need more clarification on "design time" decisions versus "run-time".}

As far as "passing parameters through several levels of method calls", quite a few OO proponents seem to have this problem with their procedural designs. Deep passing of the same information is usually a design smell that can be remedied without leaving the procedural paradigm (at least in some languages). -- top

(Note that Python and SmallTalk tend to use even fewer switch statements than Java/C++).

[Python doesn't even have a switch statement - the cases where you'd use one in Java/C++ are replaced with nested if/else (rarely, unless there's only a couple cases), with dictionary (table) based dispatch, or with polymorphism]
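
A minimal sketch of the dictionary-based dispatch mentioned above. The shape records, the handler functions, and the AREA_HANDLERS table are invented for illustration.

```python
import math


def circle_area(shape):
    return math.pi * shape["radius"] ** 2


def rect_area(shape):
    return shape["width"] * shape["height"]


# The table replaces the case list: one entry per kind.
AREA_HANDLERS = {
    "circle": circle_area,
    "rect": rect_area,
}


def area(shape):
    kind = shape["kind"]
    handler = AREA_HANDLERS.get(kind)
    if handler is None:
        # Plays the role of the "default" branch of a switch.
        raise ValueError(f"unknown shape kind: {kind!r}")
    return handler(shape)
```

Because the table is ordinary data, entries can be added at run time or loaded from elsewhere, which is what makes it the usual stand-in for a switch in Python.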

The procedural solution for remedying deep parameter passing is usually another big code smell - global variables. Or a database, which is just a big global variable store.

There is something in-between: regional variables.

[What are those? File-scope in eg. C? Those help, but can only store one entity shared by all function invocations. If you have multiple entities, you either need to pass in pointers to a data structure (essentially doing OOP in a procedural language), or pass in a unique identifier that can be looked up in a table (eg. file descriptors).

Granted, OOP does this too, but usually hides it behind the object.method syntax. It's really easy to forget that there's a hidden "this" pointer being passed along.]

Similar topic taken up near bottom of ProceduralMethodologies.

I should probably note that this difficulty is inherent in the problem, and not something unique to procedural programming. The history of programming languages is basically about trying to find ways to balance implicit data passing (global variables & DynamicScoping) and explicit data passing (function parameters and LexicalScoping). OOP pulls the shared state into an instance variable, which is like a mini-global with restricted scope. LexicalClosures are a little better, in that they define functions and state in the same place, but they can lead to really deep nesting.
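
The three options named above (a global, an instance variable, a lexical closure) can be put side by side in Python. The accumulator example is invented; it only shows where the shared state lives in each case.

```python
# 1. Global variable: implicit state, one copy shared by every caller.
_total = 0


def add_global(amount):
    global _total
    _total += amount
    return _total


# 2. Instance variable: a "mini-global" whose scope is one object.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


# 3. Lexical closure: state and function are defined in the same place.
def make_accumulator():
    total = 0

    def add(amount):
        nonlocal total
        total += amount
        return total

    return add
```

Two Accumulator objects, or two closures from make_accumulator(), keep separate totals; callers of add_global all share one.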

Even pure functional languages, where all state is supposed to be explicit, try to deal with this problem. Haskell has combinators and monads, and there's a proposal to allow implicit dynamically-scoped variables. Ocaml allows limited mutable state, making it impure. Erlang relies on messages - a process's mailbox is essentially the only repository of implicit state for it.

Usually, OOP or a database is an acceptable compromise. They're useful for different problems. OOP works well when the data is relatively heterogeneous - it doesn't share common structure, so the rules for dealing with data vary greatly depending on the data (hence the attached behavior, and the focus on letting one grok only part of the system at a time). Databases work well when the data is relatively homogeneous - you have many bits of data that all share the same structure and the same rules for dealing with them. Hence the focus on the schema (which is a way of encapsulating many of these rules), and the lack of concern for abstraction boundaries within the data.

What about "dynamic relational"? See MultiParadigmDatabase.

I could go into more detail if you want - this is basically why I concluded that a "pure" TOP language is not practical. Have no idea what to use as a page name though.

-- JonathanTang

Pure anything is probably not practical. You can put examples under TableOrientedProgramming or ProceduralMethodologies if you want.

We know from mathematics that "closure" - the assurance that the result of every operation will be a member of the domain in question - is extraordinarily helpful. I don't know what it means to say "pure anything is probably not practical". I do know that programming environments, systems, and languages that attempt to preserve something like closure are, in my view, far more "practical" than those that almost immediately resort to hacks, restrictions, special cases, and baroque syntax rather than solve a fundamental semantic problem that makes a "pure" solution difficult.

Real World Sub-Types

I am skeptical that there are many real-world things that result in duplicate and growing switch/case statements because ThereAreNoTypes in the real world that I observe. Can pro-type people provide actual example lists, preferably from the business domain? Thank You.

What do you call the difference between an employee and a product, if not a type? What is numberOfDaysInJanuary, if not a subtype of integer?

Integers are taken up in multiple places under ThereAreNoTypes. Employee traits are best treated as orthogonal. For example a manager can be "exempt" or "non-exempt". Exempt-ness is thus orthogonal to "manager". Full-time and part-time is also orthogonal to being a manager. Plus, the laws tend to change every few decades. There is no single "clean" linear or hierarchical list that captures all possibilities, unless you do a combinatorial explosion tree like the soda example in LimitsOfHierarchies.
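
The orthogonality point can be sketched in Python by storing each trait as its own field instead of as a position in a class tree. The field names and the two rules below are invented; real payroll rules would differ.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Employee:
    name: str
    is_manager: bool
    is_exempt: bool
    is_full_time: bool


def earns_overtime(emp):
    # Invented rule: depends on exempt status only, not on being a manager.
    return not emp.is_exempt


def gets_full_benefits(emp):
    # Invented rule: depends on full-time status only.
    return emp.is_full_time


# A single hierarchy covering every combination of three yes/no traits
# would need this many leaf classes:
leaf_classes = len(list(product([True, False], repeat=3)))  # 8
```

Each rule reads only the trait it cares about, so a change in the law touches one function rather than a family of subclasses.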

I didn't say there was such a hierarchy; I agree there often is not (sometimes one gets lucky). But you didn't answer my questions directly. I am aware that such traits raise a lot of interesting issues, but I didn't mention them. I asked...well, read it again. What do you call those?

Okay, I think I see what you are getting at, but I don't see how it relates to case statements. The bottom of ThereAreNoTypes describes the difference between "types" and "sub-types". You seem to be focusing on "types" (entities) rather than subtypes.

You're still not answering my questions, which stand alone, all by themselves, without needing reference to ThereAreNoTypes (I did go look again, but never mind that for the moment), nor any other page. I designed those questions to be carefully self-contained. Look at my 2 questions again. What do you call those?

I guess I am not sure what you are getting at, if ThereAreNoTypes does not answer your question. (Wiki needs "#"-like marker labels really badly.)

Yep! Pages like ThereAreNoTypes are too big to be referenced, without them. (SwitchStatementsSmell -- you should not need #-suffixes in wiki. Refactor the relevant topics if you wish to refer to them directly. --SamuelFalvo?)

The difference between employee and product is that they are different entities. The integer question seems to be confusing set theory with type theory. Subsets do not necessarily imply sub-types.

Ok, that's an answer, good. So the first thing is, note that there is not just a single theory of types; it's an ongoing area of research in several fields, notably pure math and also computer science, and by now quite a large number of distinct theories of types have been published. Of course, only a few have been widely accepted or used -- only a few percentagewise, that is; numerically a fairly large number have been accepted and used in various contexts.

So you said there are no types in the world you work in, and wanted examples. Now that you've answered my terminological question, the first part of the answer is "yes, you do too have types...you call one kind "entities" and you call another kind "subsets"; granted, other people mean something else when they say "types", but that's because there are quite a few type theories floating around."

That doesn't get around to answering the rest of your question about switch statements multiplying like rabbits, yet, but are you with me so far?

I guess, but I don't want to get into a definition battle over what "types" are here. I just want to see the actual "lists" (or something close enough) that you guys see that tend to be duplicates, regardless of what they are called.

I don't see how you can be concerned about a definition battle when I just said that there are many type systems, not just one, and wanted to know what terminology you were comfortable with for the example above! Anyway, I'll give a small example that occurs frequently in my own work, so it's not contrived. Tomorrow, though, not right now. :-)

Still waiting....

EditHint

Issues in Preparation for Cleaning This Topic
- Much of above overlaps with LongFunctions and LongFunctionsDiscussion
- The topics tend to weave back and forth such that similar stuff can be moved next to each other. Perhaps we can propose little category indicator flags, and then insert them where they apply. Example: "[long_method]". When people have looked over the classification assignments, somebody could pull together related tags, perhaps even splitting this topic into smaller ones. This way, we "mark before move".
- ?

I would like to address the Stepanov comment and the nature of the bindings used to solve these problems. This is not just generic functions with a new name: it allows compile-time optimizations that are not possible in non-C++ OOP languages, period (not at that level of performance, obviously; OOP is Turing complete, so the same results can still be computed). Generic objects are new classes in themselves, not subtypes. For example, traditional OOP "polymorphism" (defined differently in all the texts, hence the quotes, lol) is not possible using pure generics in the sense the term has been explained here (polymorphism as discussed here is an implementation of subtyping). Stepanov's main beef with OOP essentially comes from his background as a mathematician. Consider an OOP implementation of the == operator: it doesn't imply equality at all. Here == is really just a function pointer and could be anything.

Generic programming - this is essentially contract programming, where you implement a formal contract to standardize mathematical operations. Generally it uses sets {T...} to define a set of an unknown type and assigns specific operations to operate on those containers. This is not OOP; to me it's essentially functional programming (for example, there is no inheritance in the STL). Upsides as I see them: it's the only way to factor out permutations of code. OOP-style polymorphs don't help here, period, as the types of the methods in the interface may not even exist yet! It's doable in C but can be error-prone and painful. Very generally, each class generated is its own class, and you get blazing fast speed (with a lot of hair pulling) and, I find, truly generic code. But if you don't need the performance or that level of generality (think real-time code as an example of code that does), procedural and OOP are often simpler. I'm comfortable with all three paradigms, but this can be more work for general, non-specific code when you don't need the speed. So if I don't, I generally use OOP and a pattern-based design, as OOP is, sadly, really the only paradigm anymore that most programmers care to master.

SubType Polymorph - this is standard OOP; I've implemented compilers for this. In essence it's a vtable lookup (a little worse if you want multiple inheritance). As the first point of view noted, it sometimes does reduce dependence on duplicate switch statements, but as the other point of view said, you end up with more classes. I would say this is subjective. However, dynamic_cast can be slow in C++ if you have to use it. That's probably a code smell, sure, but sometimes you inherit a code base and it quickly solves the problem at hand. Even worse, some compilers generate terrible code for the vtable lookup, whereas the switch, although uglier, generally gives similar performance across different compilers. (X-macros, even in C, can work around this. Ugly, I'll admit, but they can solve the do-it-once problem: correct, maintainable, just ugly.) Another problem arises if you need to port to another language, since OOP is generally not well defined mathematically. Note that I'm not saying this isn't possible or even that it's bad, just that in practice runtime type information and runtime polymorphs can create more subtle errors than the case statements would, since case is simple enough that it's usually implemented the same way on most compilers.

RunTime Polymorph (not in C or C++ natively, though Boost can help here) - this is what you get in JavaScript, Self, Smalltalk, PHP, Ruby, etc. Essentially each method is attached to a key; it's usually implemented with a hash (and, if you get a good implementer, sometimes some nice caching schemes to reduce overhead). This has the advantage of offering all the functionality of the above methods and none of the drawbacks, except possibly speed. That said, in these languages OOP is fundamentally different, period, so problems like the above can often be solved much more cleanly using other methods the languages provide. This is by far the most flexible implementation, but it's important to note that it is really a different form of OOP from what you get with traditional classes in Java and C++ (i.e. not templates or generic programming). The downside here, quite often, is performance.

Regarding performance, IMO: profile, don't trust compilers, and read the machine code. There is so much dogma out there in this area; just be scientific and profile. If performance is not an issue, though, I personally prefer the last approach, as it is the most flexible. But it's really not the same form of OOP at all as you get in Java or C++ (traditional classes or templates). I.e. if it's prototypical, treat it like that: methods are not static, and variables are generally just hash table entries, so think of things that way. Type doesn't matter here, period, as long as the interface is really there at runtime. If you desire stronger typing and it's not supported, you usually have to implement it yourself, manually.
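
A small Python sketch of the keyed-lookup style described above. The Duck and Robot classes and the require_interface helper are invented. It shows a method being attached at run time, a call that succeeds for any object that has the method, and the kind of hand-written check one ends up adding when a stronger guarantee is wanted.

```python
class Duck:
    def speak(self):
        return "quack"


class Robot:
    pass


def robot_speak(self):
    return "beep"


# Methods live in a per-class dictionary keyed by name,
# so one can be attached after the class has been defined.
Robot.speak = robot_speak


def announce(thing):
    # No declared type: works for any object on which "speak" can be looked up.
    return thing.speak()


def require_interface(obj, *method_names):
    # Hand-rolled check for callers who want a stronger guarantee up front.
    missing = [n for n in method_names if not callable(getattr(obj, n, None))]
    if missing:
        raise TypeError(f"{type(obj).__name__} lacks: {', '.join(missing)}")
    return obj
```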

To summarize, all three of these methods have their upsides and downsides, dogma aside. I would say it depends on the nature of the shop. I generally use traditional Java-style OOP, as it's familiar to the most programmers, but this could depend on your shop; to me this is mostly a human-language issue. One exception: I do see a lot of subtype abuse, by which I mean you have so many subtypes that you essentially lose strong typing. Think of C# or Java, where everything derives from Object: abuse of this, IMO, leads to a situation with many of the downsides of, say, void * in C or of untyped dynamic languages. IMO all traditional OOP really is, technically, is structures with more advanced access modifiers (private, public, etc.) and subtype polymorphism. Sorry if some of this doesn't deal directly with case, but it does indirectly. For the record, though, I do agree with the second OP's claims of a lot of dogma in OOP. Why do I say this? We were told for 10 years, for example, that inheritance was great and that if all your functions were not virtual you didn't know OOP. Then the Gang of Four book came out, thank god, and finally told people that deep inheritance leads to tight coupling and is evil, and that they should prefer composition, or at least shallow hierarchies when the inheritance relationship really is wanted. I had been saying this for 5 years at that time and getting ridiculed for it; then that book comes out, and two years later my code was good (maybe it wasn't bad all along, given it never changed). IMO use common sense: sometimes you need to optimize and the code is uglier; there is no way around this, especially in native languages.
Note I'm not anti-OOP. I actually find it's the best way to express a lot of the simpler coding idioms in a common manner that a lot more coders have been taught and understand. But I think the lack of well-defined mathematical foundations, and a lot of dogma not really based in reality, are an issue (not always dogma, but watch for it; OOP just sucks for some things, period, as does any paradigm I've ever used). I.e. I find it annoying when the boss asks for fast code and then you get some OOP purist complaining about a bad smell. I've seen cases in the wild where the purist's version can be 20,000 times slower, and a lot of the time the uglier code can just be hidden behind an OOP-style interface anyway.

BTW, it's worth noting that I do find the switch redundancy annoying, but I view this as a language flaw. Imagine a language that didn't suck, that somehow allowed enums to work automatically with switch statements: say, something like a variable switch that could combine multiple switches into one, worked with enumerated types, etc. To me this would be great, as you would get more performance options without the slightly longer code that can sometimes result.

--geo
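
One way to approximate the "enums that work automatically with switch" wish in a language that lacks it is a handler table checked against the enumeration when it is built. This Python sketch is only that, an approximation; the Shape enum and the enum_switch helper are hypothetical names.

```python
from enum import Enum


class Shape(Enum):
    CIRCLE = 1
    SQUARE = 2
    TRIANGLE = 3


def enum_switch(enum_cls, handlers):
    """Build a dispatcher, failing early if any enum member has no handler."""
    missing = [member.name for member in enum_cls if member not in handlers]
    if missing:
        raise KeyError(f"no handler for: {', '.join(missing)}")

    def dispatch(member, *args):
        return handlers[member](*args)

    return dispatch


# Every member must appear here, or enum_switch raises when this line runs.
sides = enum_switch(Shape, {
    Shape.CIRCLE: lambda: 0,
    Shape.SQUARE: lambda: 4,
    Shape.TRIANGLE: lambda: 3,
})
```

Adding a member to Shape without updating a table makes the program fail at start-up rather than at the first unlucky call, which addresses part of the redundancy complaint but none of the performance one.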

Can you provide an example of such redundancy? It may be a sign that something else should be used aside from either CASE lists or sub-classing. Mass repetition of CASE lists in domain applications (outside of SystemsSoftware) is usually a sign of poor design, not lack of OOP, in my experience. Often such can or should be managed by category-to-feature mapping tables or ControlTable(s) of some sort, for example, not direct source code. Of course, what is "poor design" is often subject to debate, but I try to use grokkability by typical programmers and cost-of-change as my primary rulers. -t

I think one can summarize most of the above opinions as follows:

Those who like short functions and believe in the usefulness of sub-type theory prefer polymorphism.
Those who like medium or mixed function length and don't believe in the usefulness of sub-type theory prefer case/switch statements.
Those who like one or the other (short functions or subtyping) but not both will probably not care much either way or judge it case-by-case.

Many of us feel that both polymorphism and case statements are useful techniques, which we use depending on the circumstances; and that there is insufficient evidence to issue a categorical recommendation against either technique.

{That would be the same as the 3rd option, wouldn't it?}

I would like to see a practical comparison of actual code.

See ReplaceTypeCodeWithClass, ReplaceTypeCodeWithStateStrategy, PerceptionOfChange, PolymorphismLimits, PayrollExampleTwoDiscussion, MartinFowler's site at http://www.refactoring.com

CategoryPerpetualArgument, CategoryCodeSmell CategoryConditionalsAndDispatching