Sets And Polymorphism

One of my complaints against PolyMorphism is that it tends to require that a taxonomy be created such that a given object belong to one and only one sub-type. (I know there are other kinds of polymorphism, but the most common kind requires an explicit or implicit taxonomy.) I find trees too limiting a classification system as described in LimitsOfHierarchies. More generalized classification systems resemble sets, not trees. But, polymorphism does not seem to work too well under sets, at least as far as I have seen. Since something can belong to multiple sets, executing every possible matching method could turn into a mess. Are there ways to keep the simplicity of sub-type-based polymorphism and the more generalized classification systems that one gets under sets? I want my cake and to eat it too! -- top

DynamicTyping, à la SmalltalkLanguage, can do the trick. While a class in Smalltalk inherits from only a single parent, this matters only as a class implementation/construction detail. When it comes time to determine whether or not an operation on a type is valid, only the set of messages a class can handle is considered - not its set of parents.

For example, if you want an object to be able to act like a boolean in Smalltalk, you only need to implement methods such as ifTrue:, ifFalse:, ifTrue: ifFalse:, etc. No need to try to "inherit" from the boolean type.
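A minimal sketch of the point above, written in Python rather than Smalltalk. The Flag class and the method names if_true and if_true_if_false are hypothetical stand-ins for ifTrue: and ifTrue: ifFalse:. Nothing here inherits from bool; the caller only cares which messages the object answers.

```python
class Flag:
    """Not a subclass of bool; it only answers the same kind of messages."""

    def __init__(self, is_set):
        self.is_set = is_set

    def if_true(self, action):
        # Run the action only when the flag is set, loosely like ifTrue:
        return action() if self.is_set else None

    def if_true_if_false(self, when_true, when_false):
        # Pick one of two actions, loosely like ifTrue: ifFalse:
        return when_true() if self.is_set else when_false()


def describe(condition):
    # Works with any object that understands if_true_if_false.
    return condition.if_true_if_false(lambda: "yes", lambda: "no")
```

Calling describe(Flag(True)) goes through the same code path as it would for any other object that happens to implement if_true_if_false.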

Of course, this is a bit more expensive at runtime - method lookup in a dynamically typed language (which often requires querying an associative array) is slower than the optimized lookup mechanisms used by a language such as Java or (as an extreme case) C++. Whether the trade-off is worth it depends on your application.

I was thinking more in terms of domain objects such as employees, customers, etc. For example, take the payroll example from http://www.geocities.com/tablizer/struc.htm and assume that an employee can belong to at least three orthogonal sets: manager, full-time, and exempt. Using dot notation:

Reference: Code Segment 1

  employee.pay()

could trigger up to three methods if we extrapolate sub-typing to sets. I don't see how this is desirable. For one, it does not control the order of execution. We would need ranked sets for that. (Related: EmployeeTypes)

While Smalltalk objects can implement whatever methods they want, they only inherit from one class - so this issue doesn't occur. Java allows multiple inheritance of interfaces, but only single inheritance of classes - and only classes can have implementations for functions.

Languages which support multiple class inheritance, such as C++ and Eiffel, require the programmer to resolve these issues. There aren't any languages out there which resolve such conflicts by calling all of the conflicting methods in turn - that would almost never be the correct thing to do.

Well, then it is not really sets, because sets can have zero or many "matches".

[ You've just confused yourself. Up at the top you're talking about objects belonging to sets instead of trees, i.e. an object can have multiple "parents". MultipleInheritance provides exactly that - if you have a BorderedButton that inherits from both Button and BorderedObject, it's a member of the set of Buttons (inheriting all functionality thereof) and a member of the set of BorderedObjects. Thus, all 3 sets (Button, BorderedButton, and BorderedObject) fulfill your criteria of having zero or more members.

{What if they have overlapping methods? It is not really set based because that is not allowed to happen.}

Down here, you're talking about sets of applicable methods. This is an entirely different problem. C++ and Eiffel require that the programmer explicitly specify which member of the set of applicable methods he wants. CommonLisp defines an ordering to the "sets" to which an object belongs - see the CommonLispHyperSpec (section 4.3.5) and TheArtOfTheMetaObjectProtocol for details. Dylan refines the CommonLisp ordering - take a look at http://www.webcom.com/haahr/dylan/linearization-oopsla96.html.

And to answer your question about whether there are type systems that keep the simplicity of sub-typing and the classification of sets, go read up on Haskell and come back after you've gotten through TypeClasses at least. -- JonathanTang]
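The ordering approach mentioned above can be sketched in Python, whose method resolution order uses the C3 linearization described in the Dylan paper. The class names follow the BorderedButton example; the draw method and its return values are made up for illustration.

```python
class Button:
    def draw(self):
        return "button"


class BorderedObject:
    def draw(self):
        return "border"


class BorderedButton(Button, BorderedObject):
    # No draw() here: the language's ordering of the parents decides.
    pass


# The linearized list of "sets" this class belongs to, most specific first.
order = [cls.__name__ for cls in BorderedButton.__mro__]
chosen = BorderedButton().draw()
```

A BorderedButton is still a member of both parent sets (isinstance reports true for each), yet the overlapping draw method resolves to exactly one implementation because the sets are ranked.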

BookStop

As far as I can tell, all these techniques still lead to one and only one method. If not, their status as being "OO" is perhaps doubtful.

And an alternative is?

Full set handling. However, sets don't guarantee only one answer, and this is what bothers polymorphism proponents. However, polymorphism usually requires tree-shaped taxonomies, and this bothers set proponents. I am looking for a super-set or happy medium.

What is "full set handling" and what language implements it?


Re: One of my complaints against PolyMorphism is that it tends to require ... that a given object belong to one and only one sub-type.

I do not see any conflict between Polymorphism and Multiple Inheritance (aside from name collisions). A given class can be derived from many different base classes and any or all of the resulting methods may be polymorphic. Indeed, if you are deriving from the classes, one would expect the methods to be polymorphic.

If a class/object has multiple parents, then there is no graceful way to handle method collisions. I think most proponents of multiple inheritance agree that parents should be widely different "kinds" of things. Sets allow things to belong to multiple sets. If methods cannot overlap, then it is not a replacement for sets. Polymorphism is thus fundamentally an IS-A solution rather than a HAS-A solution.

Translation, please? What problem is being envisioned? Name collisions are not unique to object oriented programming and OO languages tend to have better handling for the problem than procedural languages. The shared base class problem is a nice theoretical problem, but is easily avoided in reality.

It is not a matter of error handling, but general software organization. The DiscontinuitySpike description below probably best describes the problem from a code maintainer's viewpoint.

Translation, please? The reference does not provide any description of a problem. What is the problem being discussed?

Couldn't MI result in a method conflict? Multiple "type paths" may reference a method of the same name. Set theory implementations generally use the set name such that there can be no conflict. Two sets cannot have the same name. It would be equivalent to a duplicate primary key violation. It could perhaps also be argued that it is easier to "query sets" if you want to study or process them on a meta level. But this gets into the age-old navigational-versus-relational conflict. Those who favor MI probably don't mind using paths and pointer based object/class browsers, versus set-and-predicate algebra favored by set fans.


Unique Answer?

Polymorphism seems to require some way to come up with only one dispatching answer. However, sets are supposed to be independent from one another. This seems to be at least one place where the conflict arises. The independence is at odds with uniqueness. Herein lies the dilemma, it appears. Polymorphism seems to be a technique or set of techniques to get one and only one answer per message/request/method.

There can only ever be one block of code which is appropriate for any given case. Even if it is somehow appropriate for two typical implementations to be used in some case, that still reduces to a single code block. The correct thing to do in the case of an employee being/not-being a manager, full-time and exempt, is to dispatch on all three of these details. This is standard OO. Not all 'OO' languages support it elegantly (needing to resort to well understood patterns such as DoubleDispatch), others do (MultiMethods).

Or perhaps what he's talking about is AspectOrientedProgramming? (The talk of orthogonal vs non-orthogonal aspects leads me to this)

-- WilliamUnderwood
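A minimal sketch of dispatching on all three details at once. Python has no built-in multimethods, so a lookup table keyed on the three facts stands in for them here. The Employee fields, the rule functions and their return values are hypothetical and are not taken from the payroll example.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    manager: bool
    full_time: bool
    exempt: bool


def salaried_pay(emp):
    return "salaried"


def hourly_pay(emp):
    return "hourly"


def manager_pay(emp):
    return "salaried plus management perks"


# One entry per combination that needs its own rule (hypothetical rules).
PAY_RULES = {
    (True, True, True): manager_pay,
    (False, True, True): salaried_pay,
}


def pay(emp):
    # Dispatch on manager, full-time and exempt together; exactly one rule runs.
    key = (emp.manager, emp.full_time, emp.exempt)
    return PAY_RULES.get(key, hourly_pay)(emp)
```

Every combination without its own entry falls back to a single default here. Three yes/no facts already give eight possible keys, which is where the worry about the number of combinations comes from.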

[The auto pilot asks the navigation system for the next course change. The navigation system checks the radar and sees a mountain ahead. It calls a set of methods that return a set of results (one that goes left, one that goes right) and returns the set to the auto-pilot. The auto-pilot averages the contents of the set and flies straight into the mountain. I'm missing something about the value of non-unique answers, obviously.] Either that, or the autopilot applies both course corrections, with the same result.

Well, indeed some approach needs to be applied to either whittle down the choices or do both (averaging or whatever). Most decision-making systems probably use some kind of ranking of results such that they take the highest-ranking course of action. Thus, they do return multiple answers, it is just that one or more techniques are used to whittle those down to one course. For example, if you are being mugged, your head probably evaluates the alternatives of fighting back, complying, or running away like a bat out of heck.


There seem to be two different uses of "sets" and SetTheory in the above.

These two uses of "sets" refer to two different things; and should be discussed separately.

I think we should avoid the use of "types" altogether if possible.

Why not? Even if you think ThereAreNoTypes; as a first-order approximation they are a useful concept. And while many real-world entities may be difficult to classify (see ClassificationProblem); other useful things (such as many mathematical concepts, such as integers) can be specified exactly.

I would rather focus on "RealWorldEntities" rather than math and integers since it is where the most difficulties arise; possibly because human-made classifications and concepts are more subject to whim and change than the rules that govern math. See ExtrapolatingMathToHumanConcepts for more.

[So you just want to avoid the parts that are subject to whim and change? I think we should focus extra attention on those. What do you want to use instead of types?]

No, I am saying that concepts useful to math may not apply outside of it.

Certainly, some of them are limited. On the other hand, what would be a better way of modeling RealWorldEntities?

Full sets with the possibility of "fuzzy links" such that set membership is a factor between 0.0 and 1.0. However, the latter is mostly for AI-like solutions, so let's concentrate on full or none membership. Polymorphism doesn't seem to scale/adapt to sets very well from what I have seen so far.

So how do "sets" determine membership and resolve conflicts between multiple implementations of the same method? I am not sure I can identify any real world operations that would benefit from having multiple results from a single operation, and believe most people would view having multiple results a failure. In a more concrete vein, drawing upon the employee.pay() idea presented above, what is the union of the set where pay() is implemented as direct deposit with the set where pay() is implemented as printing and mailing a check? From set theory, it would seem the correct response would be to do both and issue the employee's full salary via direct deposit and issue the employee's full salary via a printed check. I do not believe set theory was ever intended to deal with operations and thus does not seem to have any correlation with multiple inheritance.

My observation is that sometimes you do want multiple operations and sometimes you don't. But it has to be determined on a case-by-case basis. The only practical way I know how is to use nested IF statements, but these "feel" ugly. Sets are the better way to "classify" things, but perhaps not the best way to dispatch behavior, at least not one-to-one the way polymorphic trees/lists do. Maybe in the real world the coupling between noun classifications and behavior is actually weak and attempts to make one "ride on the back" of the other is futile in most cases. I am trying to combine the power of sets with the clean mapping of polymorphic-like dispatching, but it is not working out. There is a puzzle here that I can't solve. I want to get away from IF's using something like polymorphism, but more powerful than polymorphism, which seems inherently tied to trees or mutually-exclusive lists of "subtypes" for these kinds of problems.

[You can still use sets to classify things. Build a set of objects to which you want to send a message. Send them all the same message. Let polymorphism resolve the behavior associated with that message for each object. Handle the set of results as you wish. What am I missing?]

{do you mean like:

Reference: Code Segment 2

  Set S;
  S.Add(Triangle T(1,2)); //base, height
  S.Add(Square Sq(2)); //2=length
  S.Add(Circle C(3)); //3=radius
  cout << S.Areas(); 

? }

[Perhaps. It depends on what's inside "S.Areas()" and what we want to do with the results.]

{Shouldn't S.Areas() just collect the individual results of the appropriate method foreach element? I.e. T.Area(),Sq.Area(),C.Area()... In which case you probably need somewhere to store them so internal to the Set class there would be something like:

Reference: Code Segment 3

  class Set
  {
    Object *members;
    Object *OperationResults; 
    Set Show(); //parameters omitted - show members and/or OperationResults if needed
    Set Areas(); //call member methods, store results in OperationResults
    //...plus other common member object methods
    Set Union(); 
    Set Intersection();
    Set Difference();
    //...etc 
  }
}

[That depends entirely on what you want to do with the results. You might want to collect them. You might want to add them. You might want to average them. You might want to pick the biggest one. I wouldn't tie shape methods to sets. I'd make a visitor or something to traverse the set and do whatever I wanted with the results.]
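A rough sketch of that idea in Python: the shapes know how to compute their own area, the collection knows nothing about areas, and the caller decides what to do with the results. The class and attribute names are assumptions modelled on Code Segment 2 (triangle with base 1 and height 2, square of side 2, circle of radius 3).

```python
import math


class Triangle:
    def __init__(self, base, height):
        self.base, self.height = base, height

    def area(self):
        return 0.5 * self.base * self.height


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


shapes = [Triangle(1, 2), Square(2), Circle(3)]

# Send every member the same message; polymorphism picks the method.
areas = [shape.area() for shape in shapes]

# What happens to the results is a separate decision made by the caller.
total = sum(areas)
largest = max(areas)
average = total / len(areas)
```

Collecting, summing, averaging and taking the maximum are all one-liners over the same list, so none of them has to be built into the collection type.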

[Discussion of VisitorPattern moved to VisitorPattern.]

There are dispatching questions that this does not seem to regulate:

Isn't the result resolving to a single specific operation? I am not sure how having a "set" eliminates the need for a single result.

As stated above, it appears to conflict with it. And, perhaps we don't always only want one result. In the employee examples, maybe top sales people get some of the same financial perks as management, but still have some sections that are calculated differently.

[But each employee still gets one paycheck. There is one action used to determine that paycheck, even if it operates on sets of data, belongs to a set of actions and is composed of sets of actions.]

I am not sure what you mean here. It might be "one particular set of actions", but it is hardly one action. Note that there may be other factors that determine which course of actions are taken.

[It is one action. Yes, deciding which action it is may depend on other factors, but in the end there is one series of instructions that calculates the correct value of the paycheck. Even if that series of instructions depends on a random value, it is just one series of instructions. No-one gets two amounts printed on their paycheck.]

Technically that may be true, but how does it translate to a practical decision? You appear to be saying something like, "in the end, a known and fixed set of machine instructions are executed". Even if true, it does not suggest a code organization approach or programming convention.

[You claimed above that because there are a set of possible ways to calculate pay that there is some conflict with the need for a single result. There is no conflict. There are different ways to calculate pay for different employees, but only one way to calculate pay for any single employee. This translates to a practical decision encoded as logic. Sets may be used to organize employees, pay scales, union rules, etc. Polymorphism may be used to simplify the code. There is no conflict that I can see.]

So you are proposing polymorphing off of every single combination of factors? If there are 500 combinations, then you make a subtype for every single combination? Possible, but I don't see how it simplifies things. A new factor could double the combinations if we create a CartesianProduct to get the combinations.

[No, that isn't what I'm proposing. Use polymorphism when it makes the code simpler.]

I don't see how polymorphism will simplify stuff here without doing a CartesianProduct to create "subtypes" or some wily multi-dispatch. Perhaps an example would help.

[Perhaps you can give an example that shows why you think you need a cartesian product of sub-types.]

A good portion of biz entities don't fit into a hierarchical classification, partly because of too many competing orthogonal factors. Let's focus on the EmployeeTypes example for now.

[We already solved that example without a cartesian product of sub-types. Employees have sets of roles. Please provide an example that needs a cartesian product of subtypes.]

I don't see roles as a solution. One can belong to multiple roles, for one, and we are right back to the original problem of multiple set membership.

[Why is it a problem? Visit the roles and send them messages. Process the results and provide the answer. I must be missing something.]

I think we need a specific example to straighten this out.


Why is non-uniqueness necessarily a flaw? [moved from above]

[Because computers aren't people, they don't have intelligence, and can't make decisions on their own. This may change if AI is ever invented, but until then, maybe's are not a good answer for most scenarios, especially in typical business applications.]

It has to be dealt with one way or another. It happens in the real world whether we want it to or not.

[I didn't say non-uniqueness was a flaw. I said unwanted collisions are evidence of a flaw. If we need to execute one method and we have no way to decide which one it is, there's something wrong with the design of the software. It will never work.]

Well, some are wanted and some are not. We need a way to manage that in our software, and change from unique to non-unique with minimal code shuffling.

[If we have unwanted collisions we haven't written the software properly. We aren't doing what the business wants. Wanted collisions are handled whatever way the business wants. If they want the results summed, averaged, max'd, whatever, we do that.]

There seems to be a communications gap here. I believe it to be more about ChangePatterns, not absolute wrongness in output. Polymorphism uses a uniqueness assumption to simplify the software to some extent. However, if the uniqueness turns out not to be a lasting or stable assumption, one has more code rework. Polymorphism has a bigger DiscontinuitySpike between uniqueness and non-uniqueness than a typical set-based approach.


Let's try formulating it this way. We have multiple sets that a given employee can belong to. (Call them "roles" if you want). And, we have functions or methods that are to be executed when calculating pay. Let's say each method provides a dollar amount to be added or subtracted from the running total. It can also supply an optional description for a line-item on a receipt or tracing log.

Somehow we need a way to map the roles to the methods.

No we don't, each role has the same set of methods, we use polymorphism to allow sending the same message to any possible combination of roles. The roles contain the methods, so we don't need to map them, since objects carry their methods with them.

It is not a one-to-one relationship. My initial instinct is some kind of many-to-many table with a temporal priority column. But, this is hardly better than a bunch of IF statements.
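One possible reading of the many-to-many table with a priority column, sketched in Python with an in-memory list standing in for the table. The role names, unit functions, amounts and priorities are all invented for illustration, and running a shared unit only once is an assumption of this sketch rather than something stated above.

```python
def base_salary():
    return ("base salary", 3000)


def manager_bonus():
    return ("manager bonus", 500)


# (role, priority, unit) rows: roles and units are many-to-many,
# and the priority column fixes the order of execution.
ROLE_UNITS = [
    ("full_time", 10, base_salary),
    ("manager", 20, manager_bonus),
    ("manager", 10, base_salary),  # a unit shared by two roles
]


def line_items(roles):
    # Select the rows for the employee's roles, then order them by priority.
    rows = sorted((r for r in ROLE_UNITS if r[0] in roles), key=lambda r: r[1])
    seen, items = [], []
    for _role, _priority, unit in rows:
        if unit in seen:
            continue  # assumption: a shared unit contributes only once
        seen.append(unit)
        items.append(unit())
    return items
```

With these rows, an employee who is both a manager and full-time gets the base salary line once, followed by the manager bonus.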

Tables aren't methods, and are irrelevant to the solution.

[Why not use a Role interface with "calculatePay" and "getLineItems" methods, make all of the roles implement Role, and have a visitor call "calculatePay" and "getLineItems" for every role an Employee plays? The visitor adds the results to a running total and adds the line items to a list. Why would we need to map roles to methods, use a many-to-many table with a temporal priority column or a bunch of "if" statements? What problem are you trying to solve?]
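A minimal sketch of the Role interface and visitor described above, in Python. The method names mirror "calculatePay" and "getLineItems"; the concrete roles, amounts and descriptions are hypothetical.

```python
class Role:
    """Hypothetical interface: every role answers the same two messages."""

    def calculate_pay(self):
        raise NotImplementedError

    def get_line_items(self):
        raise NotImplementedError


class Manager(Role):
    def calculate_pay(self):
        return 500

    def get_line_items(self):
        return [("management bonus", 500)]


class FullTime(Role):
    def calculate_pay(self):
        return 3000

    def get_line_items(self):
        return [("base salary", 3000)]


class PayVisitor:
    """Keeps a running total and a list of line items."""

    def __init__(self):
        self.total = 0
        self.line_items = []

    def visit(self, role):
        self.total += role.calculate_pay()
        self.line_items.extend(role.get_line_items())


class Employee:
    def __init__(self, roles):
        self.roles = list(roles)

    def visit_roles(self, visitor):
        # The employee does not know what the visitor does with each role.
        for role in self.roles:
            visitor.visit(role)


visitor = PayVisitor()
Employee([FullTime(), Manager()]).visit_roles(visitor)
```

Adding a role means adding a class that implements the two methods; Employee and PayVisitor stay as they are.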

A role interface would be a perfect solution, of course the problem is Top isn't looking for a solution, he's looking for a problem to justify his beliefs that OO doesn't work. Using employees with roles is the typical OO solution to this problem, but top wants to believe there is no OO solution that will work, so I doubt he'll accept your solution.

It is not a matter of "there is no solution", since most common paradigms are TuringComplete. It is a matter of convenience in design and maintenance. Of course I am going to prefer tables over a GOF visitor pattern. That should not surprise anybody. Needless to say, the mapping between methods/functions and the "roles" is probably going to be a complex relationship. I will tend to use EntityRelationship conventions to describe those and OO proponents will tend to use design patterns. If you want, I will describe the reasons for my choice, but they are nothing you haven't heard before. -- top

[The visitor pattern works. If you've got a better solution, provide it. You haven't explained why you need to map roles to methods, why you need a many-to-many table with a temporal priority column or why you need a bunch of "if" statements. You haven't defined the problem you're trying to solve.]

I cannot explain any more in English. I need some kind of notation. Before we get into a table-centric approach, do you feel comfortable explaining why visitor+ would be better than IF statements resembling something like:

Reference: Code Segment 4

 function calculatePay(emp) {
  ...
  if (memberOf(emp,A)) {  // if emp. belongs to set A
    ...
    foo = blah blah
    addLineItem(foo,"")
  }
  ...
  if (memberOf(emp,C,D)) { // if emp. belongs to set C or D
    ...
    addLineItem(bar,"")
    if (memberOf(emp,D)) {
      ...
      addLineItem(foobar,"line item description x")
    }
  }
  ...
  if (memberOf(emp,E) && memberOf(emp,F)) {
    ...
    addLineItem(flug,"")
  }
  ...
  process(...)
 } // end function

An advantage of this approach is that we don't have to create formal roles if such roles are ONLY going to be used to calculate pay.

[Why are you adding line items in the calculatePay method? I thought they were separate operations. You're going to have to modify this method every time you add a role or change the logic, and it's going to be a huge method. There's no good reason not to create formal roles. In fact, formal roles have already been created (in the business), we just want to model them in the software.]

That's not an advantage, you have FearOfAddingClasses, sad because it's the best way to simplify a design and build good code. Your approach is bad, it'd have to be rewritten every time a rule or role changed, making it very fragile. A better approach would be

Reference: Code Segment 5

  class Employee{
    theRoles=new RoleList();

    function calculatePay(aCheck){
      foreach(aRole in theRoles)
        aRole.addLineItem(aCheck); //putting the code in the role
    }

    // or...

    function visitRoles(aVisitor){
      foreach(aRole in theRoles)
        aRole.acceptVisitor(aVisitor); //keeping the role simple, putting the code in the visitor, this visitor being for calculating pay
    }
  }

(Note, diplomacy ends at this point. Bumpy road ahead.)

Now roles can be added, removed, changed, extended, aggregated, etc... without ever breaking or forcing a change on the calculatePay method, or the employee class. Polymorphism is superior to if statements, when are you going to grasp that? How many times does it need to be demonstrated before you accept that you are wrong, and if statements are not the proper solution? Every time you show sample code, you demonstrate your lack of understanding of what it takes to write solid code, you should really take some classes or something and learn to program better.

And you should take some science classes to learn how to construct objective evidence instead of brochure-writing. You are simply trading one change-dimension-friendliness for another. All you have done is manually implement set logic. Further, the methods are not necessarily one-to-one with roles. Multiple roles may share methods. We need another level of indirection. And why would you want to hard-wire a tangled structure like Visitor into code? Read CodeAvoidance 10 times and maybe you will see the evil of your hard-wired ways.

No we don't, you seem to miss that a role can be an aggregate of other roles. Writing a loop is not manually implementing set logic, and if SQL had polymorphism, I'd gladly use it, but it doesn't, so we're forced to do it in the language. Your if statements are manually reimplementing polymorphism, something the language will do for you, if you know how to use it. And using polymorphism is the best exercise in CodeAvoidance you can do, or didn't you notice how much shorter and simpler my solution was than yours? Every sample you show is bloated with if's and switches and always more code than the OO solutions for the same problem, I assume you like writing all that unnecessary code, maybe you use lines of code as a productivity measurement or something. Oh, and by the way, I'm adding change friendliness, your example has none.

Less code? That can be proven objectively.

It just was, we both only showed dispatching code, neither showed implementation of calculatePay.

And... all of your dispatching is hard wired into your if statements, at least I use the language to do it for me.

You didn't show the whole shebang. Note that IF's are better if one cannot extract out a clean-enough pattern.

The cleanest pattern is to remove them altogether, they are unnecessary, the language will dispatch for us, we don't need to manually write dispatching code.

However, if there is enough regularity, then tables are not objectively worse than things like visitor. Visitor is the ugliest of the GOF patterns. There is probably going to be a Role table anyhow in any real business system. Thus, there is no reason to create a copy in code (with classes). Otherwise you violate OnceAndOnlyOnce.

There is a need to have those classes, both to hold code specific to that role, and to allow the language to do the dispatching for us, so we don't have to hack up a dispatching system with if statements.

Reference: Code Segment 6

  evaluate(myRole.scriptName)

Get real, this is silly, putting code into tables is hardly acceptable practice. And you still have to write that script.

It does not have to be the code itself, just a function call. Of course one has to write the implementation. Even OO cannot read minds.

If all the functions have the same name, we don't need to do this, we can just say thing.function(), and let the language dispatch to the appropriate method.

is hardly more code than:

Reference: Code Segment 7

  myRole.payStrategy()

And it occurs probably only once.

How about we write 3 different versions to compare:

Couldn't you save your critiques/preaching until AFTER we are done? My goodness.

How about the example assume 10 roles that reference 10 methods/functions/blocks (units). In the real world it may be more, but let's just use 10 for the example. 3 of these units will use 2 other units, but supply their own description. In other words, they reference other methods, add the results, but supply their own title. The process will keep track of all the line items (description and amount) for a given employee. Remember, a given employee can belong to multiple roles.

[The visitor example already handles this case.]

They all can return the right answer. That is not the issue.

[Please, please, please tell us what the issue is. What will tables give us that visitors won't? How will they make my life easier? Please?]

Well, here we go again.


I honestly did not want this to turn into another tables versus OO rant. I was really trying to find a good middle ground between polymorphism and sets. Polymorphism is conceptually simple but sets are more flexible. I was trying to find a way to get the best of both. The complexity leap from regular polymorphism to a visitor pattern is huge. If set-like behavior was built into a language, perhaps the leap would not be so large. -- top

[False dichotomy. We don't have to choose between polymorphism and sets. They are orthogonal.]

Top, this isn't a table vs OO discussion. The problem is you still don't grasp the difference between data, and behavior.

DataAndCodeAreTheSameThing

I said data and behavior, not data and code, and there is a difference.

Sets are obviously the best way to view, sort, filter, and query data, no one is disagreeing with that. But sets offer nothing in the area of dispatching behavior at runtime without resorting to eval on function names in a table.

Is this an inherent property of sets, or just tradition?

Eval is bad, I'm not going to try and convince you of this, it is widely known, go look it up.

It is little different from say shell scripts that run other programs. It allows one to mix and match languages and tools. Sure, if you do it sloppily it is a security risk, but that is true of anything. You can't compile the whole world.

This leaves us two options for dispatching behavior, if/switch statements, or polymorphism. Polymorphism is widely accepted as the superior more flexible solution.

Where are you getting your popularity stats? Even a good many OO proponents complain about how poor actual code is with regard to OO concepts. See OoBestFeaturePoll.

You're projecting... I did not say anything about OO, I said polymorphism. Polymorphism is a concept that exists in several paradigms, not just OO. You're so hung up on dissing OO that you can't have a discussion without it.

Polymorphism is more flexible because it allows systems to be built which can be extended and enhanced without changing existing code or being forced to evaluate strings, the "if/switch" solution doesn't allow this and forces one to change existing code.

That is incorrect. It simply favors "subtype" oriented changes at the expense of verb-oriented changes or non-orthogonal changes. We had this battle already at SwitchStatementsSmell. See also PolymorphismLimits. Your arguments for polymorphed subtypes better fitting the way the world changes are some of the worst reasoning I have ever seen.

Again... you're projecting, I said nothing of OO, I only said polymorphism.

You've read too many OO textbooks and had far too little actual practice implementing OO solutions to really grasp this yet, and you prove it just about every time you write a sentence.

Even a good many experienced OO fans agree with my criticisms of subtyping and polymorphism. They simply say that there is more to OO than just poly and subtyping.

Until you do it, you won't understand it, and you'll continue to not understand why we all disagree with just about everything you say. You will not understand without putting in the effort to actually use polymorphism in real solutions, and this whole page just reaffirms that you still don't "get it". Rather than continuing to not get it and causing so much friction, why don't you just try using it, why not write a few systems in another style until you do get it? Are you really so arrogant that you think you are right and the vast majority of us are wrong? Is your opinion of yourself so high that you can't fathom being wrong?

It is not necessarily a matter of being "wrong". It is possible the differences are personal preferences. I am willing to accept this possibility. What I really don't understand is that you think it is perfectly normal to have so many people who think OO is wonderful, yet have OO's benefits be so undocument-able. You guys cannot articulate the benefits, other than with faulty or questionable assumptions about how the world changes over time. Your approach to OO is anti-scientific.

OO benefits are not undocumentable to anyone but you, just about everyone else accepts that it has benefits. There is no GoldenHammer, but to deny seeing any benefit in OO is simply being willfully ignorant.

Anyhow, I think I am stuck being a table-head. I thought in tables long before any heavy exposure to databases.

I cannot fundamentally change the way I think.

Then you cannot learn anything new, because all learning requires changing the way you think. You've basically just admitted that you have a closed mind about the subject and aren't willing to discuss it, since your mind is already made up.

And, I think relational is superior to OO because it introduces discipline and consistency to relationships that OO lacks. OO is the goto of relationship management. It is just a bunch of classes/objects that point to each other using jungle rules. When forced to do OO, I simply translate them to ER diagrams in my head. Unless you show me objective evidence that OO is better, I will fight for other table-heads out there. It is time FP- and table-heads demand objective evidence from OO proponents. Issues with Eval and the like are only implementation details, BTW.

Relational and OO are orthogonal; only you think they are in competition. OO programmers use relational databases daily. There are some issues involved with translating one model to the other, but they aren't competing. OO polymorphically dispatches behavior, something relational databases can't do, and relational databases allow dynamic, flexible querying, something OO can't do easily. They're only competing in your mind; for the rest of us, they work quite well together.

[I'm still waiting to see what problem you're trying to solve with this page. -- EricHodges]

See above under the bullet points.


Attempt to Unify OO and Relational for Example Above (Code Segment 4)

As someone who believes himself to be a competent OO programmer with some experience with relational databases, I would like to address the employee pay code example above to show how OO and relational mirror each other and make a case for sets and polymorphism. -- WayneMack

OO Decomposition

My reading of the example is that we are discussing polymorphism for pay categories not for employees. An employee has a pay category, hence the employee class should contain a pay class.

Relational Decomposition

For the data modeling, I would probably have an employee table with a many-to-one link to a pay category table. Again, the variations are contained in the pay category table, not in the employee table.
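A minimal sketch of that decomposition, using Python's built-in sqlite3 module. The table names (pay_category, employee), the column pay_category_id, and the sample rows are assumptions made for illustration; the discussion itself does not specify a schema.

```python
import sqlite3

# Hypothetical schema: many employees point at one pay category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay_category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employee (
        id              INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        pay_category_id INTEGER NOT NULL REFERENCES pay_category(id)
    );
""")
conn.executemany("INSERT INTO pay_category VALUES (?, ?)",
                 [(1, "Pay_Category1"), (2, "Pay_Category2")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(10, "Ann", 1), (11, "Bob", 2), (12, "Cy", 2)])


def employees_with_category(connection):
    """Join each employee to the single pay category it links to."""
    rows = connection.execute("""
        SELECT e.name, c.name
        FROM employee AS e
        JOIN pay_category AS c ON c.id = e.pay_category_id
        ORDER BY e.id
    """)
    return rows.fetchall()
```

The variation lives entirely in pay_category rows; adding a new category is an INSERT, and the employee table does not change shape.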

OO versus Procedural Implementation

Regardless of approach, two steps are needed. First, one must evaluate the pay category and determine the pay operation to be performed, i.e., one must do the "if" code. Second, one must perform the selected operation.

One procedural approach would be to combine the pay category evaluation with the pay operation in a single method, as was shown above (Code Segment 4). A second approach would be to implement each pay operation as a separate method and have these methods called from the pay category evaluation method (below).

Reference: Code Segment 8

 function calculatePay(emp) {
   if (memberOf(emp,A)) {  // if emp. belongs to set A
     Pay_Type1()
   }
   if (memberOf(emp,C)) {  // if emp. belongs to set C
     Pay_Type2()
   }
   if (memberOf(emp,D)) {  // if emp. belongs to set D
     Pay_Type2()
   }
   if (memberOf(emp,E) && memberOf(emp,F)) {
     Pay_Type3()
   }
 } // end function

In an OO implementation, the pay category evaluation code becomes a class factory and the pay operations become polymorphic methods. Assuming a simple evaluation, I would create the pay category in the constructor of the employee class.

Reference: Code Segment 9

 class Employee {

   PayCategory * myPayCategory;

   constructor Employee(emp) {
     myPayCategory = PayCategory.CreatePayCategory(emp)
   }

   function calculatePay() {
     return(myPayCategory->calculatePay())
   }
 }

 class PayCategory {

   static function CreatePayCategory(emp)
   {
     if (memberOf(emp,A)) {  // if emp. belongs to set A
       return(new Pay_Category1)
     }else if (memberOf(emp,C)) { // if emp. belongs to set C
       return(new Pay_Category2)
     }else if (memberOf(emp,D)) { // if emp. belongs to set D
       return(new Pay_Category2)
     }else if (memberOf(emp,E) && memberOf(emp,F)) {
       return(new Pay_Category3)
     }
   } // end function

   virtual function calculatePay() {}
 }

Comparison of Procedural and OO Implementation

Note there is no difference in the operations that are performed; the difference lies in how operations are allocated to methods and modules.

Comparison of Relational and OO Model

The OO model shown (using containment) combines two classes in the same way a simple join combines two tables in a relational database. The same forces that would suggest decomposing the data into two tables in a relational model suggest decomposing into two classes in an object model. The pay category base class would be the equivalent of a pay category table, and the derived pay category child classes would be the equivalent of rows in a pay category table.

I would conclude that OO, Procedural, and Relational are not that different and that they actually flow together quite nicely.

They don't. The OO model will only ever decide the pay category once, upon construction. The procedural one will have to decide it each time it performs a different operation. calculatePay won't be the only method; there are always several, and this is where the OO model becomes cleaner.
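The "decide once" point can be sketched in Python. This is a loose translation of Code Segment 9, not the original author's code: the membership data in SETS, the flat amounts, and the second operation describe() are all invented for illustration.

```python
from abc import ABC, abstractmethod

# Hypothetical membership data standing in for memberOf(emp, X).
SETS = {"A": {"ann"}, "C": {"bob"}}


def member_of(emp, set_name):
    return emp in SETS.get(set_name, set())


class PayCategory(ABC):
    @abstractmethod
    def calculate_pay(self): ...

    @abstractmethod
    def describe(self): ...

    @staticmethod
    def create(emp):
        """The "if" code lives here and runs once per employee."""
        if member_of(emp, "A"):
            return PayCategory1()
        if member_of(emp, "C"):
            return PayCategory2()
        raise ValueError(f"no pay category for {emp!r}")


class PayCategory1(PayCategory):
    def calculate_pay(self):
        return 1000  # placeholder amount

    def describe(self):
        return "category 1"


class PayCategory2(PayCategory):
    def calculate_pay(self):
        return 2000  # placeholder amount

    def describe(self):
        return "category 2"


class Employee:
    def __init__(self, emp):
        self.pay_category = PayCategory.create(emp)  # decided once

    def calculate_pay(self):
        return self.pay_category.calculate_pay()

    def describe_pay(self):
        return self.pay_category.describe()
```

Both calculate_pay() and describe_pay() reuse the category chosen in the constructor; a procedural version would repeat the membership tests in each function.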

This is a very important point that should be recognized. I just felt that the bounds of the given example did not provide a way to illustrate how other, unspecified operations, would also benefit from a polymorphic approach.

[Then move the factory call from the constructor to the calculate method.]

That would be an equally valid design decision and the code segment simply does not provide enough details to select one approach over another. The key point is that the "if" code persists from the procedural model, and that the OO model gives us far more choices to adapt its placement based on the needs and constraints of the system as a whole.

[Top has said that an employee can belong to multiple pay categories. That's why I would solve this problem by giving each employee a set of them and using a visitor to calculate their total pay.]

Unless we allow the calculatePay() method to return an array of pay values, the pay category must resolve into a defined finite set of methods. I am not sure whether it really makes a difference whether we define "(Category E) AND (Category F)" or define a single Category "E_AND_F". The Visitor pattern is certainly an equally valid approach to the class factory approach and the Patterns book gives more than adequate guidance to help determine which to choose (it's kind of hard to pick an approach based on a poorly defined example!). My intent was to try to show the similarities between a procedural approach and an OO approach and also show similarities with a relational database decomposition.

[Top said that an employee can belong to multiple pay categories. calculatePay() won't return an array of values, but it will calculate the pay for an employee that belongs to any set of pay categories. It would be exhausting to define a category for each permutation of categories. Instead, let an employee belong to any categories and put the calculation logic in the visitor.]

Unless one uses a Decorator pattern to iteratively calculate pay, I am not sure how one avoids defining a unique calculatePay() method for each specific combination of pay categories. Even with the Decorator pattern, one must make assumptions on how pay calculations can be combined and what sequence they may need to be run in. Certainly, stringing together a series of Decorators to calculate pay based on a random series of pay categories is another viable approach and the Patterns book provides plenty of advice for when a Decorator approach is beneficial.

[Top said the pay would be accumulated. The visitor would ask each pay category to calculate pay and add the results to a running total. No need for a decorator.]
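One way the accumulate-with-a-visitor idea might look in Python. The category classes (HourlyBase, Overtime), their fields, and the 1.5x overtime multiplier are assumptions for the sake of a runnable sketch; nothing in the thread fixes these details.

```python
from dataclasses import dataclass, field


@dataclass
class HourlyBase:
    rate: float
    hours: float

    def accept(self, visitor):
        visitor.visit_hourly_base(self)


@dataclass
class Overtime:
    rate: float
    hours: float

    def accept(self, visitor):
        visitor.visit_overtime(self)


class PayTotalVisitor:
    """Holds the calculation logic and a running total."""

    def __init__(self):
        self.total = 0.0

    def visit_hourly_base(self, cat):
        self.total += cat.rate * cat.hours

    def visit_overtime(self, cat):
        self.total += cat.rate * 1.5 * cat.hours  # assumed 1.5x multiplier


@dataclass
class Employee:
    name: str
    pay_categories: list = field(default_factory=list)

    def calculate_pay(self):
        # Visit every category the employee belongs to; no per-combination
        # class is needed because the amounts simply add up.
        visitor = PayTotalVisitor()
        for category in self.pay_categories:
            category.accept(visitor)
        return visitor.total
```

An employee can hold any mix of categories, and calculate_pay() still returns a single number rather than an array of values.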

Note that the example, for whatever reason, shows an AND (intersection) of pay categories, "memberOf(emp,E) && memberOf(emp,F)". Eliminating the overlay of categories actually strengthens the case for using a Visitor pattern.

The end result is the same, however. We still maintain the two basic sets of operations. The calculatePay() set of methods would be allocated to the Visitor, and a Class Factory or similar implementation of the "if" sequence would be needed to build the PayCategory class or collection. The basic code remains the same; it is merely how one chooses to distribute it.

[And I wouldn't ask a PayCategory for an employee's pay category, I'd ask an Employee. Each employee would know what set it belonged to and could ask the set(s).]

As the pay category is contained within the Employee, an Employee would forward the pay category description to the requester. This was not part of the original code, but adding the methods to return the category description to the Employee class and PayCategory classes is a straightforward extension.

[I think the factory is confusing the issue. How pay category objects are created is not the central problem. How pay is calculated is.]

The code that selects which method or code segment is used to calculate pay is necessary. A class factory is one well known pattern for storing this kind of code. I had hoped that the class factory would show an orderly migration from a single procedural method to one possible polymorphic solution.

[Polymorphism will select which methods are used to calculate pay. A class factory is just one way of generating the objects. What confuses me is the "memberOf" logic. Why not ask the "set" (A, C, D, E or F) to create the corresponding pay category instead? Why not get rid of the sets abstraction altogether and give each employee a set of pay categories?]

The polymorphism must be first set up before it can be used. One cannot have polymorphism without multiple objects, hence the construction of the multiple objects is part of the solution.

The memberOf() logic was copied from the original example code (see Code Segment 4). I am merely trying to show the basic structural similarities and not critique the presented code.


Sample Application Description

(DRAFT)

Example receipt stub:

  Name: Laura K. Jones
  Pay period: Week X
  Regular Hours: 80.0
  Overtime Hours: 2.0
  ----------------------
  Base pay:.....4,000.00
  Medical:.......-300.00
  Overtime:.......100.00
  Fica taxes:....-800.00
  Local taxes:...-200.00
  Parking:........-25.00

TOTAL: xxxxx.xx ----------------------

Questions and Comments

What roles is Laura in?

Medical, over-time ("allows overtime" role), fica-taxes, local-taxes, and parking. If some of these apply to everybody, then perhaps we should not state them? Note that a person may belong to different localities for local taxes. Thus, employees may belong to different roles for those.

So deductions are considered negative pay? Blech. Redesign the system so that folks get paid first and deductions are taken from that pay (pre- and post- tax).

Why do her roles need to indicate calculation order?

Because a given pay method may assume that certain calculations already took place. And, it results in a consistent output order. (Perhaps in practice the output order and calc-order would be different, but I wanted to keep the example fairly simple, thus a single ordering attribute.)

Which pay calculations "grab" info from the receipt?

Let's assume taxes do.

Let's not. Let's assume that taxes are not a form of employee payment.

I am not sure what you mean. We are calculating pay, and taxes are part of that.

Taxes deduct from pay. The system will be simpler if you recognize that in its design.

I am reluctant to hard-wire two groups (total pay and deductions) into the design for reasons already stated. Consider this "flexibility" part of the requirements, for good or bad.

Where did you state the reasons for your reluctance? I saw you state your reluctance, but not the reasons. Pay and deductions are hard-wired into tax law, so I'm not sure what scenario you're imagining that might violate this design.

The "overtime" statement above. Another example is that some taxes may consider parking fees taxable and some not. Thus there is no single target pay amount which they all can safely use.

I didn't say two groups. You'll probably have several sub-totals of pay and sets of deductions that apply to different sub-totals.

You can't meet the requirements of a payroll system with order alone. You're going to have to build abstractions for different kinds of pay and deductions.

Illustration:

  Base-pay
  overtime
  parking fee

  taxA = f(base-pay)
  taxB = f(base-pay - overtime)
  taxC = f(base-pay + overtime - parking)

How about a specific use-case where such fails.

Why do you need order for that? I thought you were trying to insert negative tax amounts in the correct place (between base-pay and overtime, for example) to make it work. You are calculating sub-totals in each tax calculation. I'd give the sub-totals names and persistence and let each tax calculation re-use them.
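A rough Python sketch of the named sub-totals suggestion. The three sub-totals mirror the taxA/taxB/taxC illustration; the key names and the flat tax rates are invented, and the amounts in the usage note are taken from the sample receipt stub.

```python
def subtotals(base_pay, overtime, parking):
    """Named sub-totals, computed once and shared by every tax rule."""
    return {
        "base": base_pay,
        "base_less_overtime": base_pay - overtime,
        "base_plus_overtime_less_parking": base_pay + overtime - parking,
    }


# Hypothetical taxes: (name, sub-total it applies to, flat rate).
TAX_RULES = [
    ("taxA", "base", 0.10),
    ("taxB", "base_less_overtime", 0.05),
    ("taxC", "base_plus_overtime_less_parking", 0.02),
]


def compute_taxes(totals, rules=TAX_RULES):
    """No ordering is needed: each rule just names the sub-total it uses."""
    return {name: round(totals[key] * rate, 2) for name, key, rate in rules}
```

Calling compute_taxes(subtotals(4000.0, 100.0, 25.0)) evaluates each tax against its own named base, so a tax that treats parking as taxable and one that does not can coexist without relying on line order.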

The requirement is that the line items add up to make the total. However, each line item is not required to use any existing line items if it does not want to.

Understood. You want order so that all payments will be calculated before deductions and to organize the output, correct?

When do we get to the part that shows how your solution to this problem is better than using polymorphism and visitor pattern?

Like I said, it may be a few weeks. In the meantime, you can work on yours.

I've already provided mine. I'm sticking with plain-old polymorphism and visitor pattern.


New requirement: A line item can optionally be a "non-totaling" line item. This allows one to add comment-like lines into the receipt. This can be used to "make" sub-totals for example. Such line-items are marked with an asterisk on the receipt.
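A small Python sketch of how totaling and non-totaling line items could sit side by side. The LineItem class and its field names are assumptions (pay_order and is_comment loosely echo the ordering attribute and the comment flag discussed on this page), and the output format is invented.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    label: str
    amount: float
    pay_order: float          # position on the receipt
    is_comment: bool = False  # non-totaling line, shown with an asterisk


def receipt_total(items):
    """Sum only the totaling line items."""
    return sum(item.amount for item in items if not item.is_comment)


def render(items):
    """Format lines in pay_order; non-totaling lines get a trailing '*'."""
    lines = []
    for item in sorted(items, key=lambda i: i.pay_order):
        mark = "*" if item.is_comment else ""
        lines.append(f"{item.label}: {item.amount:.2f}{mark}")
    return lines
```

A sub-total such as "Gross" can then be inserted between Overtime and Medical with is_comment=True without being counted twice in the total.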


Dealing with Mutually-Exclusive Roles

I notice that some possible roles are mutually-exclusive. For example, we may have an "over-time" role and a "no-overtime" role (to produce an error if there are over-time hours). And, if we have local tax roles, then generally these will be mutually-exclusive.

One may suggest simple "strategy slots" in the employee entity/class rather than roles for these kinds of things. That way, they are automatically mutually-exclusive. However, we then have two "kinds" of strategies: those hooked to an employee and those part of a "roles list".

I personally think it is more flexible and consistent to make all the strategies part of the "roles list". Things are more change-friendly that way. For example, perhaps in some cases somebody is taxed under two different counties/states because they live at least part-time in both. Even if that is not allowed now, laws change. We would not have to move strategy indicators from the Employee entity into the roles list when such change occurs. It becomes a mere data change instead of a code and/or schema change.

However, we would then need some way to flag role groups that should be mutually exclusive, or at least marked for closer inspection. Since such "set validation" can be added on later, I don't think we should dwell on that right now. -- top

If they are mutually exclusive they probably aren't roles, but properties of roles.

I am using "roles" loosely. If there is a formal definition that does not fit this, then perhaps we should change our usage here. -- top

This has nothing to do with definitions of roles. Overtime and non-overtime probably shouldn't be modeled as roles if any role can have either of those 2 states. They should probably be properties of roles.

I think I need an example. We could perhaps introduce properties to roles, but I am trying to keep the example fairly simple, and so far I don't see a need for such. If our existing arrangement can solve all known problems, then I think we should keep it the way it is rather than extending the complexity of "roles".

What if Laura gets paid overtime for role A but not for role B? And what if Joe gets paid overtime for role B but not role A? Overtime sounds like a property to me.

That is more of an issue of task/duty assignments, which is outside our scope here. Generally whether a manager allows an employee to work overtime or not for specific tasks (functional roles) or duties, is determined on a case-by-case basis. Our payroll system does not care about that. Maybe a duty or time sheet management system might care, paycheck generation doesn't. In your example, the duty management system might track which duties are allowed overtime or not. Payroll deals more with external legal and government issues surrounding paychecks rather than internal duty management.

I said nothing about duty management. It sounded like you wanted different overtime/exempt calculations on different roles. Perhaps you don't. You brought up overtime/exempt and local taxes that form mutually exclusive sets. If they are mutually exclusive then those sets should be modeled as such. One list of roles for each employee doesn't cut it. Each employee should have one value from each mutually exclusive set.

Again, I would like to see a specific scenario where it fails. Also remember that I wish to avoid a DiscontinuitySpike when a feature goes from mutually exclusive to non-mutually-exclusive. Thus, I don't want to hard-wire in such associations if possible.

It fails when someone in human resources enters mutually exclusive options and it isn't caught for months. If you don't "hard-wire" the rules, you're relying on humans to know them.

No, you add validation. Example, pseudocode: "If there is more than one county tax role in a given employee's set, then issue a warning." It is just "set math". See a theme?
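That "set math" check might be sketched in Python as follows. The group names and role names in EXCLUSIVE_GROUPS are made up for illustration; in a table-oriented design they would presumably come from data rather than code.

```python
# Hypothetical role groups whose members should not be combined.
EXCLUSIVE_GROUPS = {
    "county_tax": {"county_tax_north", "county_tax_south"},
    "overtime": {"overtime", "no_overtime"},
}


def validate_roles(roles, groups=EXCLUSIVE_GROUPS):
    """Return one warning per group in which the employee holds 2+ roles."""
    warnings = []
    for group_name, members in sorted(groups.items()):
        overlap = set(roles) & members  # set intersection
        if len(overlap) > 1:
            warnings.append(f"{group_name}: {', '.join(sorted(overlap))}")
    return warnings
```

If the law later allows two county taxes at once, the change is to drop the "county_tax" entry from the group data; the role list and the pay code stay as they are.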

But you have to remember to apply the validation at the appropriate time. If you use the appropriate data structures then any programmer that touches the system in the future won't have to remember to call the validation code. The system will remember for them. See a theme? We use design to make our lives easier. We could put everything in bags (or arrays, or tables) and use logic to ensure their correct use. We have found it's easier to move that logic into the collections themselves so they can't be misused. This also makes it trivial to reuse the logic. We name the collections according to the logic they use (set, list, map, table, etc.) and call them classes.
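A sketch of what "moving the logic into the collection" could mean here, in Python. RoleSet and ExclusiveRoleError are hypothetical names; the idea is only that a conflicting role is rejected at insertion time, so no caller has to remember a separate validation step.

```python
class ExclusiveRoleError(ValueError):
    pass


class RoleSet:
    """A role collection that refuses conflicting members on insertion."""

    def __init__(self, exclusive_groups):
        # exclusive_groups: iterable of sets of role names (hypothetical).
        self._groups = [frozenset(g) for g in exclusive_groups]
        self._roles = set()

    def add(self, role):
        for group in self._groups:
            if role in group:
                # Any other member of the same group already held is a clash.
                clash = (self._roles & group) - {role}
                if clash:
                    raise ExclusiveRoleError(
                        f"{role!r} conflicts with {sorted(clash)}")
        self._roles.add(role)

    def __contains__(self, role):
        return role in self._roles

    def __len__(self):
        return len(self._roles)
```

The trade-off raised elsewhere in this thread still applies: the groups are passed in as data here, so relaxing a rule does not require restructuring the class.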

If things were always only is-a or only has-a you might have a point. But in my experience things slip back and forth between is-a and has-a fairly easily. Maybe other domains are different, but biz rules are capricious. Polymorphism requires a pretty big rework effort to go back and forth. Thus, I suggest using a more "meta" approach to rule enforcement rather than hard-wired language or typing arrangements. If you approach it from a validation perspective, then such a change is simpler to handle. Reworking a bunch of code can cause just as much if not more problems than forgetting to reset the validation rules.

Regarding "reuse", if reuse is your goal, then being able to use similar logic from an is-a solution in a has-a solution and vice versa will help reuse, not hurt it. -- top


Here is my draft schema for a table-oriented ("eval") solution:

Reference: Code Segment 10

  Table: Emp
  ----------
  empId
  empName
  hourlyRate
  regHrs   // current pay-period regular hours
  overtimeHrs

  Table: Role
  -----------
  roleID
  roleTitle
  payOrder     // (float) order on receipt
  payStrategy  // (string) function name to be "eval'd". Blank to bypass.
  isComment    // if strategy is a non-totaling item (per new requirement)

  Table: empRoles
  ---------------
  empRef
  roleRef

  Table: Receipt
  --------------
  empRef
  roleRef
  amount

Notes: tables "empRoles" and "Receipt" could probably be combined for this example, but in practice they perhaps should be kept separate so that we can keep older receipts, in which case we would need a "payPeriod" key.

For compiled languages that do not support "eval"-like operations, a switch/case statement can be used instead. Note that if such info comes from a RDBMS, an OO solution probably needs a similar approach to translate from a column code to a "sub-type".
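Since the promised code is not on this page, here is one hedged guess at the switch/case variant, written in Python with a dictionary lookup standing in for the switch. It is not the eval approach itself. The strategy functions, their names, the 1.5x overtime multiplier, and the flat parking fee are all invented; the role rows are plain dicts shaped like the Role table columns.

```python
# Hypothetical strategy functions; names and amounts are illustrative.
def base_pay(emp):
    return emp["hourlyRate"] * emp["regHrs"]


def overtime_pay(emp):
    return emp["hourlyRate"] * 1.5 * emp["overtimeHrs"]  # assumed 1.5x


def parking_fee(emp):
    return -25.0


# Explicit lookup table: only names listed here can ever be called.
STRATEGIES = {
    "base_pay": base_pay,
    "overtime_pay": overtime_pay,
    "parking_fee": parking_fee,
}


def build_receipt(emp, roles):
    """roles: rows shaped like the Role table (dicts with the same columns)."""
    receipt = []
    for role in sorted(roles, key=lambda r: r["payOrder"]):
        name = role["payStrategy"]
        if not name:  # blank strategy means "bypass"
            continue
        amount = STRATEGIES[name](emp)  # KeyError if the name is unknown
        receipt.append((role["roleTitle"], amount))
    return receipt
```

Adding a role still means touching two places, a row in the Role table and an entry in STRATEGIES, which is the duplication both sides of this thread argue about.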

Code to follow at some later date.

-- top

Recommendation. Let's wait until the example is completed (as promised above) before critiquing it. We can delete almost all of the following discussion as it is based on speculation of what solution might be presented. Just sit back and wait until the example is completed and we all might learn something from the exercise.

Showing a schema is not presenting a solution, you are still faced with the problem of dispatching to functions, you can't dispatch to a function you didn't write, and you aren't going to write a function for every possible permutation.

I am not sure what you mean.

If you have a role table, with a function name stored in it, then you're proposing the same solution as the OO guys, totaling up each of the roles' calcPay functions. Your supposed control table is nothing more than your own private hacked up object using eval to implement polymorphism. Every row in the Role table is really a role subclass. Blank to bypass just means a virtual method that wasn't overridden by the subclass. Of course, unlike the OO solution, you don't have a place to store each of your methods, so you'll pop them somewhere and give each one a different name. You are just re-inventing OO, and doing it badly.

This sounds like the age-old argument about whether OO is a hacked-up navigational database with function pointers or TOP is "hacked-up OO". I personally find relational superior to navigational. The "Eval" thing is only an implementation detail. It could be made prettier, but it does not matter much in the end.

If your biggest complaint about TOP is typing in "eval(x.y)" instead of "x.y()", then I consider that a huge endorsement for TOP. Its only real problem would then be the syntax of existing languages.

And, "Doing it badly" is not very scientific.

Note that for a simple example, a bunch of IF statements is probably better than a TOP approach (and OO).

-- top

Do you not understand that calling eval(SomeMethodWithManyNames) is trying to reinvent polymorphism, which already does that, but much cleaner, with compiler support, and without requiring each method to have a different name? Do you honestly think your hacked up version of dynamic dispatch is better than the built in language support for it?

If the language and the DB were tightly merged, it would not be all that different than OOP. However, that slightly larger "seam" is the cost of sharability and specialization (data stuff versus behavior stuff). That seam could be removed, but with a cost.

[Why is a bunch of IF statements better?]

Eval has huge problems as a means of directing control flow:

That's why even languages with powerful eval features (Scheme and CommonLisp) frown upon their use, and recommend APPLY or FUNCALL instead.

If you are really hard-up for speed and security, then use a case/switch statement instead of Eval. Even with those, the total result is superior to your hard-coded class pointer database.

[Scheme, Lisp, JavaScript: any language with eval abilities frowns on using it. It makes for extremely brittle, hard-to-maintain code; you obviously don't have the experience to know that.]

As for the repeated IFs - talk about duplicated code!

Are you talking about the IF statements themselves? How does IF-ENDIF take up more code than METHOD-ENDMETHOD block markers?

[Bad comparison, IF END IF takes up more code than switch statements that the compiler writes for us, we don't have to do it, you like to hack it up.]

Aside from the arguments on SwitchStatementsSmell and LongFunctions (neither of which you accept), that's a whole lot of typing that the compiler for an OO language will take care of. -- JonathanTang

At some point, we're going to have to accept that top doesn't know what he's talking about and is in way over his head in these conversations, but more importantly, quit discussing this stuff with him. He consistently proves he's not willing to learn anything.

Nobody has presented biz example evidence that even a majority of OO fans agree is good. Thus, calling me dumb or stubborn is a red herring. Get your OOwn hOOuse in order first. You cannot even sell your evidence (cough) to YOUR OWN DAMNED KIND! -- top

[Untrue, there is large scale agreement on most everything we've discussed, I only see you arguing for the other side.]

Further, most of your arguments are based around the capabilities of existing tools rather than flaws in the gross theory. It is QwertySyndrome. We will never get away from 60's-style navigational pointer shit if everyone keeps that attitude.

[Existing tools are all we have, wishful fantasies about how things should be aren't tools for engineers, they're tools for daydreamers.]

Further, if using a RDBMS, you will end up having to map stuff to records anyhow. All the magics of polymorphism you describe require one to mirror the database structure in many cases, stabbing OnceAndOnlyOnce.

[Untrue, you just don't know how to do it.]

You throw duplication at the problem in order to get the "seamless" dispatching you find so wonderful. You can't compile the database, so you copy the damned thing into something you can compile. This is elegance? OnceAndOnlyOnce would dictate that "role" info be either in the database OR in code. You cannot have it both ways without mirroring the role records in code. For one, it would require adding new roles in two different places: the DB and your ladder-ish classes.

[Untrue, you just don't know how to do it.]

Pointers are ugly and OO is inherently a pointer-based paradigm.

<How do pointers differ from keys? Both are a mechanism to express linkage. They may be ugly, but they are all we got.>

[Your opinion, and you are in the vast minority.]

Relational is the best approach so far to get away from pointer messes. (No, it ain't perfect, but it beats OO.)

[No, relational is the best approach for organizing and querying data, nothing more.]

You have been using it so long that you accept and are addicted, just like those stuck on Goto's in the 70's. Visitor is just a big wad of pointers. You can't query and transform pointer wads the same way you can relational structures. Pointers are the shanty-town way to build things. I bet deep down you know it is true, but you have simply invested too much in your pointer tools and skills.

[You critique what you don't even understand. You are so blinded by your hatred for OO, you don't even realize you still haven't figured it out. If you think Visitor is a big bag of pointers, you haven't got a clue. Your approach is simplistic and amateurish, and clearly demonstrates your lack of skill programming.]

[ You don't have a leg to stand on, because you don't know what you are talking about, your web site is nothing but misconception after misconception.]

[ You're a one trick pony that learned to write a SQL statement, and not much else. Most of OO programmers here, could run circles around you in both SQL and a programming language. Hell, you still think a schema is a program, and I bet you don't even realize how funny that is.]

Note that perhaps we should move the eval debates to EvalVsPolymorphism.


CategoryPolymorphism


(last edited November 25, 2010)