Discussion of PayrollExampleTwo, kept on a separate page to avoid making that page too long, since it already has code.
One immediate drawback I see compared to PayrollExample is that a "power user" (administrator) cannot change the formulas. A Java programmer has to be hired or rented to make any changes. (I will agree that for a bigger organization, that may be less of an issue, but more so the smaller the org.) -t
The client explicitly set a requirement that all payroll updates be made by a "payroll specialist", i.e., me or a member of my staff. However, we did make versions of the system that allowed users to change various constants such as the EI & CPP rates and yearly maximums. Providing updates to formulae, however, was considered too technical for even administrative users, and was both a source of revenue for us and a source of comfort for our users who knew that they were getting a tested and validated payroll system without having to worry about potential (expensive) mistakes.
Understood. I agree they both have their place. Generally I was thinking closer to the SAP model where you have programmers and "configurators". The configurators are sort of half-way between programmers and business analysts. That way they can focus on payroll rules and customer relations without having to learn the ins and outs of Java and compilers.
At one point, we considered developing a scripting language on top of other parts of the production system (it isn't just a payroll program), specifically to make it easier for "configurators" to customise it. However, the business case for this was rather weak, as: (a) our client didn't want anyone outside of a professional development team mucking about with code (for good reason -- the system was used by small, mostly-independent offices to support in-home nursing provision, so there are financial, patient care, and liability justifications for this), and (b) the reality was that any "configurator" of suitable ability would be a programmer working for us in-house, and therefore able to understand C++ (I only used Java for PayrollExampleTwo) enough to code on the business side and use our build system, etc. However, I would be interested in seeing a ProceduralProgramming & TableOrientedProgramming equivalent to my ObjectOriented example, in order to compare the two approaches. In particular, I'd like to see how you tackle the requirement to override certain federal constants and formulae on a province-by-province basis.
I'll work on it. But first let me take this opportunity to rant how bloated typical Java apps are. Contrast by comparing these test stubs:
// Example P-1
Employee julie = new Employee(52) {
    public double TC() {
        return 26467;
    }
    public double TCP() {
        return 16422;
    }
    public double Invest() {
        return 2000;
    }
    public int getDisabledDependents() {
        return 1;
    }
    public int getDependentsUnder19() {
        return 2;
    }
};
Versus:
// Example P-2
e = new Employee(52);
e.name = "Julie";
e.TC = 26467;
e.TCP = 16422;
e.invest = 2000;
e.disabDependants = 1;
e.dependantsUnder19 = 2;
Maybe you have fast eyes that can read verbose code fast. I don't. I like info compact and clean such that it reads almost like pseudo-code. Why people tolerate such shit, I don't know, but I hope it falls off the edge of the Earth and goes the way of the starter crank. We finally got away from COBOL to come back to THAT???? Damn! No wonder scriptish languages are making a comeback. Do we really have to nearly triple the size of our code to get "type safety"? Maybe anal domains need it, I don't know, but I don't want to be part of it. -t
- [This has nothing at all to do with type safety. Both methods above can be made type safe, so attributing the bloat in the first method to type safety is a mistake.]
Whilst Java is verbose, the above is an uncharacteristic example. Remember that Employee is a stub. It has only one member attribute (because its value varies for each test), and the return values of the methods default to what's required to support the majority of tests. That approach means two tests are rather verbose -- due to having to override methods -- but the rest are not. In production, Employee would have the same methods, but would have attributes populated appropriately via lookup from a database, set via constructor, etc. If Java supported properties (like C#), which I wish it did, the syntax for setting attributes could be almost exactly what you've suggested.
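The stub approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual PayrollExampleTwo stub: the class name EmployeeStub, the meaning of the constructor argument, the default return values, and the reading of -1 as "no claim amount supplied" are all assumptions made for the example.

```java
// Hypothetical test stub. Names, defaults, and the meaning of the
// constructor argument are illustrative assumptions.
class EmployeeStub {
    // The one attribute that varies per test (assumed here to be pay periods per year).
    private final int payPeriods;

    EmployeeStub(int payPeriods) {
        this.payPeriods = payPeriods;
    }

    int getPayPeriods() { return payPeriods; }

    // Defaults chosen to suit most tests; -1 is assumed to mean
    // "no claim amount supplied, use the regional default".
    double TC() { return -1; }
    double TCP() { return -1; }
    double Invest() { return 0; }
    int getDisabledDependents() { return 0; }
    int getDependentsUnder19() { return 0; }
}

class EmployeeStubDemo {
    public static void main(String[] args) {
        // Most tests need no overrides at all.
        EmployeeStub plain = new EmployeeStub(52);

        // Only the unusual test overrides the values that differ.
        EmployeeStub julie = new EmployeeStub(52) {
            @Override double TC() { return 26467; }
            @Override int getDependentsUnder19() { return 2; }
        };

        System.out.println(plain.TC() + " vs " + julie.TC());
    }
}
```

Under these assumptions, the verbosity is paid only in the tests that deviate from the defaults.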
And, yes, I can read verbose code fast.
I did limit my criticism to Java, because for one, Eiffel offers shortcuts also.
(Note: I have inserted various "post-edits" to the discussion below to clarify information and add details. -top)
As it is, PayrollExample doesn't have enough "grouping power" to do these kinds of calculations in a streamlined way. But such could be added by making "PayItemID" be non-unique in table "payItems" and perhaps adding a new "payItemGroups" table. This would allow bunches of calculations to be performed based on region-hood. But such could complicate configuration for items that don't need grouping. It's a trade-off between optimizing the design to be a "glue" tool or a heavy-duty rules & math processing tool. At this point I'd probably leave most of the above kinds of calculations in code, whether it be procedural, OOP, heavy-typed, light-typed, spreadsheets, etc.
The best tools are often those that let you mix the best tools (recursive problem here? Hmmm). My approach is an aggregator that allows different tools and paradigms to do what they do best. This reflects a realistic business environment where different tools and sources of info must work together to solve a problem. Some info may come from custom spreadsheets, others from MS-Access, and yet others from "enterprise" languages like Java, COBOL, or Ada. In many cases one cannot control where the info or calculations are done. You are hired to make them work together, not overhaul the company with your Great Master Paradigm. I'd like to see how alternatives address such coordination.
I'm not quite following you here. The example presented simply performs tax deduction calculations. It would inevitably be part of something bigger, such as payroll and cost forecasting, which could be written in Java, COBOL, Ada, C#, VB, VBA, you-name-it. The intent was not to show some "Great Master Paradigm" (whatever that is) at work, but merely to be an example of a real business problem that benefited from OO. I understand you've been looking for such an example for a while.
By my assessment, the specific scenario you stated doesn't seem to be something that TOP could help with, except possibly by making it easier for non-programmers to set up formulas. I won't explore that further at this time, other than to float it as a what-if to ponder, and perhaps to look at the approaches used in related configuration-centric tools such as SAP Payroll.
PageAnchor: non_top_procedural
As far as whether a non-TOP procedural or OO version is "better", it's hard to tell without an analysis of possible and likely future change scenarios and seems to be a classic case-statement-versus-subclassing debate found in other topics. My guess of future change is that it's close to a wash: OO will not offer any huge effort savings over time, only marginal improvement at best. Canada is not likely to gain a huge quantity of new territories, barring a world-changing event that may do away with or overhaul its current tax system anyhow.
I do see a very rough pattern such that if you have up to a dozen or two "nodes" which are a variation on a theme (such as tax regions), then old-fashioned CASE statements are sufficient and the simplest. Between about 20 and 75, OOP may have a slight advantage[1]. Past 75, TOP is more helpful so that one can sift, sort, print, search, cross-reference, etc. the larger volume of variation nodes. --top
Let's look at using CASE statements. E.g.:
// Provincial non-refundable personal tax credit
double K1P(int taxregion) {
    switch (taxregion) {
        case AB: {
            // Braces give each case its own scope, so TCP can be declared per region.
            double TCP = employee.TCP();
            if (TCP == -1)  // -1 sentinel: fall back to the regional default
                TCP = 16775;
            return 0.10 * TCP;
        }
        case QC:
            return 0;
        case ON: {
            double TCP = employee.TCP();
            if (TCP == -1)
                TCP = 8881;
            return 0.0605 * TCP;
        }
        default:
            throw new InvalidRegionException("K1P");
    }
}
Is this better? I suppose if the Canada Payroll documents were structured to resemble CASE statements, with the formulae provided in (say) tables organised by factor (K1P, K2P, etc.) and region within factor, then it might make sense. But, the Payroll documents are structured by region, and formulae within region. Thus, the CASE statements are conceptually more distant from the source document than using the OO approach. In other words, it's more effort to translate the source document into CASE statements than into classes.
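For comparison, the region-grouped arrangement argued for here might look something like the sketch below. It is not the PayrollExampleTwo code itself. The rates, default claim amounts, and the -1 sentinel come from the CASE example in this discussion, and 'Federal' is the base-class name mentioned on this page; the EmployeeData interface, the tcpOrDefault helper, and the subclass names are assumptions for illustration.

```java
// Hypothetical sketch: apart from K1P, TCP, and 'Federal', all names are assumptions.
interface EmployeeData {
    double TCP(); // provincial claim amount; -1 is assumed to mean "none supplied"
}

abstract class Federal {
    protected final EmployeeData employee;

    Federal(EmployeeData employee) {
        this.employee = employee;
    }

    // Provincial non-refundable personal tax credit; each region supplies its own.
    abstract double K1P();

    // Shared helper: use the regional default when the -1 sentinel is seen.
    protected double tcpOrDefault(double regionalDefault) {
        double tcp = employee.TCP();
        return tcp == -1 ? regionalDefault : tcp;
    }
}

class Alberta extends Federal {
    Alberta(EmployeeData e) { super(e); }
    @Override double K1P() { return 0.10 * tcpOrDefault(16775); }
}

class Ontario extends Federal {
    Ontario(EmployeeData e) { super(e); }
    @Override double K1P() { return 0.0605 * tcpOrDefault(8881); }
}

class Quebec extends Federal {
    Quebec(EmployeeData e) { super(e); }
    @Override double K1P() { return 0; }
}

class K1PRegionDemo {
    public static void main(String[] args) {
        Federal region = new Ontario(() -> 10000);
        System.out.println("Ontario K1P: " + region.K1P());
    }
}
```

Each region's formulas sit in one unit, which is the layout the per-region documentation is said to follow; the cost is that K1P as a whole is no longer visible in one place.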
Are you saying the OO approach happens to be closer to the documentation layout? It may be true, but that's merely a happenstance argument. In this particular situation the documentation may indeed be noun-grouped instead of verb-grouped, and sub-classing may indeed help fit that. I won't dispute such, but a different writer(s) could just as well have done it verb/task-grouped in an alternative universe. I've organized code and/or tables to match the specific layout of source documents also (not necessarily OOP). -t
Yes, the OO approach in this case happens to be closer to the documentation layout, but it's also closer to the real-world, er, noun-grouping (the tax regions), rather than the abstract verb-grouping (the individual factors).
- They are both abstract lines in the mental sand drawn by the human mind for the human mind. And, the documentation "shape" may just be a happenstance of the writer. It perhaps could have been written another way.
- Perhaps. Are you saying you prefer verb-groupings to noun-groupings?
- In general, yes, at least in code because that's what the code concentrates on as its specialty. The DB tends to hold and manage the "noun model". I use code to manage algorithms, not so much nouns. OO'ers tend to try to make the code be both, and it's an ugly force-fit much of the time in my opinion.
- Fair enough, but the example shown here is purely code, and you've already agreed that perhaps TOP is not a beneficial approach in this case. As such, do you still hold that verb-groupings are superior to noun-groupings, for PayrollExampleTwo at least?
- PageAnchor: observe_work. As I mentioned elsewhere, I don't know the patterns of mistakes or problems that your group had with the prior CASE version, such as whether they were language-specific or paradigm-created or mind-created or domain-created or a combo. Different things trip up different people; mother nature made a vast variety of humans. Full-out observation of editing sessions and interviews with maintainers would probably be needed to tease out sufficient detail about what is going on with the maintainers' minds, eyes, and fingers.
- Addendum: Further, I don't see enough "volume" of sub-classes/nodes/objects in this case to make much difference either way. The example doesn't stretch the limit enough to make different approaches obvious. We'd need hundreds of change scenarios and/or hundreds of sub-classes to see problem patterns stand out. Instead we have subtleties versus subtleties. --top
- Okay, but what about the fact that changes tend to be made on a per-noun basis? For example, this month the Manitoba and Quebec regions might change, in six months it might be Manitoba, New Brunswick and Ontario. The programmer will read the documentation for Manitoba, then go into the Manitoba class and make all relevant changes. Then she'll read the New Brunswick docs, go into the New Brunswick class and make relevant changes, and so on. In a CASE-oriented system, the developer must read the documentation, determine which factors have to be changed, then go into K1P, find the relevant CASEs in K1P, change them, then go into K2P, find the relevant CASEs in K2P, change them, then find V1, and so on. Doesn't that seem likely to be more difficult and error-prone?
- I would imagine that there could be nation-wide laws that could change the same method in multiple regions, such as an upper limit on a certain kind of tax break after it's discovered that too many citizens are abusing that break. The limit would apply to all regions and likely be on just one or two methods. -t
- That would be represented as a method in the base class ('Federal') and invoked as needed in the subclasses.
- Well, okay, not a good example.
- As far as your claim, if you say that most changes are noun-grouped instead of verb-grouped in that shop, then I'll just have to take your word for it because I have no way to verify it without hiring a private detective. Can we at least agree it's a domain-specific or component-specific pattern? I don't see that most changes are all one or the other in general. I see a roughly even mix, but do find CASE more flexible when noun-taxonomies become non-tree shaped, which most complex nouns eventually do. Thus, CASE is a better hedge against future complexity. In this case that's not likely to happen. Thus, one question to ask when picking a design is how stable is the noun taxonomy over the longer run. Somewhere on C2 there's a kind of checklist of changes that muck up polymorphism, but I cannot find it right now. Sometimes changes can sneak up on you. For example, sending a customer response by fax, email, paper-mail or recorded phone message seem like relatively stable response categories. However, what if a customer wants multiple responses? It's a smaller code change with CASE than with sub-classing. -t
- With regard to "roughly even mix", nobody's identified a force of nature that favors one or the other. There may indeed be specific-to-domain reasons why one or the other may be more common over the other, and only observation will discover that. (And in some cases it may be mere coincidence that one is higher historically.) But without further evidence, it's a reasonable assumption that on average noun-favoring and verb-favoring change patterns are roughly even in nature. -t
- I recommend trying an inheritance/polymorphism approach for a while to see if CASE statements truly work better. I think you'll find, as I have, that there are many cases where inheritance/polymorphism works better, especially for the inevitable invariants in the computational domain. For those that don't, you still have CASE statements. Inheritance/polymorphism adds a tool to the toolbox, but doesn't subtract any.
- It's difficult to know what's truly "invariant". At best we can guess a probability of change based on experience and domain knowledge. But sure, I can agree it's one of multiple tools available. I generally agree with the observation that other WikiZens have made that OopNotForDomainModeling. Computing-space features do indeed seem to better fit/utilize polymorphism in general than domain nouns. Why this is, I can't put my finger on yet. Thus, I do agree that polymorphism works better in some cases. The hard part is knowing the future of changes up-front; we can only estimate based on domain and general experience. -t
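The reply in this thread that a nation-wide limit "would be represented as a method in the base class ('Federal') and invoked as needed in the subclasses" could be sketched along these lines. Everything here is invented for illustration: the cap amount, the rates, and all class and method names are assumptions, not figures from any payroll document.

```java
// Hypothetical sketch: the cap, the rates, and all names are invented for illustration.
abstract class FederalRules {
    // Nation-wide upper limit on a tax break, defined once (assumed figure).
    protected static final double CREDIT_CAP = 1500.00;

    // Subclasses call this wherever the national limit applies.
    protected double capCredit(double credit) {
        return Math.min(credit, CREDIT_CAP);
    }

    abstract double credit(double claimAmount);
}

class ManitobaRules extends FederalRules {
    @Override double credit(double claimAmount) {
        return capCredit(0.108 * claimAmount); // assumed rate
    }
}

class NewBrunswickRules extends FederalRules {
    @Override double credit(double claimAmount) {
        return capCredit(0.094 * claimAmount); // assumed rate
    }
}

class CapDemo {
    public static void main(String[] args) {
        System.out.println(new ManitobaRules().credit(100000));
        System.out.println(new NewBrunswickRules().credit(1000));
    }
}
```

A change to the national limit then touches one constant, while per-region changes stay inside the region's own class.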
CASE statements also increase the likelihood of inadvertently changing formulae in the wrong region. When tax updates occur, they are documented on a per-region basis. This month, Quebec and Ontario might change. In six months, BC and New Brunswick might change. Every six months, at least one region changes. Grouping the factors by tax region, as is done in the OO version, reflects the real world and facilitates making the changes that regularly occur in the real world. A CASE-statement approach does not. It's true that new regions are unlikely to be added (it only occurred once during a decade), but OO doesn't just facilitate adding new regions, it facilitates maintaining existing regions.
There's always going to be adjacent code of some kind regardless of whether you use case statements or sub-classing. "Bump thy neighbor" can happen regardless, so the "wrong region" accident argument can work both ways. I suppose with sub-classing, one is more likely to change the wrong method. As far as "reflects the real world", I'd like more details on that. Taxes are a mental concept, not a physical concept. (For something as sensitive as payroll, I'd hope you have unit tests that would detect adjacent breakage anyhow.)
Taxes may be a mental concept (essentially, but try not paying your "concept"), but the real-world structures the system needs to reflect are the individual tax regions. These have a physical reality, for tax purposes, that the system reflects. As for changing the wrong method, I'd argue that's far less likely -- because the methods are visually and functionally very distinct from each other -- than accidentally changing a very similar implementation of a given factor in region MB when you should have changed region NB immediately below it. Yes, there are extensive unit tests to detect breakage, but they won't tell you where the breakage lies, only that the final figures for a given region are wrong. Interestingly, an early version of this system was procedural and relied on CASE statements. Not only was it more difficult to maintain than the OO version, the length of time it took to debug -- due to accidental mis-changes and the like -- was longer than with the OO version. Indeed, everyone on the development team agreed that the OO version was a vast improvement in terms of readability, maintainability, and elegance. Switching to an OO approach, in this case, demonstrated all benefits and no downsides.
It's hard to know how much of that difficulty was due to the stupid way the C syntax family of languages implements CASE statements (IsBreakStatementArchaic). C's CASE syntax is so poor that it probably pollutes any such test. In other words, we don't have enough info to separate language shortcomings from paradigm shortcomings. I'd need to see what changes were actually made and what the coder was looking at when they allegedly hit the wrong keys. Your abbreviations may be too short. OO code-unit names using those same conventions of "MB" and "NB" may have caused confusion also. (More on this below.) But if sub-classing works for you in that scenario in that language, then fine. Let success be your guide.
IsBreakStatementArchaic seems, not surprisingly, to be more about the 'break' statement than CASE statements in general. 'Break' hasn't been an issue here, though I'd have no objection to CASE statements that don't fall through. How would you change CASE statements in general to improve over the inheritance/polymorphism example I've shown?
I thought you said they were using a C-like syntax? If that's the case, then C-style syntax is indeed a possible or contributing factor. It's a bad syntax design. That being said, I don't have enough details about the psychological (WetWare) causes of the errors your staff makes, and thus cannot work around their mental barriers without guessing. Like I said elsewhere, editing the wrong case row is not a high-frequency error of mine compared to other errors. I fix what's broke long before I fix what's not broke; and case slot confusion is not a real problem for me. Your mind/fingers may differ. And again the flip side is accidentally changing the wrong method which it should be weighed against because methods don't float in space all by themselves.
Surely an improved CASE statement would apply to everyone, not just my staff and their mental barriers? Or are you saying that apart from 'break', CASE is good enough, and we don't need inheritance/polymorphism-based solutions?
I believe the utility of polymorphism is exaggerated. As far as fixing CASE statements, first I need to know what is wrong with them, and whether it's universal or a brain-specific thing. And we'd have to compare to polymorphism-related errors, such as editing the wrong sub-class or method.
Perhaps the utility of polymorphism is exaggerated. That is a fair hypothesis. Can you suggest how we might quantitatively and objectively test it?
As for fixing CASE statements, didn't you write, above, that it's "hard to know how much of that difficulty was due to the stupid way the C syntax family of languages implements CASE statements"? As such, do you have suggestions for improving them (other than the obvious step of removing fall-through) and thereby eliminating that difficulty? Or are you proposing that there is some, as yet unknown difficulty that could be resolved within something that is still recognisably a CASE statement without going as far as inheritance & polymorphism?
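As one data point on what a CASE statement without fall-through can look like: Java 14 and later offer an arrow-form switch expression. The sketch below reuses the rates and default claim amounts from the K1P example in this discussion; the TaxRegion enum, the K1PSwitch class, and the orDefault helper are assumptions for illustration, and the reading of -1 as "no claim amount supplied" is likewise assumed.

```java
// Requires Java 14 or later. The enum, class, and helper names are assumptions.
enum TaxRegion { AB, ON, QC }

class K1PSwitch {
    // claimTcp is the provincial claim amount; -1 is assumed to mean "none supplied".
    static double k1p(TaxRegion region, double claimTcp) {
        return switch (region) {
            // Arrow cases never fall through into the next case.
            case AB -> 0.10 * orDefault(claimTcp, 16775);
            case ON -> 0.0605 * orDefault(claimTcp, 8881);
            case QC -> 0;
            // No default branch: the compiler rejects the switch
            // if any TaxRegion constant is left unhandled.
        };
    }

    private static double orDefault(double value, double fallback) {
        return value == -1 ? fallback : value;
    }
}

class K1PSwitchDemo {
    public static void main(String[] args) {
        System.out.println(K1PSwitch.k1p(TaxRegion.AB, -1));
    }
}
```

This removes fall-through and the shared-scope problem, but it does not change the grouping: the formulas are still organised by factor rather than by region.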
Further, the CASE grouping makes it easier to spot similarities and differences in algorithm patterns for the different regions. It may help one recognize a flaw and/or suggest re-factorings. If they are all spread apart, patterns are not readily visible. One can see that "AB" and "ON" have similar algorithms. In larger code sections, this ability grows more valuable. -t
I don't disagree with that. However, I'd argue the maintenance benefits of having the regions in separate classes outweighs the benefits of having them in CASE groupings merely to identify possible refactorings. Such refactorings can still be identified if one decides to go looking for them.
- Payroll may be a situation where error prevention is more important than code reading/analysis economics. I gravitate toward projects where nimbleness and resource control are valued over error-prevention. Often this is because I do a lot of UI work and the user doesn't know what they want until they actually use it for a few hours. Thus, many ideas are thrown away such that one must crank and dump quickly. Grokking the code quickly is important for that. And, editing the wrong case block is not near the top of my list of difficulties or mistakes.
- Couldn't one select and edit the wrong region sub-class? What stops that from happening? At least with the CASE version, the region abbreviation is right next to the cursor and edit spot, not up a page or so.
- It's possible, but given that each region subclass is clearly documented within its own file, the file is the same name as the region (rather than an abbreviation, though this could obviously be addressed by not using abbreviations), the class is the same name as the region, and each region subclass is fairly different from others and can be visually distinguished as a unit, it strikes me that this is less likely to happen than with a collection of CASE statements.
- So if the regions are not overly abbreviated, then it would be a non-issue? It's possible to accidentally open the wrong file and not know it because the methods look very similar. I've made a cross-file mistake just today because each file's contents were very similar. One was already open in the code editor when I opened a second one. After a small distraction, I came back to the editor and selected the wrong file to edit. Yes, I should have double-checked, but that applies to all maintenance. And I may have spotted the mistake if the item name was right there instead of at the top. (It wasn't classes, I should point out.) Thus, "changing the wrong one" based on visual cues either favors CASE or is a wash. -t
- I suppose the only way to fairly and objectively resolve this debate would be to experimentally test a series of maintenance actions against the OO version and an equivalent CASE-based procedural version, and measure error rates, implementation time, and developer perception.
- Yep, analysis and observation. What we can do here in this topic is raise some questions and provide angles to investigate; but we cannot provide the final answer. The first step to a smart answer is asking smart questions. --top
- But we're talking about payroll here, aren't we? And, by extension, any business-critical code, yes? So isn't bringing up UI code somewhat of a non sequitur?
- It's good to make sure the scope limits of an analysis are clear.
Let's not forget that if we cast aside all the OO philosophy, inheritance and polymorphism simply provide an implicit CASE mechanism that makes it easy to maintain each tax region's calculations in distinct, independent, region-based units.
- ...at the expense of scattering verb-ness. In the future I hope that the visual grouping will be fluid such that it can be presented to the programmer/reader noun-wise or verb-wise as they choose. They each have their advantage. TOP is about the closest thing I've seen to such ability because the presentation is based on your query and/or TableBrowser clicks, but the existing tools and conventions of the industry are still stuck in linear-text-land. It's time to dump tree file systems for development. -t
- Is "scattering verb-ness" bad? Isn't it just as bad (or something) to "scatter noun-ness"? Regarding fluid visual grouping, some years ago I ran across a project to create a syntax-free language. I don't mean it had no syntax, but that the syntax was free to be represented however the user saw fit, based on an internal representation that could be translated into any number of user-oriented representations. Unfortunately, I don't remember the name of the project, as it was quite unusual but not particularly memorable.
- All scattering can be "bad". Ideally we'd want to be able to present them together or apart as needed for the situation, as described in SeparationAndGroupingAreArchaicConcepts, RunTimeEngineSchema, and TableOrientedCodeManagement. If one tries to make something like you suggest to help with this issue, I believe it would run into GreencoddsTenthRuleOfProgramming. Meta-tizing programming code enough will generally lead one into database and collection-oriented concepts. A big wad of pointers is just too hard to manage and study (except for rare savants). Machine efficiency is about the only reason to keep them. -t
- Sorry, I have no idea what "meta-tizing" is, or what it -- or anything else on this page -- has to do with a "big wad of pointers". You appear to be writing random, seemingly-irrelevant speculations about programming in general rather than presenting a logical argument about the topic at hand.
- As far as "wad of pointers", you mentioned "visual programming" to create a "syntax-free" programming language. Based on my observations of visual tools, they tend to be graph-based, and this is why I used the phrase "wad of pointers". Graphs don't scale well as far as managing complexity for similar reasons that Go-to's fell out of favor. It is true that they may be fine on a smaller scale and one would need to look at the details of the domain to see if it's the best tool. Still, internally the info in the nodes needs some representation, and if they are in RDBMS tables, then they are much easier to tie into the rest of the business system, and one can later change the front-end representation if need be. My experience is that end-users relate well to a spreadsheet-like interface such that a graph editor may be unnecessary outside of algebraic-style formulas (AbstractSyntaxTree). -t
- I didn't mention visual programming. Please re-read what I wrote.
- I interpreted "fluid visual grouping" and "syntax-free language" to mean visual programming. If not, then I request clarification.
- A syntax-free language is not bound to any specific syntax. It has an underlying model that represents program semantics which may be presented using one or more views. A view is defined as a set of mappings from semantics to specified view elements such as keywords, literals, identifiers, grammar rules, etc. Whilst the views could involve visual programming, they could equally be text. They need not be visual in the GraphicalProgrammingLanguage sense. There was a quintessential example on SourceForge a few years ago, with a crude but working kernel. Unfortunately, the project -- which had a peculiar name that completely eludes me now -- seems to be dead and gone.
- Anything computer-processable or even human-readable requires some representational "language" or "interface". Nothing usable is representation-free. Removing quotes and parentheses by itself doesn't necessarily achieve the stated goals.
- It's "free" as in "freedom", not "free" as in "absent". I.e., the semantics are not bound to a specific syntax, but are free to use a number of syntaxes.
- Just about any compilable/interpretable language can be re-translated into any other language. But that ability alone is only a partial solution to the problem at best. Ability to easily add or tie into meta-data is another feature that helps the goal of re-projecting one's view of the programming "code".
- As far as "meta-tizing", this is illustrated by looking at how my coded example does not hard-wire the payroll formulas. A power-user or administrator can create and change the formulas via CrudScreen(s). They can also list them out without having to print out programming source code, assuming we programmed in reports or gave them read access to the database tables so that they can use off-the-shelf report writers to make their own formula reports. It offloads much of the specific business logic handling from programmers. Related: CompilingVersusMetaDataAid. -t
- User-editable formulae are dismissed in the first few paragraphs of this page. In particular, "[t]he client explicitly set a requirement that all payroll updates be made by a 'payroll specialist', i.e., me or a member of my staff." So, end-user computing is precluded. Forcing its inclusion (a) suggests that we might as well handle all of this with (say) MicrosoftExcel and thus this debate is moot; and (b) any sufficiently advanced end-user computing is indistinguishable from software development, which brings us right back to where we started. Either way, it's irrelevant here.
- As I interpret this sub-section, we were drifting into the future of tools and the bigger issue of noun-versus-verb grouping rather than the specific example. As far as Excel, it is too open-ended for something critical to a business. Thus, I offered a "guided spreadsheet" of sorts, something that's in-between the free-form spreadsheet and hard-wired formulas. As far as your point "(b)", tools optimized for domain power users will generally be different than those optimized for career programmers, who have more time to explore the benefits, consequences, and gotchas of different paradigms, interfaces, indirection, long-term maintainability, etc.
- Still, irrelevant here.
- The sub-section? Sorry, I have to disagree. Perhaps, though, it should be spun off into another topic related to end-user domain programming.
- This section was about something you called "scattering verb-ness". Whatever that is...
- Would you like clarification on "scattering verb-ness"?
- Sure.
- Traditional programming groups by "task", while OOP tends to group by nouns. This means that the code for a particular task is spread among different "sub-types" (assuming such division/classification is appropriate). By grouping by noun (or "types" of nouns) we are de-grouping task-ness, scattering the code for a given task among different sub-classes. It's a forced trade-off caused by the use of linear text. I hope in the future our tools can help transcend the limits of traditional text and hierarchical file systems by making grouping be virtual. See SeparationAndGroupingAreArchaicConcepts for more.
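The "virtual grouping" wished for in this thread, where the same formulas can be presented noun-wise or verb-wise on demand, can be approximated in miniature by keeping each formula as a row tagged with both its region and its factor. The sketch below is only an in-memory illustration; FormulaRow, FormulaTable, and their methods are hypothetical names, not an existing tool or anything from PayrollExample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch: one row per (region, factor) pair. All names are assumptions.
final class FormulaRow {
    final String region;
    final String factor;
    final DoubleUnaryOperator formula;

    FormulaRow(String region, String factor, DoubleUnaryOperator formula) {
        this.region = region;
        this.factor = factor;
        this.formula = formula;
    }
}

class FormulaTable {
    private final List<FormulaRow> rows = new ArrayList<>();

    void add(String region, String factor, DoubleUnaryOperator formula) {
        rows.add(new FormulaRow(region, factor, formula));
    }

    // Noun-wise view: every factor defined for one region, keyed by factor name.
    // Assumes at most one row per (region, factor); duplicates would make toMap throw.
    Map<String, DoubleUnaryOperator> byRegion(String region) {
        return rows.stream()
            .filter(r -> r.region.equals(region))
            .collect(Collectors.toMap(r -> r.factor, r -> r.formula));
    }

    // Verb-wise view: one factor across every region, keyed by region name.
    Map<String, DoubleUnaryOperator> byFactor(String factor) {
        return rows.stream()
            .filter(r -> r.factor.equals(factor))
            .collect(Collectors.toMap(r -> r.region, r -> r.formula));
    }
}
```

For example, after `table.add("AB", "K1P", tcp -> 0.10 * tcp)` and `table.add("ON", "K1P", tcp -> 0.0605 * tcp)`, calling `table.byFactor("K1P")` gives the factor-grouped view and `table.byRegion("AB")` the region-grouped one. Whether this scales to long formulas is exactly the open question raised on this page.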
I will agree that perhaps sometimes the problem is best solved using a single tool or language. The decision devil is in the specifics. Maybe Canada's payroll rules are not complex or dynamic enough to test a multi-tool/multi-source scenario where TOP may shine more. I hinted at a sales-commission engine that may need to be integrated on top of hourly payroll. Suppose it is canned software such that integrating it directly into your Java app is not entirely practical. If we have a lot of these kinds of things, then the utility of a multi-source example may start to become more recognizable.
I set out to solve an integration problem, not merely a tax computation problem. --top
The intent was that you re-implement PayrollExampleTwo using TableOrientedProgramming and/or ProceduralProgramming, not (necessarily) implement it using the PayrollExample framework.
- Long formulas are not a place where TOP really shines. That is unless a way can be found to re-project them nicely into TOP, which is not coming to me so far. As it is, it wouldn't demonstrate anything that I'd call "better" for that portion. PickTheRightToolForTheJob.
- Where, then, does TOP really shine compared to the alternatives?
- Managing lots of relationships and attributes, especially if they change fairly often. If you only have five "variations on a theme", such as provinces, and they don't have a lot of interconnections into info sets, then text code may be the way to go. In the US, where there are 50 states, if many of them share a sub-set of calculation features, then it may start making sense to manage at least some of that via tables that control which state gets which sub-formula. Another case may be where the regions are at least partly power-user-configurable. It may be easier to teach them to fill in tables with attributes and simple formulas than to teach them to write Java code for the same. Dedicated coders may help them by building custom functions from time to time. That way power users or "configurators" may do 80% of the "programming", and dedicated coders the other 20%. SAP kind of follows this model.
- Whether the number of tax entities is fifteen (the number of distinct payroll tax entities in Canada is fifteen, as I recall -- ten provinces, three territories, outside of Canada, and exempt {I think... Don't make me go look at the original source code!}), fifty, or five thousand wouldn't make any difference to the methodology I employed in PayrollExampleTwo. How would TOP better handle fifteen, fifty, or five thousand tax entities presuming they follow the pattern exemplified by PayrollExampleTwo? I can appreciate that a user-requirement to edit formulae may push for data-driven programming (of which TOP may be one option), but you've claimed that "TOP really shines" in such cases. How would it "really shine" over and above code, or data-driven alternatives to TOP such as XML configuration files?
- I'm not saying for sure it would shine, for one has to look at the patterns of same-ness and difference to see what will work best to manage them. If all the States are too different from each other to bother trying to share and manage commonalities, then TOP probably won't help. If there are sufficient commonalities, TOP could help us manage all of those. I'm not sure if I can turn it into verbal rules so readily, but I'll continue to think about how to verbalize it.
- As far as the XML, there's two problems with it:
- 1. XML is awkward to read and write for most power-users, or even developers without a decent editor. Data grids and other CRUD UI idioms are usually better for that.
- 2. One cannot as easily search, sort, cross-reference, filter, etc. the XML (DatabaseVerbs). This is useful for studying, verifying, and debugging patterns and settings. Text code often pushes certain factors or relationships far apart. If we can re-project our view of the info in an ad-hoc way, then we are not stuck with this problem. Sure, there are query languages for XML, but why not use the existing RDBMS that most biz systems already use? And, XML is navigational, not relational. I thought navigational lost the war in the mid 1980's. Why is it trying to come back from the grave? Fix the shortcomings of relational rather than toss it out if it is missing features.
- Sorry, I didn't really intend to kick off a debate over XML's pros and cons. What I'm trying to grasp is how TOP is superior to other approaches to conventional programming. But, now that I think about it further, I'm really just trying to grasp what TOP is. Note that I'm no stranger to the RelationalModel. I've been developing database-driven business applications since the mid-1980s and I'm the author of the RelProject (TutorialDee being perhaps the quintessential example of, shall we say, RelationalModel-oriented programming), so integrating and promoting the RelationalModel is what I do, and I understand it well in terms of managing data and supporting application development. However, I hardly consider that a nameable paradigm like "TOP" or anything else! It's simply sound business application development, in which the DBMS supports server-based data storage, retrieval, management and constraint-handling, and the client-side receives user input, interacts with the DBMS, and presents output to the user. What then, is TOP?
- I gave a runnable example in PayrollExample. One could do the same thing in procedural-only or OOP. And you seem to be trying to turn RDBMS into a "type-base" instead of a database. In that sense, you view databases differently than I do. -t
- I certainly advocate use of types in databases -- they're handy, for example, for preventing simple-but-common blunders like joining employee IDs to store IDs -- but I assume that's not what you mean. Could you explain what you mean by a "type-base"?
- By the way, you still haven't explained what TOP is. To me, it seems to be:
- a) The notion of "data-driven programming" (a buzzword from the late 70s and early 80s) which advocates reducing the quantity of changeable constants and structure in source code and moving them to external user-editable repositories, wherever it is reasonable to do so without turning them merely into equivalent source code in a different location.
- b) The external user-editable repositories in (a) should be tables/RelVars in a RelationalDatabase.
- If that's what TOP is, then it's eminently reasonable, and indeed it is how most business-oriented data-processing applications (including most dynamic e-Commerce and entertainment/social Web sites) are currently built, regardless of the client language. In the late 80s and early 90s, there was still considerable debate about the future of RelationalDatabases. ObjectOriented databases were seen as cutting-edge, Relational anything was regarded as old, dinosaur-like, etc., despite the popularity of relational DBMS-oriented development tools like PowerBuilder. A few years later, roughly by 1997, the popularity of dynamic Web sites driven by SQL databases had pushed OO databases back into niche fields, and the RelationalDatabase was ubiquitous technology. As an approach, TOP won. The battle is over, your side captured the flag, you can take off your armour and weapons and go home. However, you inadvertently deprecate the power of the approach by still treating it as being in some sort of competition with OOP. There is nothing about TOP (or OO) that makes this true. Rather, it would be better to treat TOP simply (and powerfully!) as a general guideline for developing software when using OOP (and FP) and procedural languages.
- The production code I see tends to be "TOP-lite", not really going full-bore into table land. But in part this is understandable because the tools don't support it very well, geared toward BigIron efficiency in data-chomping instead of attribute management convenience needed for economical declarative programming. -t
- I agree. I suspect there'll be an increasing trend toward implementing industrial-strength versions of languages like TutorialDee, which endeavor to seamlessly integrate the RelationalModel into general-purpose programming. I've noticed increasing interest in my RelProject from Web-development savvy students eager to become early-adopters of what they consider to be an inevitable direction.
- We probably have similar goals, but different ways to achieve it. You want tighter integration between app language and the DB's type system, while I prefer a service-oriented approach that doesn't get too caught up in types. But we already battled that out in other topics without a consensus.
If I do add grouping, it may make an interesting exercise, but won't necessarily show any advantages that I could identify to readers. Some portions of tax calculations are simply best implemented as textual code (although I personally don't like your coding style). My framework allows us to take this into consideration, and to me that is one of the benefits of my payroll framework. Maybe another portion is best done by custom spreadsheets because they involve fast-changing business requirements, and my framework can accommodate that also. -top
Is your PayrollExample a definitive illustration of TableOrientedProgramming? What don't you like about my coding style?
- I'm not sure there is such a thing as "definitive". It handles many things better than Martin's approach (although that may depend on the environment it's used in). As far as your coding style, good code reads almost like pseudo-code in my opinion, but yours has too much of what PaulGraham might call "scaffolding", a kind of code-based bureaucracy. This scaffolding makes it difficult to easily grok the code that does the real work. You admitted you can read verbose code fast, so it may not slow you down. Those of us who are not verbosity speed readers may be slowed down by it. PsychologyMatters.
- Interesting. I'd argue that all the code does the "real work", so I'm not clear what you mean by scaffolding. Could you explain?
- Here, I call "real work" code that directly pertains to the domain problem space, rather than code that manages our internal implementation. When the ratio of real-work-code to scaffolding grows too low, then it's usually a design smell. However, I will grant that different domains or apps have different requirements in this matter.
- Could you point to specific examples of scaffolding? From my point of view, it all pertains to the domain problem space. It's all "real work" code. Of course, I invite you to re-implement it using whatever you like if you feel it can be better implemented.
- Having to specify that methods such as K1P are "double" multiple times. Ideally it should only have to be declared as double just once. (I'm sure a language change could fix this.)
- That's a language characteristic rather than a criticism of OO per se. Note that return types must (or at least should) be specified both in the function declaration and the function definition in procedural C, for example. Indeed, I'm sure a language could be designed that would eliminate all seemingly-redundant keywords.
- There's only one return type needed for a group of statements, not each case row. Note that many dynamic languages don't need return types, and are thus often less cluttered with formalisms that slow reading. But the payroll domain is probably not a place for dynamic languages. Getting it right is more important than reading it quickly.
- I'm not clear what you mean by "[t]here's only one return type needed for a group of statements, not each case row." Explain?
- I believe the total number of repetitious type declarations would be smaller with a CASE version, but don't feel like testing that hypothesis just yet.
- Too much set/get bloat. It harms readability, at least to my eyeballs.
- That's a criticism commonly levelled against Java that is essentially solved with C#'s Properties. In Java, you get in the habit of creating getters & setters to publicise private class attributes. You get used to them after a while, i.e., they don't seem like bloat, but if you don't normally develop in Java I can see how they could harm readability. C#'s Properties provide syntactic clarity and simplicity equivalent to public class attributes with all the benefits of having getters & setters surrounding private class attributes.
- Anything that makes the code unnecessarily longer is going to harm readability even if one "gets used to it". It's less "real" information that can fit on a screen/page. The more one has to scroll back and forth to see related information, the more one is slowed down and risks losing their train of thought. I don't have a photographic memory, and like to see stuff together at the same time.
As MentalMasturbation, I've also kicked around ideas for a "set-based formula engine" that allows sharing of formulas based on set theory instead of inheritance. It could do combos that would be difficult or messy to replicate using "traditional" inheritance. -top
PageAnchor: Calc01
Here's a table that could map features and formulas to provinces/states:
Prov./St.|Feature-A|feB|feC|feD|Formula-A|ForB|Etc...
---------|---------|---|---|---|---------|----|------
Califor..|.........|.Y.|...|.Y.|foo+bar()|....|
Texas....|....Y....|...|...|.Y.|.........|....|
Nevada...|.........|.Y.|...|...|A*B-C....|V(x)|
New York.|....Y....|.Y.|.Y.|.Y.|fed-hls..|....|
Vermont..|....Y....|...|.Y.|...|.........|....|
Etc......
"feX" is an abbreviation for "Feature-X", where X is a letter. A similar shortcut is done for formulas. These abbreviations and letterings are merely done to simplify and size the illustration. More meaningful names would likely be used in practice. (Dots to prevent TabMunging.)
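As a rough idea of what the feature columns of that table buy you once they are data rather than code, here is a small Java sketch. It holds the feature columns in memory and leaves the formula columns out. The class and method names (FeatureTable, has, regionsWith) are hypothetical, and a real system would presumably keep the rows in a database table and not in a static map.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FeatureTable {
    // Column names from the illustration; their meanings are placeholders.
    enum Feature { A, B, C, D }

    // One row per province/state: the set of features marked "Y".
    static final Map<String, Set<Feature>> ROWS = new LinkedHashMap<>();
    static {
        ROWS.put("California", EnumSet.of(Feature.B, Feature.D));
        ROWS.put("Texas",      EnumSet.of(Feature.A, Feature.D));
        ROWS.put("Nevada",     EnumSet.of(Feature.B));
        ROWS.put("New York",   EnumSet.of(Feature.A, Feature.B, Feature.C, Feature.D));
        ROWS.put("Vermont",    EnumSet.of(Feature.A, Feature.C));
    }

    // Cell lookup: does this region have this feature?
    static boolean has(String region, Feature feature) {
        Set<Feature> row = ROWS.get(region);
        return row != null && row.contains(feature);
    }

    // Ad-hoc "query" over a column: which regions share a given feature?
    static List<String> regionsWith(Feature feature) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Feature>> row : ROWS.entrySet()) {
            if (row.getValue().contains(feature)) {
                result.add(row.getKey());
            }
        }
        return result;
    }
}
```

The regions share features by set membership and not through a class hierarchy, so a combination such as New York's (all four features) needs no special parent type.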
We could have it all in one table, or for more flexibility (and complexity) we could split out the entities and use many-to-many tables to link them:
table: province_state // province or state
----------
ps_ID
ps_title
table: ps_features // features of province/state
------------
feature_id // could be an integer or abbrev.
feature_title
has // Boolean
implementation // optional
table: ps_feature_links
----------------
ps_Ref // f.k. to province_state table
feature_ref
table: formulas
---------------
formula_id
formula_title
table: ps_formula_links
---------------
ps_ref
formula_ref
Oh my. I suggest you try it with PayrollExampleTwo.
Again, I don't claim it would make example 2 "better", other than possibly help the grade of the "non-programmer configuration" feature. As-is, it would only serve as a "teaching guide" at best, and a failed experiment at worst.
I've seen a packaged production software product (outside of ExBase) that kept formulas in tables not too different from PayrollExample, although they were broken down to their elementary level of two operands and one operator instead of expressions, similar to an AbstractSyntaxTree, although more "linear" and reference-based than tree-like. It was an IBM AS/400 billing system for civil engineering firms from a company called McCosker? (as I remember it). Generally a power-user(s) created the formula templates for the non-power-users (billers) to use. -t
Attempted Summary
Here is my attempt at a summary of this discussion. Feel free to describe alternative interpretations.
Approach A may result in human errors/difficulties G, H, I, etc., and approach B may result in human errors/difficulties J, K, L, etc. However, without "lab" or field observations, it's difficult to verify which are actually more common, which are the more costly errors, and how much variation there is between individuals.
--top
Footnotes
[1] In my observation, when you start to have a lot of variations on a theme, such as maybe cities, then there are often a lot of similarities among them such that they don't fit a clean pattern of one algorithm per one "sub-type". The pattern becomes much more complex and "fractured". This particular example doesn't have enough variations to see how the pattern scales out in practice. I often lean toward using TOP to manage the sub-variations. See Example "Pete-82" in HofPattern for a related discussion on patterns of variations. -t
PayrollExampleTwo exemplifies the pattern you describe. Commonalities go into the base class, differences go into derived classes. This can be extended to arbitrarily-complex hierarchies, with delegation (or multiple inheritance, where available) handling even non-hierarchical commonalities and differences. If you feel TOP can do this better, I invite you to start demonstrating its efficacy by converting PayrollExampleTwo to a TOP equivalent.
DeltaIsolation can grow into a sprawling mess and is arguably unnatural because we generally don't think about things that way in our head, at least not on a large scale. It's essentially hierarchical instead of set-based, meaning it scales ugly. The TOP equivalent is roughly that blank "cells" get the default (common) behavior, by the way. PayrollExampleTwo does not have enough variations and no "nested variations", and thus does not make a very good example either way. It doesn't "stress test" the patterns.
I see. Then I'll have to take that as meaning polymorphism and inheritance are superior for handling situations like PayrollExampleTwo. Fortunately, the majority of my work in the past three decades (or so) has been more like PayrollExampleTwo than not, so I'll keep using polymorphism and inheritance instead of TOP until a suitable demonstration or two gives me enough evidence to consider changing.
I didn't agree to that. I only agree we don't have enough dissectable info about the change patterns of the said domain to make an informed determination. It may be that in that domain poly is better. By the way, enough others have agreed that OopNotForDomainModeling such that I don't feel that compelled to fight that fight anymore. -t
Indeed, I wrote a goodly bit on OopNotForDomainModeling and OopBizDomainGap, and if I recall correctly, several other related pages. I guess we'll have to forgo seeing any working benefit of TOP, then? As for the change patterns of PayrollExampleTwo, they are explained in PayrollExampleTwoDiscussion -- some individual provinces change every six months, and the federal calculations can change every six months.
I can see patterns in that example that could make use of TOP if there were more or larger occurrences, such as range tables. In bigger systems/orgs, power users will want to edit those ranges without calling a programmer, and tablizing them would facilitate that. And the add/subtract sequences (A + B - C + E - F, etc.) could also perhaps be tablized. It would almost be like building a domain-specific spreadsheet processor, kind of a more constrained version of PayrollExample (one).
Please, build it so we can see it in action for PayrollExampleTwo, which is real business code unlike PayrollExample. Surely if it works for "more or larger occurrences, such as range tables" (what are "range tables"?), then it can work -- and be easily implemented -- for the relatively simple PayrollExampleTwo.
It would be overkill for PayrollExampleTwo as it currently is. It would be of no net benefit unless there is a need for non-programmers to change the values/constants. A range table is something equivalent to code in the example that resembles "If a<300 then x1() elseif a<450 then x2() elseif a<600 then...". I worked in a postal rate calculation system that used many range tables once. Maybe I'll kick around the idea some as MentalMasturbation ("Tablebation") and let you know what I come up with.
Yes, I'm sure it would be overkill for a production application, but its simplicity could make for an excellent example. It's not unusual for CPP/QPP and EI rates to be changeable by non-programmers, so you could illustrate that. By "range table", do you mean something like this?
// Additional tax on taxable income (Ontario Health Premium)
double V2() {
if (A() <= 20000)
return 0;
else if (A() <= 36000)
return theLesserOf(300, 0.06 * (A() - 20000));
else if (A() <= 48000)
return theLesserOf(450, 300 + 0.06 * (A() - 36000));
else if (A() <= 72000)
return theLesserOf(600, 450 + 0.25 * (A() - 48000));
else if (A() <= 200000)
return theLesserOf(750, 600 + 0.25 * (A() - 72000));
else
return theLesserOf(900, 750 + 0.25 * (A() - 200000));
}
That's taken from PayrollExampleTwo.
It's not overkill for all applications, only for this one, unless there is a stated need to make the constants non-programmer-changable.
Sorry, I didn't mean TOP was overkill for all applications. It is perhaps overkill for this one, but as stated above, its simplicity could make for an excellent example. It's not unusual for CPP/QPP and EI rates to be changeable by non-programmers, so you could illustrate that.
I'm trying to find a TOP design that's generic enough to flex yet not too confusing and/or has too many layers of indirection. So far I haven't, but will keep kicking around draft ideas.
Cool. I look forward to seeing it when it's done.
Dammit, you tricked me into thinking about such gizmos :-) Now various designs keep involuntarily spinning around in my head like an Abba elevator tune.
What my drafts are looking like when I generalize them tend to resemble the back end of what would or could end up being very similar to "graphical business rule" tools, as can be found at: http://decisions.com
How well such tools integrate with existing systems, I don't know. (Related: BusinessRulesMetabase.)
If you want to make the constants power-user-editable but specific to the existing system, then assign constant names in a ConstantTable of some sort. The entry form could resemble:
If A <= [20000] then
result = [0]
else if A <= [36000] then
result = theLesserOf([300], [0.06] * (A - [20000]))
else if A <= [48000] then
result = theLesserOf([450], [300] + [0.06] * (A - [36000]))
Etc...
The "[.....]" represent input boxes. I can envision "in between" designs in terms of variations between a "generic" formula engine and something "hard wired" like above that only allows changes to the constants instead of new ranges, new formulas, etc.
It would be trivial to replace the constants with variables, or an array of elements to allow for an arbitrary number of conditions, and parametrise them so they can be edited and retrieved elsewhere in the program. I presume that's not what is meant by TableOrientedProgramming, however. Or is it?
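For reference, the "array of elements" idea might look something like the Java sketch below. The class, field, and method names (HealthPremiumTable, Bracket, BRACKETS, v2) are made up for this illustration; the numbers are the ones from the V2() method quoted above, with Math.min standing in for theLesserOf.

```java
final class HealthPremiumTable {
    // One row per condition. These are the values a data-entry form
    // or a ConstantTable could supply instead of source code.
    static final class Bracket {
        final double upTo;   // inclusive upper limit of the bracket
        final double cap;    // first argument of theLesserOf
        final double base;   // flat amount carried from lower brackets
        final double rate;   // marginal rate inside the bracket
        final double floor;  // income at which the bracket starts
        Bracket(double upTo, double cap, double base, double rate, double floor) {
            this.upTo = upTo; this.cap = cap; this.base = base;
            this.rate = rate; this.floor = floor;
        }
    }

    static final Bracket[] BRACKETS = {
        new Bracket(20000,    0,   0, 0.00,      0),
        new Bracket(36000,  300,   0, 0.06,  20000),
        new Bracket(48000,  450, 300, 0.06,  36000),
        new Bracket(72000,  600, 450, 0.25,  48000),
        new Bracket(200000, 750, 600, 0.25,  72000),
        new Bracket(Double.POSITIVE_INFINITY, 900, 750, 0.25, 200000),
    };

    // Same result as the if/else chain, but the number of
    // conditions is now a property of the data.
    static double v2(double a) {
        for (Bracket b : BRACKETS) {
            if (a <= b.upTo) {
                return Math.min(b.cap, b.base + b.rate * (a - b.floor));
            }
        }
        throw new IllegalArgumentException("no bracket matches " + a);
    }
}
```

Whether this counts as TableOrientedProgramming or merely as parametrised code is exactly the question being asked; the sketch takes no side on that.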
There are various design approaches, each with various levels of what I'd consider TOP. For example, we could put the formulas in range tables:
Group | ...From... | .....To..... | Formula
-----------------------------------------------
..V2..| ......0.00 | ....20000.00 | 0
..V2..| ..20000.01 | ....36000.00 | lesserOf(300, 0.06 * (A() - 20000))
..V2..| ..36000.01 | ....48000.00 | lesserOf(450, 300 + 0.06 * (A() - 36000))
..V2..| ..48000.01 | ....72000.00 | lesserOf(600, 450 + 0.25 * (A() - 48000))
..V2..| ..72000.01 | ...200000.00 | lesserOf(750, 600 + 0.25 * (A() - 72000))
..V2..| .200000.01 | 999999999.99 | lesserOf(900, 750 + 0.25 * (A() - 200000))
(Dots to prevent TabMunging.) An "eval" would be used to process the formula. This approach is programmer-friendly, but may not be power-user friendly. There are different ways to make range tables, but since they are usually published as "from/to" charts in my experience, I chose to use the from/to approach here. An alternative is to specify only the "floor" of the range to avoid the ugly "999999999.99" thing. The floor approach would be similar to only having "from", but could be "higher than" or "higher than or equal". The query would then select the highest matching floor. But I'll leave those tradeoff choices to the reader, for cases can be made for each.
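A minimal Java sketch of the floor-only variant follows, under two assumptions that are not in the original. First, Java has no built-in "eval", so lambdas stand in for the formula text stored in the table. Second, the floors use the "higher than" reading, so that an income of exactly 20000 still falls in the first row, matching the V2() code. The names FloorRangeTable, ROWS, and v2 are hypothetical.

```java
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

final class FloorRangeTable {
    // Key = exclusive floor of the range ("higher than");
    // value = the row's formula as a function of A.
    static final TreeMap<Double, DoubleUnaryOperator> ROWS = new TreeMap<>();
    static {
        ROWS.put(Double.NEGATIVE_INFINITY, a -> 0.0);
        ROWS.put(20000.0,  a -> Math.min(300, 0.06 * (a - 20000)));
        ROWS.put(36000.0,  a -> Math.min(450, 300 + 0.06 * (a - 36000)));
        ROWS.put(48000.0,  a -> Math.min(600, 450 + 0.25 * (a - 48000)));
        ROWS.put(72000.0,  a -> Math.min(750, 600 + 0.25 * (a - 72000)));
        ROWS.put(200000.0, a -> Math.min(900, 750 + 0.25 * (a - 200000)));
    }

    static double v2(double a) {
        // lowerEntry returns the row with the greatest floor strictly
        // below a, so no "to" column and no 999999999.99 are needed.
        // (It returns null only if a is itself negative infinity.)
        return ROWS.lowerEntry(a).getValue().applyAsDouble(a);
    }
}
```

Switching to the "higher than or equal" reading would be a matter of calling floorEntry instead of lowerEntry.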
If this approach is considered programmer-friendly, but not power-user friendly, I have two questions: (1) As a programmer-friendly approach, how is it better than encoding the above in source code? (2) As a non power-user friendly approach, given that the goal is to expose the formulae to power-users, how could you change it to make it more power-user friendly?
I find the tabular approach much more readable than the case version, at least to my WetWare. If I followed WorkBackwardFromPseudoCode, I'd start with a similar table, and then case-ify it if I cannot find a practical way to implement it as a direct table.
I said the above wasn't "power-user-friendly" largely because it doesn't protect them from typos etc. To make it more PUF, we'd probably have to make a controlled UI around it. To do so, we could have the following tables:
ranges (table)
--------------
rangeID
groupRef // such as the "v2" group
from
to // perhaps optional, see notes above
formulaRef // f.k. to formula table Ex: "v2x"
parameters (table) // AKA "constants"
------------------
paramID
rangeRef // f.k. to "ranges" table
formulaVar // a letter code defined by the formula
paramValue // double-precision
formula (table)
------------
formulaID // Ex: "v2x"
descript
implementation // [design choice] actual formula if using EVAL approach
userNotes
formulaParamDict (table) // parameter dictionary
----------------
formulaRef
varName
varDescript // prompt text
type // integer (see note), number (double), yes/no (1,0)
// potential validation columns, not shown
The formula would be associated with each given range row. The parameter values (constants) would then be entered via some kind of data entry form. Note how this is edging closer to what I called "the back end of what would or could end up being very similar to graphical business rule tools" above. However, the formulas themselves would still probably be hard-wired, but the parameters (constants) would be alterable by power-users. Our sample formula may be defined something like this in the back-end code:
function v2x(p[]) {
return lesserOf(p['a'], p['b'] + p['c'] * (A() - p['d']));
}
However, the choice of using formula "v2x" could be in a pull-down list for the front end targeted to power-users. Thus, power users could:
- Add a new range row
- Select among multiple existing formulas to apply (such as "v2x")
- Supply the constants associated with the formula instance for a given range row
In our particular example, all 6 "case" rows can use the same formula, but in the above I'm not assuming this will always be the case (no pun intended).
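Putting the pieces together, the lookup might be sketched in Java roughly as follows. This is an in-memory stand-in and not a definitive design: the "ranges" and "parameters" tables are folded into one RangeRow class, only the "to" bound is kept, and the names RangeEngine, RangeRow, Formula, FORMULAS, V2_ROWS, params, and evaluate are all invented for the illustration. The parameter values are the ones implied by the V2() code, with every row pointing at the same hard-wired "v2x" formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RangeEngine {
    // A hard-wired formula: p holds the row's constants, a is the income A.
    interface Formula {
        double apply(Map<String, Double> p, double a);
    }

    // Stand-in for the "formula" table, keyed by formulaID.
    static final Map<String, Formula> FORMULAS = new HashMap<>();
    static {
        FORMULAS.put("v2x", (p, a) ->
            Math.min(p.get("a"), p.get("b") + p.get("c") * (a - p.get("d"))));
    }

    // Stand-in for a "ranges" row plus its "parameters" rows.
    static final class RangeRow {
        final double to;                  // inclusive upper bound
        final String formulaRef;          // which formula the power user picked
        final Map<String, Double> params; // constants the power user typed in
        RangeRow(double to, String formulaRef, Map<String, Double> params) {
            this.to = to; this.formulaRef = formulaRef; this.params = params;
        }
    }

    static Map<String, Double> params(double a, double b, double c, double d) {
        Map<String, Double> m = new HashMap<>();
        m.put("a", a); m.put("b", b); m.put("c", c); m.put("d", d);
        return m;
    }

    // The "v2" group: six rows, one shared formula, different constants.
    static final List<RangeRow> V2_ROWS = new ArrayList<>();
    static {
        V2_ROWS.add(new RangeRow(20000,  "v2x", params(0,   0,   0.00, 0)));
        V2_ROWS.add(new RangeRow(36000,  "v2x", params(300, 0,   0.06, 20000)));
        V2_ROWS.add(new RangeRow(48000,  "v2x", params(450, 300, 0.06, 36000)));
        V2_ROWS.add(new RangeRow(72000,  "v2x", params(600, 450, 0.25, 48000)));
        V2_ROWS.add(new RangeRow(200000, "v2x", params(750, 600, 0.25, 72000)));
        V2_ROWS.add(new RangeRow(Double.POSITIVE_INFINITY, "v2x",
                                 params(900, 750, 0.25, 200000)));
    }

    // Find the first row whose range contains a, then run its formula.
    static double evaluate(List<RangeRow> rows, double a) {
        for (RangeRow row : rows) {
            if (a <= row.to) {
                return FORMULAS.get(row.formulaRef).apply(row.params, a);
            }
        }
        throw new IllegalArgumentException("no range row matches " + a);
    }
}
```

For example, RangeEngine.evaluate(RangeEngine.V2_ROWS, 48100) picks the fourth row and applies "v2x" with that row's constants. Adding a range row or changing a constant is then a data change, while adding a new formula still means a code change in FORMULAS.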
Note that for the "integer type" in the formulaParamDict table, this is a virtual type in that the destination field is actually Double (or maybe Decimal if the tools/languages support it). Checking floating-point values for such can sometimes get sticky (RealNumbersAreNotEqual). A possible validation operation for integer could resemble:
if (round(fld.foo, 4) <> round(fld.foo, 0)) { // validate for integer
frm.addErr("Foo can't have decimal places");
}
The second parameter of the "round" function is the number of decimal places to round to. The value of "4" may not be appropriate for certain application uses.
--top
See also: CompilingVersusMetaDataAid
Category CategoryConditionalsAndDispatching, CategoryPolymorphism, CategoryExample
AprilZeroNine SeptemberZeroNine JuneThirteen