Just Make It Right

"I order you to violate the laws of math and physics!"

A situation where there is complexity that the user does not understand or want to understand, but the complexity cannot be removed in the simple way they hope. They might shout, "Just make it right!" and leave the room. One is faced with a difficult choice of implementing an ugly fudge which makes you look bad, or forcing the user to understand and still making you look bad by being perceived as a "pest".

For example, one client wanted a list of integer percentages that added up to 100 percent. In practice, the numbers were decimal, but if you rounded them then they would not always add up to 100 percent.

Before rounding:

40.4 52.4 7.2 ----- 100.0

After rounding:

40 52 7 --- 99

The client did not seem to "get" this issue and just wanted the integers to add up to 100. The programmer eventually built an elaborate scheme to fudge the results by "bumping" up or down one of the numbers closest to 0.5. For example, they may make the first integer into "41" instead of "40".

Actually, it was the programmer that didn't "get" the issue. In the real world, percentages add up to 100. It's not a lot to ask and should never have come up.

I am not sure what you mean. They did not request it to be "fudged". They just wanted it to add up. They had "no time" for a long explanation. If you fudge, you risk similar problems down the road.

Programmers have been baking in this fudge with more or less elegance for as long as I can remember, and that's a very long time--the first time I saw it, the code was in COBOL65. More digits of precision don't solve the problem--you have to assign the error to one or more of the elements. Picking which, how much, and in which direction is fun. Note that with a long list, the error can exceed one percent plus or minus. This is one of the platinum examples of the rule that making one thing simple means making something else more complex. (Programming is fun--more programming is more fun) --mt

The "problem" is that if and when somebody checks it against the detail numbers, they discover the fudges. It's hard to write up an explanation for the inquisitive without making the managers who wanted the fudge look stupid (which they are). It can create sticky office politics. The best solution overall in my opinion is to just footnote the number ("100*") and explain that it may not add up due to rounding, and that if they want to see the full (unrounded) list, they can call 1-800-EAT-SHIT. Okay, I'm kidding about the that number, but the idea is to give the reader a brief explanation along with a reference/link to specifics if they want to "see for themselves".

I have encountered similar issues about rounding significance where the client being billed thought that we were using rounding to rip them off. We had to fudge the rate tables to round down instead of rounding up to avoid such accusations. for example, we would change the rate to 39.9999 instead of 40. We willingly took a loss of a few dollars over the longer run. This one was easier to solve than the above.

I was surprised that our business analysists didn't understand this quirk of rounding but after a few days of reflection all agreed that the percent of ownership of three equal investors would be correctly reported as follows. This corresponds to doing the rounding as close to the glass as possible which is my advice to programmers. -- WardCunningham

33 33 33 --- 100

Just hardcode '100' for the final result

But the hard part is what to do with the displayed portions.

Display them as is. Incorrect math is incorrect.

You seem to be naive about the "power" of the PointyHairedBoss.

Many moons ago, I encountered what you may consider a better solution to this problem. It's called "Distributed rounding." The idea is that instead of rounding down below int+0.5 and up above int+0.5, you choose a fractional threshold to make the sum remain the same. If that won't work because several numbers have the same fractional part, you round the first specimens up and the others down or vice versa. So in this case, the threshold would be .33333... and the result would be

33 33 34 --- 100

Yes, the three equal values become unequal, but the sum balances.

-- AndrewKoenig

If the problem were to distribute funds to share holders then, yes, some solution like you suggests is appropriate (unless other solutions are called out in contracts.). The guiding principles would be:

distribute all the money
make the distribution as fair as possible

If the problem is distributing funds to shareholders, then what to do ought to be specified in exquisite detail in the various contracts involved. Yes, people get picky about money.

In our case we chose the alternative we did because it captured the two most essential characteristics of the situation from the business perspective:

all investors have made equal investments
their investments comprise the whole

-- WardCunningham

Another rounding example: an application converted various kinds of measurements to other forms. For example, miles-per-hour to feet-per-sec, and then feet-per-sec to cm-per-sec. When the reverse operations were performed (cm-per-sec to ft-per-sec and then to MPH), the result was often not the same as the original value, but the client wanted it to be the same. The fix here was to keep a list of "typical values", like 45 MPH, 55 MPH and 70 MPH, and if the result was close enough to one of those typical values, then it was "rounded" to that value. (Yes, there are better solutions to this problem, but given the low accuracy of the numeric representation and the low horsepower of the machine, this one seemed best.)

I am curious, what would be the "better solutions" if hardware was not the issue?

Keep the original value and units, and never store a converted value. For each reading, keep a list of converted values and their types, and use a stored one instead of re-calculating anew. Use fractions and arbitrary-precision arithmetic, etc, etc...

That can greatly complicate things. Normalizing units reduces conversions and improves debugging IME. Then again it may depend on who the end-user is. I am surprised this happened anyhow. If you display something back to non-engineering end-users, it's usually rounded to, say, 2 or 4 decimal places. If you store it with sufficient precision (say double-precision), you shouldn't normally get back things like 49.99 MPH if you convert back from, say, kilometers-per-hour. It should round to 50.

I worked on some process automation devices that used calculations based on - wait for it - values stored as text. They couldn't understand why the accuracies were way the heck off and why the CPU spent so much of its life just trying to convert the measured quantities into "displayable" units - five character text strings - and then back again to do the next step of calculations. In point of fact, the actual numeric values were never stored anywhere in the box other than the UI text values. Yowza!

Unless you are doing some heavy-duty simulations or something, it seems like it is usually not worth it to worry about the machine speed of such issues these days. I stopped focusing on such generally about the time that 386's came out. The only places that seem to worry about that are shops who can't afford to upgrade because of the expense of, say, a Sun box. Windows and Linux shops are more likely to upgrade it seems. Sun shops often get stuck in 1986.

Uhh, it kinda sounds like you are slamming Sun shops for some reason. Your experience is completely different from mine. But what the heck - I've only been doing this stuff for the last 27 years or so. [MartySchrader] <ahem> But anyway, there are plenty of applications where speed of processing is still important, regardless of the hardware platform being used to host the app. Embedded systems jump right out as primary examples, of course, because you are limited by cost, memory, heat, battery power, real time considerations, etc. Creating applications that are grossly inefficient is simply bad practice, but doing so in certain environments is complete incompetence.

Your description originally did not sound like an embedded application to me. Thanks for the clarification. Note that I will make the machine work harder if it makes my life easier at times. The machines are here to serve mankind. Higher levels of abstraction often consume more CPU because there are more layers, wrappers, and translators in the middle. That is life.

Ah, but if the client says, "this is all you get," then you don't have the option of overworking the machine. Resources in an embedded environment are at a premium and can't be wasted on inefficiency. Abstraction often gets lost in the optimization shuffle. There is a real reason why Forth and Lisp and Java and Scheme and Python and Basic and a ton of other languages aren't used in these situations. The same is true of inefficient applications that don't make the best use of the underlying hardware's capabilities; one needs to be aware of the best way to use what one has and work within the limits of these resources.

Another rounding example:

Our system had been written to compute taxes on invoices by multiplying each invoice total by a fraction. Later, the boss wanted a report that computed tax by product, and was very concerned that the sum of the taxes by product did not add up to the same number as the sum of the taxes by customer. I tried to explain why to him, and he seemed to understand my reasoning, but said that he had other software that "did it right", so mine should be able to do so as well.

I suggested that we start computing tax by invoice line instead of invoice total, but that was unacceptable. The tax amount on the invoice must accurately reflect the rounded tax on the invoice total. The solution that finally satisfied the boss was to compute tax on the total, then prorate the tax across the invoice lines, starting with the smallest-valued line to minimize per-line tax discrepancies as a percentage of each line amount.

I betcha your boss will be promoted out and the new boss will ask "why are the product tax rates all funny?", and you'll end up doing it your original way.

Rather than argue, I might consider saying, "Based on what I know, all solutions will impose some kind of trade-off. It appears to be a matter of selecting the least-painful trade-off, or at least those the customers will complain the least about. If there's a better way that avoids trade-offs all-together, I'm not currently aware of such a technique.

Rather than claiming it violates the laws of math/physics/logic (which it very well might), just admit you are currently "stumped". They may be disappointed, but at least they won't hound you beyond saying, "keep thinking it over". It usually doesn't go over well to tell somebody they are violating the laws of math/physics/logic.

I have encountered similar confusion about hierarchies that should not have been hierarchies. Trees are usually easy for users to grok, but get messy beyond a certain point. See LimitsOfHierarchies.

Once I worked with a product order system that could have multiple "ship-to" addresses per caller so that some ordered items could go to location A and others to location B, etc. We transferred the order information to various customer's systems, but one client's system could not handle multiple ship-to's. Yet the marketing team did not want to remove the multiple ship-to's option for their account, so we could not just switch off that feature. We couldn't just make the issue go magically away, but we could not give them data their system could use without resolving the conflict between their marketers and their IT staff. It was a fundamental conflict in requirements, but they didn't see it that way. They saw it as some kind of problem on our side because the sales side didn't understand or want to deal with the data import process, and their IT department just wanted data their system could use. They would tell their sales department, "Those guys are unable to give our system data it can use. They are being difficult". If I remember correctly, we just ended up sending them only the first ship-to address if there were multiple, and let their customer service deal with any complaints or problems that resulted from it. Sometimes you just have to let customers learn the hard way. -- top

Early in my programming life I was asked to write an estimating program for our sales people to use. A simple database with stock items was used to store the retail price of each item. An estimate was generated by fetching the item, multiplying price x quantity, and adding that to an accumulator that wound up as the estimate total. Piece of cake.

Or piece of kaka. It worked really well, but when the customer came back waving the estimate, we had no way to retrieve it (no customer information was stored) and convert it to an invoice. So, hey, just add a feature to the program to store customer data and the list of items in the estimate, and a way to express it as an invoice. Piece of cake.

More kaka. Hey, we have a list of stuff, and we know when we sell one, let's add stock levels and have a report that tells us when we're out of something. Oh, and a screen or three to manage this new feature. Oh, and can we have a list of sales people, so we don't have to type it in every time? And some way to assign the estimate or invoice to that sales person? Hey, piece of cake.

Still tastes like kaka. Wait, now that we know who sold what, and the total for the invoices, can't we calculate the sales commissions from this? In fact, can't you distribute the total sales commissions over our sales people, with a portion going to the house (sales manager) according to some simple rules ... that I'm about to make up? You can handle the rounding, yes? That should be a piece of cake, no?

Interlude: discussion of why none of the damned numbers would derive correctly if the formula was worked on the sales invoices in one sequence instead of another. Well, JustMakeItRight. Piece of cake.

Or slow-cooked, well seasoned kaka. Well, I didn't have a name for it at the time (I was new at this) but I got to write my first "best fit with rounding and skimming" routine.

Piece of cake. -- GarryHamilton

Possibly related: MakeSummariesTraceable. Any complicated or convoluted calculation technique will eventually need some kind of tracing or auditing report. Otherwise, you'll be on the phone all day explaining where the numbers come from.

"Just Fix It!" WarStories:

http://www.infoworld.com/article/06/07/11/79615_29OPrecord_1.html

CategoryStories