Garbage In Garbage Out

"Garbage In (results in) Garbage Out" - The system's output quality usually cannot be better than the input quality.

There is a coding policy that I am trying to state clearly, and GIGO was the best name I have thought of so far. That is: if you are given garbage input, it is better to give garbage output than to throw an exception. For example, code to reverse a string:

 public static String reverse(String s) {
     if (s == null) return null;
     StringBuffer buf = new StringBuffer();
     for (int i=s.length()-1; s>=0; s++) {
         buf.append(s.charAt(i));
     }
     return buf.toString();
 }

The relevant part of the example is that if you pass null in, you get null back. Even though the given operation doesn't really make sense on null, a relatively sensible response to it is provided anyway. The other options would be to throw a NullPointerException or an IllegalArgumentException; or to write an Assertion that (s != null). I don't know whether it is better for reverse to return garbage or to throw an exception. Probably nobody will call it with null anyway.

I think this is more of an issue in C, because there is no assertion mechanism for returning errors, and because it's easier to do separate #ifdef DEBUG builds. In C, I would probably abort in debug mode or return null in production mode. GTK+ has macros to print a trace message and return null:

   gchar *str_reverse(gchar *buf) {
      RETURN_IF_NULL(buf);
      ...
   }

(Or something like that.)

Returning null seems like DefensiveProgramming rather than GarbageInGarbageOut: I would say garbage out would be to return a pointer to some random uninitialized block of memory, or to walk through memory trying to return it.

The standard JavaIdiom seems to be to throw an exception. (Your callers will just ignore it anyhow ;-) -- MartinPool

I used to program this way, but then I met a man named KentBeck. He taught me a better way. Just don't send the garbage in. Instead of having vast amounts of code to verify arguments received put in a little bit of code to ensure they are correct before the call. --DonWells

I don't see how repeating the validations at each call is better than consolidating them at one place (inside the called routine). In fact, I think the author of the called routine has a better chance of knowing what things to validate than the caller (even when it is the same person!). Maybe I am missing your point? Of course, it would be *ideal* to "not send the garbage in" but I have yet to work on a project where I got to practice IdealProgramming :). --JeffShelby

I agree, doing something sensible with bad parameters is a great way to hide bugs until one day they jump out and bite you. Better your software throws an exception or calls an assertion failure so that it doesn't just hide the fault in the calling software. Also, its enough effort to code and test the functions that a routine is supposed to have without having to code and test for all the cases that are not supposed to happen.

An Argument Against:

The problem with propagating garbage is that it makes bug symptoms appear further away from their root causes. A program that fails an assert() or throws an exception as soon as bad data is detected calls attention to the problem as soon as it is discovered; probably very close to the area of code that is creating the problem. But if functions routinely accept nonsensical data as valid, it can be hard to backtrack from garbage output, visible on screen or report, to the code that originally produced the bad data.

I UseAssertions on the input arguments to ensure that the caller provides rational arguments. When they don't, an assert() fails, and it's immediately obvious that the caller made a mistake. This makes fixing the caller's code very quick and easy.

However, there is a place for the GarbageInGarbageOut approach; there's even a pattern language for it. Robust functions are good. And it can be helpful to propagate a "bad data" indicator from input to output to provide robust operation, even when input data is unreliable. -- JeffGrigg

I'd use GIGO under two conditions - the input was not an 'error', merely 'garbage' on the periphery of a meaningful contract. And if the garbage output were 'symmetric' to the input. The root example here gets and gives a null pointer. If the caller is into null pointers they'd better be able to survive ours. But if the return were an integer (maybe an index), I would not return 0 or -1. -- PhlIp

Handling the garbage after it's found is something that should be decided per application. That could very-well mean throwing the garbage right back out; however, detecting the garbage and eliminating it before it throws any sort of error should be a greater priority. A real-world tactic is to place the burden on both the caller and the sender. If the sender detects garbage, it needs to deal with it before it gets to the caller. If the caller detects garbage, it needs to deal with it before anything is returned to the sender. This creates a safety net of redundant checking and could make up where a component, for some reason, lacked adequate measures to detect garbage.

It depends on who is going to call the code. If it is a routine that will be called from your own code, it's best for it to go bang straight away. That way, you'll discover the problem immediately and fix it - probably while you're still writing the code. What if the garbage input is a different kind of garbage than you thought of? Isn't it better to UseAssertions that describe what your code is designed to do? This approach contributes to code re-use. If your code vomits when fed poison by a bad routine, then the poisoning routine can be cleaned up - which will make it more likely to be re-usable. The vomiting routine will probably be more re-usable too. -- DominicCronin

Re: "The system's output quality usually cannot be better than the input quality."

Unless perhaps the purpose of the system is to help clean information. However, usually suspicious data can only be marked for inspection and needs human intervention (input) to make final decisions. For example, a cleaning system might identify duplicate records for a given customer. A human will likely have to review the duplicates and decide which one to keep.

If you handle garbage input, it's not garbage anymore. By definition, garbage input violates your contract. Violating your contract is, again by definition, an error/exception. If you special case certain cases of input (like a null pointer), then you're altering what your contract considers valid input. Sometimes, for the convience of callers, you want to allow certain types of otherwise invalid input - like deleting a null pointer in C. But by accepting that data, you're creating a new contract, and your output should be guaranteed in those cases. --ChrisMellon?

The standard JavaIdiom seems to be to throw an exception. (Your callers will just ignore it anyhow ;-) -- MartinPool

I used to program this way, but then I met a man named KentBeck. He taught me a better way. Just don't send the garbage in. Instead of having vast amounts of code to verify arguments received put in a little bit of code to ensure they are correct before the call. --DonWells

Synthesizing these ideas, and tossing in a little DTSTTCPW, the indicated behaviour in Java is to not do any checking at all; if the input is null, the code will automatically throw the appropriate NullPointerException at the first charAt() call.

I do find DefensiveProgramming practices such as GuardClauses? useful (since you don't always have as much control over things at runtime as you might like), but generally I prefer to handle these things in the simplest possible way - which frequently is implicit. Although, since that is then relying on the DontCare? nature of the result for bad input (since we're not supposed to provide it), I suppose that is GarbageInGarbageOut philosophy after all... --KarlKnechtel

Another way of handling this would seem to be more complicated compiler and language support; have the routine declare conditions on the arguments that must be satisfied for input to be valid as part of the interface. Then, when a call to that routine is compiled, the compiler checks to see whether the conditions are known to be true at that point; for the conditions that aren't, it can (a) error and require the caller to make the conditions true somehow, (b) insert checks at the call site, (c) compile a call to a trampoline that does the checks first and then jumps to the real function code, or possibly (d) something else not mentioned here.

Interestingly, this "GiGo" sentiment was originally applied to the data entered into a system as opposed to the software that constitues the system. In the early days of computing (1960s & 1970s) Data Verification was quite common, given that a job that failed because of bad input data, was an expensive waste of computer time. In those days, cited error rates for professional key punchers were 15%, without verification and 3% after (verification, being a second, independant data entry pass that confirmed the inputs). Sadly, it seems that this rate of data entry errors persists at this time with the result that we have merely become tolerant of the resulting errors because the cost of verification is now prohibitive and we have enough defensive code that critical systems no longer ABEND for this reason.

Admittedly, this is my mathematical experience surfacing, but to me, the above example doesn't illustrate a "GIGO" mentality. If you pass a null String, then it makes sense to define the reverse of null to be null. This is akin to defining 0! (factorial(0)) to be 1, or 0^0 to be 1 as well--because a lot of mathematical equations just become simpler when you make this definition.

If you have a very good reason for believing that the reverse of "null" should be an error, then by all means, throw the error...but while, in general, making a function as robust as possible could result in a lot of bugs, having a function that is somewhat robust--especially in a way that makes sense--can prove to be very useful when you know you are dealing with messy data, and don't want your program interrupted every time an unusual but harmless exception comes along!

(Having said all this, if you are given bad XML, I've discovered that it's a lot easier to just tell the person who gave you the data to give you good XML, than to try to parse the XML, fix the problem, and discover it's broken another XML error patch you made previously! There are certainly limits to what you can do with "be as robust as possible" before it becomes a grand waste of time. :-)

      --Alpheus