One problem in addressing large problems is that the solutions often have long runtimes. For example, I'm currently working on a bug where the customer has pointed out that a run which used to finish in 10 hours now takes four days. (Luckily our competitors take weeks.) Our integration tests take about 12 hours, and just the unit tests take 2 hours. -- JeffBell
Hmm. Would you believe RefactorTests?
There are certain tests that cannot be refactored into smaller tests. For example, when writing software to arrange circuitry on a chip, it doesn't matter too much where exactly any particular gate goes. What matters is how good your heuristics were for placing all of them. A test of fewer than 10,000 cells really doesn't tell you much.
Hmm. Is it possible that a handful of tests of fewer cells would tell you enough to be interesting, even if they didn't tell you everything? That is, is it possible to create a heuristic test suite that gives you 80% confidence in the code and runs during a coffee break? Or 60% and it runs in 10 seconds?
Not always. Again, in chip placement, the first 80% of the problem is easy. It's in fitting the last 20% of the gates into the die that you find out whether the decisions your program made in the first part were good. There are some estimates of result quality that you can use, such as average and peak cell density and routing congestion, but even then you don't know how you're doing until you get past the halfway point.
In other cases the customer's acceptance test is that we be able to process 1 million cells before we run out of 32 bit address space. We can detect memory usage changes in smaller tests, but sooner or later you have to run the big test to know for sure.
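For what it's worth, the average/peak density estimate mentioned above is cheap to compute. Here's a rough sketch in C (the bin grid, the Cell struct, and the assumption that coordinates lie inside the die are invented for illustration, not any real placer's code). It gives an early warning rather than a verdict, for exactly the reason given above.

    /* Cheap quality proxy: average and peak cell density over a grid of bins. */
    #include <stdio.h>

    #define BINS_X 64
    #define BINS_Y 64

    typedef struct { double x, y; } Cell;   /* assumed placement coordinates */

    void density_report(const Cell *cells, int cell_count,
                        double die_width, double die_height)
    {
        int counts[BINS_X][BINS_Y] = {0};
        int peak = 0;

        for (int i = 0; i < cell_count; i++) {
            int bx = (int)(cells[i].x / die_width  * BINS_X);
            int by = (int)(cells[i].y / die_height * BINS_Y);
            if (bx >= BINS_X) bx = BINS_X - 1;   /* clamp cells on the die edge */
            if (by >= BINS_Y) by = BINS_Y - 1;
            counts[bx][by]++;
            if (counts[bx][by] > peak)
                peak = counts[bx][by];
        }

        printf("average cells per bin: %.1f, peak: %d\n",
               (double)cell_count / (BINS_X * BINS_Y), peak);
    }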
This is an example of the case where some systems have stringent and extreme requirements on their behavior. Not so much functionality in the sense of "if A then B" but the scope, scale, precision, range, concurrency, up time, and so on of a system. It just plain takes a long time to load a big database, run lots and lots and lots of users in parallel, do something a few thousand times, and so on.
We are greedy creatures. While many of yesterday's problems fit comfortably in today's hardware, with correspondingly short test cycles, two things are true. First, we're not willing to give up those of yesterday's problems that still don't fit comfortably. Second, we're off inventing more problems at the limits of what we can do, which often means time. These problems at the limits can be a differentiator for a product, and often are.
In principle, one could execute a smaller test and extrapolate the results - this also is done. But "stringent" comes into play when we're just not confident enough in our extrapolation to bet on it. Pacemakers, for example. So we actually do the test. Surprisingly often, at least surprisingly to the big brains who design these things, the test fails. Then they usually say "that's just . . . " and go fix something small. But it did break. That's the point. And until we tried it, we didn't know for sure. That's another point. And while we do the big test, if we're clever we'll characterize the behavior of the system, so we'll know what "working right" looks like. That's an extra payoff to the test for correctness, if we're smart enough to take it.
My personal favorite is making something do an hour's or a day's worth of repeated activity. In principle you could find any kind of resource leak with the right instruments, in just a few cycles. Again, this also is done. In practice there's usually something lying around that you don't know enough to instrument for, or can't watch. One particular system I worked with would run out of socket handles if it stayed up long enough (the same 32 bit integer thing), since handles were simply incremented rather than deallocated and reused. Bad design? Possibly. But that's the point of testing things. If we got enough stuff right through design, we wouldn't bother testing. And socket handles aren't something your memory leak detector will typically look for.
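To make the socket handle story concrete, here's a minimal sketch of that kind of design (the names are invented; this isn't the actual system). Nothing a heap checker watches ever grows, yet a long enough soak test walks the counter off the end.

    #include <stdint.h>

    static uint32_t next_handle = 1;        /* 0 reserved as "invalid" */

    uint32_t socket_handle_alloc(void)
    {
        if (next_handle == 0)
            return 0;                       /* counter wrapped: out of handles */
        return next_handle++;               /* handles are never reused... */
    }

    void socket_handle_release(uint32_t handle)
    {
        (void)handle;                       /* ...because releasing one frees the
                                               socket but never recycles the value */
    }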
I have seen a technique that might be useful for some of these cases. We had a 32 bit counter updating at 100Hz, I think, which meant the counter would roll over every 40 days or so. One of the seasoned programmers didn't like the idea that the only way to test this was to leave the product running for over 40 days straight. So he set the initial value of the counter so that it would roll over about 4 minutes after every power cycle. That meant we exercised this corner case almost continually instead of very rarely.
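A minimal sketch of that trick, assuming a free-running 32 bit tick counter (the names and the 100Hz rate are taken from the description above, not from real code): instead of starting at zero, initialize the counter a few minutes short of wraparound, so every power cycle walks through the rollover.

    #include <stdint.h>
    #include <stdio.h>

    #define TICK_HZ              100u            /* assumed tick rate          */
    #define SECONDS_TO_ROLLOVER  (4u * 60u)      /* wrap ~4 minutes after boot */

    /* Start just short of wraparound instead of at zero. */
    static uint32_t tick_count = UINT32_MAX - (TICK_HZ * SECONDS_TO_ROLLOVER) + 1u;

    void on_tick(void)                           /* called from the periodic timer */
    {
        tick_count++;                            /* wraps to 0 about 4 minutes in  */
    }

    int main(void)
    {
        uint32_t start = tick_count;
        while (tick_count >= start)              /* run until the counter wraps */
            on_tick();
        printf("wrapped after %u ticks\n", UINT32_MAX - start + 1u);
        return 0;
    }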
I think this concept could be used in other contexts as well. Set the socket counter to a very large number to start with. Or, if that doesn't work, run a tight loop that allocates almost all the socket handles and then start your regular tests. You could malloc() a huge section of memory and then run your low-memory tests. For the cell placement problem, if the first 80% really doesn't matter, then do the 80% of the work, somehow save the state of the processing, and then start up your new code with that pre-rendered environment and see how it completes the processing.
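Along those lines, the low-memory half is easy to sketch (the sizes are arbitrary and this is illustration, not a real harness). Grab the largest block you can, touch it so overcommitting systems actually reserve it, then kick off the normal tests so they run near exhaustion from the first minute.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void *hog_memory(size_t *got)
    {
        size_t request = (size_t)1 << 30;        /* start by asking for 1 GiB */

        while (request >= 4096) {
            void *block = malloc(request);
            if (block != NULL) {
                memset(block, 0xAA, request);    /* touch the pages so they are
                                                    really committed            */
                *got = request;
                return block;
            }
            request /= 2;                        /* back off until a request fits */
        }
        *got = 0;
        return NULL;
    }

    int main(void)
    {
        size_t got = 0;
        void *hog = hog_memory(&got);

        printf("holding %zu bytes for the duration of the tests\n", got);

        /* ... launch the regular test suite here, while the hog is held ... */

        free(hog);
        return 0;
    }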
Jeff said earlier "It's in fitting the last 20% of the gates into the die that you find out whether the decisions your program made in the first part were good." That means that the last 20% of the work is essentially testing whether the first 80% was done well enough. Saving the state at 80% doesn't really solve his problem. The ideas regarding counter roll-over and low memory were good, though.