Simulated Annealing

One of a class of techniques whose adoption amounts to an admission that the problem we're applying it to is too hard for us to solve, and the best we can do is to imitate MotherNature? more or less blindly and see what happens. (NeuralNetworks provide one example, GeneticAlgorithms another.)

If that were really all there was to SA, it would suck. FuzzyLogic isn't really just about "yes maybe no", so maybe there's a little more to SA, too...

The analogy here is to annealing, a way of softening a metal by letting its crystal structure relax: you heat it up and then cool it down very slowly. This allows it to form larger crystals than you'd get if you cooled it faster (you can observe the same thing freezing water or forming sugar crystals), which corresponds to finding something nearer to a global minimum-energy configuration.

So, you have some function on a space of many dimensions whose global minimum you want to find (or at least come close to). What SimulatedAnnealing tells you to do is this: start with any old point, then repeatedly try out random "simple" changes. If a change improves things (i.e., decreases the objective function), accept it. If it doesn't, then maybe accept it anyway, at random. At random? Well, you accept small worsenings more often than big ones. And, this is the important thing: as the algorithm progresses, you make your acceptance criteria more and more stringent.
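A minimal sketch of that loop, in Python. All the names here are hypothetical: energy scores a state, neighbour proposes one of those random "simple" changes, accept decides whether to keep a worsening at the current temperature, and schedule yields ever-stricter temperatures. All four are problem-specific plug-ins, not part of any standard library.

 def anneal(energy, neighbour, accept, x0, schedule):
     # Start from any old point and its score.
     x, e = x0, energy(x0)
     for t in schedule:
         candidate = neighbour(x)
         e_new = energy(candidate)
         # Improvements always pass; worsenings pass only if accept says so.
         if e_new <= e or accept(e_new - e, t):
             x, e = candidate, e_new
     return x, e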

The way this fits into the physical analogy is that you think of the changes as being like random thermal fluctuations. As the temperature drops, the fluctuations get smaller. It's traditional to make your rule for accepting or rejecting changes look just like the one that actually applies in statistical mechanics: the Metropolis criterion, which accepts a worsening of size delta at temperature t with probability exp(-delta / t). (I'm not sure whether this is actually in any sense best.)
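Plugged into the sketch above, that rule and a simple geometric cooling schedule might look like this (the constants are illustrative, not canonical):

 import math, random

 def metropolis_accept(delta, t):
     # Boltzmann factor from statistical mechanics: a worsening of size
     # delta survives with probability exp(-delta / t).
     return random.random() < math.exp(-delta / t)

 def geometric_schedule(t0=1.0, alpha=0.999, steps=100000):
     # Slow cooling: the temperature falls by a whisker each step.
     t = t0
     for _ in range(steps):
         yield t
         t *= alpha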

The plausibility of the method comes from two sources. Firstly, it's like what MotherNature? does, and MotherNature? manages to solve some very hard problems. Secondly, the intuitive picture is of your candidate point hopping around on the "landscape" defined by the objective function, able to jump out of shallow local minima, but with its ability to jump around gradually getting less and less; with a bit of luck, sooner or later you'll jump into a really deep minimum and never leave it.

I'm not sure what the verdict is on SimulatedAnnealing methods these days. They certainly tend to be very slow, but I think they're still useful sometimes when you have a really hairy optimization problem to solve and no real idea how to proceed...

We do SimulatedQuenching these days. It's very quick.

SimulatedQuenching is not a general replacement for SimulatedAnnealing. The so-called quenching algorithms were an obvious enough extension to simulated annealing to have popped up independently many times. Quenching works great when your problem is a well-conditioned estimation problem like, say, some image restoration problems. When this problem was first approached by annealing in the (very good) Geman & Geman paper [1], they didn't realize that even with a poor image model the problem was well enough conditioned that fast, nearly-greedy optimization would get you to a nice low energy state. This is certainly not true in general; you could perhaps go so far as to say that quenching is a good replacement for annealing in a subset of those problems you shouldn't have been using annealing for in the first place! That is a bit overstated, but somewhat applicable, given the many, many occurrences of blind application of SA for no better reason than that it is very easy to set up.

SQ is a direct replacement for SA; it just uses a different cooling schedule, trading off coverage for speed. I have used it for optimizing highly constrained electronic networks (antennae, actually). Works great. Much better than GA in my experience.
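One way to read "different cooling schedule" in code, reusing the geometric_schedule sketch above. The numbers are illustrative only; real settings are problem-dependent.

 # Both plug into the same anneal() loop; only the schedule differs.
 sa_schedule = geometric_schedule(t0=10.0, alpha=0.9999, steps=200000)  # slow cool
 sq_schedule = geometric_schedule(t0=10.0, alpha=0.95, steps=2000)      # fast quench: soon near-greedy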

"Replacement" in the sense that you can use it instead, but not in the sense that it finds better answers quicker in every case. I have problems where SQ fails completely, but SA works a treat.

I suppose I should have said "not a general replacement" (now fixed) originally, as perhaps it was not obvious from context. SQ is not merely trading off coverage for speed; it is achieving speed at the expense of no longer solving a large class of problems (unless that is what you meant by coverage, and I misunderstood you). In your comment lies the answer: your highly constrained problem is almost certainly well-conditioned (by those constraints). This conditioning will manifest either as a 'nice' energy surface, or as 'nice' initial conditions that start you off somewhere helpful in phase space. From here, a fairly-greedy, fairly-local algorithm like SQ will do the trick. Of course, in general this is not what you have. For hard optimization problems, SQ is hopeless; you will just end up freezing in a state near your initial state. Of course, SA is no panacea either; see the NoFreeLunchTheorem.

I meant coverage of the solution-space; apologies if this was unclear. I don't consider SA and SQ as different as you do. Indeed I have used an adaptive engine that went from Metropolis to hill-climb, though after a lot of testing I found a simpler approach worked better for all the test problems I've found so far, well-conditioned and otherwise. Perhaps you can give an example of a problem that demonstrates the hopelessness of SQ versus SA?


See Also: SimulatedQuenching, MetaHeuristic.

