Solved Problems In Computer Science

Sometimes it feels like the more I learn, the less I know. (It is OK to feel that way; so did great thinkers like MrSocrates.) I'm yearning for a little bit of certainty right now, so I was hoping to track some of the problems that have been solved. I mean, really solved. Understood completely. We know which algorithms solve the problem best in each given situation, and we have proven that no better algorithms exist than the ones we have found.

I'll try to start (though I'm not certain that these problems really are completely solved):

Problems like addition and subtraction are really vicious in chip design. The problem of addition/subtraction carry propagation is literally the limiting factor on the clock cycle of a CPU. I am unaware of a proof that the best currently known methods of doing fast carry lookahead/propagation are in fact theoretically best; if anyone knows otherwise, enlighten me. -- dm
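To make the generate/propagate idea behind carry lookahead concrete, here is a minimal Python sketch (the function name and the 8-bit width are just illustrative; in hardware the carry expressions are flattened into a few gate levels rather than computed in a loop):

  # Minimal sketch of carry lookahead: compute per-bit generate/propagate,
  # then derive every carry from them instead of rippling bit by bit.
  # The sequential loop here is only for readability.
  def carry_lookahead_add(a, b, width=8):
      g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
      p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
      carry = [0] * (width + 1)
      for i in range(width):
          carry[i + 1] = g[i] | (p[i] & carry[i])
      bits = [p[i] ^ carry[i] for i in range(width)]
      return sum(bit << i for i, bit in enumerate(bits)), carry[width]

  # carry_lookahead_add(100, 55) returns (155, 0).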

"we have proven that no better algorithms exist " - oops, sorry, no, the opposite is true...the SpeedupTheorem? proved that there is no fastest algorithm; there's always a faster one.

That doesn't apply to big-O analysis, where we do know that e.g. sorting based on compare-and-swap cannot be faster than O(n log n) - but in practice multiplicative and additive constants can in fact matter. Also, obviously we don't know whether P = NP; everyone doubts it, but if it turned out to be the case, then a vast number of important algorithms might have much faster solutions than currently known. -- dm


The problem of addition/subtraction carry propagation is literally the limiting factor on the clock cycle of a CPU.

True at one time. But I've been told that the first level of cache is now the limiting factor - make the cache bigger, and it slows down the cycle time of the entire CPU. (Make the cache smaller, and you get more misses, which makes the CPU take more cycles to execute a given bit of code). -- DavidCary

You are absolutely right, but that is a separate issue, one that would disappear if external DRAM were getting faster fast enough. The problem you're talking about is due to DRAM speeds not keeping pace with CPU demand; their Moore's Law curves are different. This puts increasing pressure on the SRAM caches between the two, so the cache limit is a side effect rather than something fundamental -- although it certainly does matter.


I'm sorry to say I must complain that all of your examples below address only the page title, not the original poster's further characterization of "what algorithms solve the problem best". You're listing problems where we've found some solution; the full list of such problems must number in the tens of millions.

OK, I deleted a few of my non-best solutions.

Is it OK to list algorithms when it's been shown that no other algorithm can do better?

How about proofs that a particular problem cannot be solved?


Problem: All hard drives fail sooner or later.

Solution: Write everything to an array of at least 2 drives (preferably more). When one fails, replace it with a fresh drive, then synchronize before the next one fails. (RAID).

(If we assume that hard drives are the only cost-effective online storage media, this is the only solution. Of course this raises the question of whether hard drives are really the best.)
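For illustration only, here is a toy Python sketch of the mirroring idea, with ordinary files standing in for drives (the file names and helper functions are made up, not any real RAID interface):

  import shutil

  # Toy illustration of RAID-1 style mirroring; plain files stand in for
  # drives, and the names "drive0.img"/"drive1.img" are invented.
  MIRRORS = ["drive0.img", "drive1.img"]

  def mirrored_write(data):
      # Every write goes to every member of the array.
      for path in MIRRORS:
          with open(path, "ab") as f:
              f.write(data)

  def replace_failed_drive(failed, fresh):
      # Rebuild: copy a surviving mirror onto the fresh drive, then use it.
      survivor = next(p for p in MIRRORS if p != failed)
      shutil.copyfile(survivor, fresh)
      MIRRORS[MIRRORS.index(failed)] = fresh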


Cryptography

It is impossible to decrypt messages encrypted using a "one-time pad" of truly random numbers - one at least as long as the message and never reused - unless you have that pad.

There are no other algorithms that can possibly make encrypted messages *more* secure.
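A minimal Python sketch of the one-time pad, assuming a truly random pad as long as the message that is never reused (the function names are just illustrative):

  import secrets

  # One-time pad: XOR the message with a truly random pad of the same
  # length. Decryption is the same XOR; without the pad, every plaintext
  # of that length is equally likely.
  def otp_encrypt(message):
      pad = secrets.token_bytes(len(message))   # must never be reused
      ciphertext = bytes(m ^ p for m, p in zip(message, pad))
      return ciphertext, pad

  def otp_decrypt(ciphertext, pad):
      return bytes(c ^ p for c, p in zip(ciphertext, pad))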

"Password handling is simultaneously one of the few Solved Problems of Cryptography *and* one of the most misunderstood. Simply store a MD5 or SHA-1 one-way hash of the password." Except that's still vulnerable to RainbowTables? - use salting please. And even good salts are vulnerable to brute force searching for bad passwords, so really... ad infinitum. -- http://www.doxpara.com/read.php/security/password_rejected.html

Is this one safe from my critique? It's claiming I may misunderstand it. :-)
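For concreteness, here is a hedged Python sketch of the salted, deliberately slow approach the critique above points toward, using PBKDF2 from the standard library; the iteration count and hash choice are illustrative, not recommendations:

  import hashlib, hmac, os

  # Salted, deliberately slow password hashing: the random salt defeats
  # precomputed RainbowTables, the iteration count slows brute force.
  def hash_password(password, iterations=100_000):
      salt = os.urandom(16)
      digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
      return salt, digest

  def check_password(password, salt, digest, iterations=100_000):
      candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
      return hmac.compare_digest(candidate, digest)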


Q: Is it possible for one program, in a finite amount of time, to decide whether some arbitrary other program gets stuck in an infinite loop, or eventually finishes?

A: No. HaltingProblem.
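The standard diagonalization argument can be sketched in a few lines of Python; halts() here is purely hypothetical, which is exactly the point:

  # Sketch of the classic diagonalization. halts() is hypothetical: assume
  # it returns True iff program(argument) eventually finishes.
  def halts(program, argument):
      ...  # cannot actually be written; that is the theorem

  def paradox(program):
      if halts(program, program):
          while True:       # loop forever exactly when halts() says we stop
              pass
      return "done"         # finish exactly when halts() says we loop

  # paradox(paradox) would halt if and only if it doesn't -- contradiction,
  # so no such halts() can exist.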


Compression

If we have a set of messages with known probabilities, and we want to indicate one of them with a sequence of bits, canonical Huffman coding gives the minimum expected number of bits.

(Other types of encoding might give the *same* expected number of bits, but no other algorithm can possibly give *fewer* bits).
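A minimal Python sketch of Huffman coding over a known distribution (weights are passed as a dict; tie-breaking is arbitrary and alphabets of fewer than two symbols are not handled):

  import heapq

  # Minimal Huffman coder: repeatedly merge the two least probable nodes,
  # then read each symbol's code off the merge tree. Weights are given as
  # a dict of symbol -> probability (or count).
  def huffman_codes(weights):
      heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
      heapq.heapify(heap)
      counter = len(heap)
      while len(heap) > 1:
          w1, _, codes1 = heapq.heappop(heap)
          w2, _, codes2 = heapq.heappop(heap)
          merged = {s: "0" + c for s, c in codes1.items()}
          merged.update({s: "1" + c for s, c in codes2.items()})
          heapq.heappush(heap, (w1 + w2, counter, merged))
          counter += 1
      return heap[0][2]

  # huffman_codes({"a": 0.5, "b": 0.25, "c": 0.25})
  #   -> {"a": "0", "b": "10", "c": "11"} (the exact code assignment may vary)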

(puzzled look) I don't quite see why it is incorrect. If I want to indicate *more* than 1 message, then arithmetic encoding may be able to transmit the whole sequence of messages with fewer total bits than Huffman. If the receiver *doesn't* know the probabilities, some other algorithm may be able to transmit messages with fewer total bits. (Especially if there is some sort of correlation between messages). But in the case of 1 message (and known probabilities), it is impossible to beat Huffman. (I would be very surprised and delighted to find even a single exception.) -- DavidCary

I see. This makes me uncomfortable, because I think there's always a way to examine some algorithm, and then after the fact come up with a very specific problem for which that algorithm is the best solution, which loses discriminatory value for "best".

In this case the general problem seems to me to be compression, and you are pointing out that there is some specific problem within that domain that Huffman encoding is best for, if we are careful enough about our phrasing of the problem.

Maybe it's reasonable to allow Huffman encoding here, because maybe in some sense it fits the spirit of the page, but if we do, what's to stop me from coming up with a carefully stated problem for which e.g. BogoSort is the best solution? How do we keep the dam from crumbling?

If I said "1 symbol" instead of "1 message", would that be less misleading?

Yes, that's true. I was just trying to point out that for a few distributions, there *is* a perfect compression algorithm. It's a Solved Problem In Computer Science. Unfortunately, most files don't have that sort of distribution :-(. -- DavidCary


SortingAlgorithms

Q: Is there a sorting algorithm known to be the best?

A: Yes and no. No, in the sense that you can custom-design a RadixSort (not just use an off-the-shelf radix sort) to be optimal if you have enough information about the potential input data. Yes, in that we know it can be done by some radix sort in O(n) time if we do that, and yes, in the sense that we know we can't do better than O(n log n) if we don't know anything about the input data and aren't allowed to use anything but compare and swap.
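For illustration, a minimal Python sketch of an LSD radix sort for non-negative integers - the kind of "enough information about the input" case mentioned above (the 8-bit digit size is an arbitrary choice):

  # LSD radix sort for non-negative integers: each pass distributes keys
  # into buckets by one digit and concatenates them stably, so the total
  # work is O(passes * n).
  def radix_sort(keys, digit_bits=8):
      keys = list(keys)
      if not keys:
          return keys
      base = 1 << digit_bits
      shift = 0
      while any(k >> shift for k in keys):
          buckets = [[] for _ in range(base)]
          for k in keys:
              buckets[(k >> shift) & (base - 1)].append(k)
          keys = [k for bucket in buckets for k in bucket]
          shift += digit_bits
      return keys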

(Note that there is no magic bullet. Yes, clever data structures can make sorting appear O(n), but O-complexity is really measured in the space*time domain. People often simplify it to just time, since the space is usually constant. An O(n) sort may use O(n^2) space to guarantee the performance; you may use some slick data structure to reduce that space (hashing, tries, etc.), but on closer examination you'll find degenerate cases where the time complexity is once again increased.)

But it's not a solved problem, exactly, because there's still the question of decreasing the multiplicative constant even for the O(n log n) ideal. A new refinement of QuickSort (Bentley and Sedgewick, IIRC) has appeared in recent years that does better in that regard.

There also is still no known algorithm for guaranteeing O(n log n) performance on any old input data. Some of the best algorithms perform very badly on almost-sorted data, or on reverse-sorted data, etc. In regard to QuickSort, there is no proof of the best way to pick pivots, yet picking good ones is critical to its performance.

Usually, to know the "best" would mean that further research is pointless, and this is not at all the case for sorting.

Careful... HeapSort, for example, is guaranteed to be O(n log n) on any old input data where comparisons are of constant cost. The interesting issue is really taking good advantage of ideal cases while guaranteeing O(n log n) in the worst case, while maintaining a reasonable constant: one could conceivably write an algorithm which gives a whole list of algorithms exactly the amount of time that a guaranteed O(n log n) algorithm would take, timing each out, and falling back onto the O(n log n) algorithm only after the rest have expired. Such an algorithm would run in O(a n log n), where 'a' is a constant factor in the worst case and therefore has no bearing on the final O; as such, it would still be O(n log n), but in most practical applications the overhead would kill you.
Of course, this is not to imply that further research is pointless, because the constant factors/overhead can still be improved, and those gains are still useful. Indeed, some of the constant factors may not actually be constant; comparison of two arbitrary numbers, for instance, is really of the same complexity as comparing two strings, on commonly available architectures and sufficiently large numbers. To a sufficiently anal observer, big-O notation must take these things into account.
-- cwillu
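The fallback idea above has a well-known concrete form, Musser's introspective sort: run quicksort, but once recursion gets suspiciously deep, switch to a guaranteed O(n log n) algorithm for that sub-range. A hedged Python sketch (not the time-budget scheme described above; the three-way partition copies data for clarity rather than working in place):

  import heapq

  # Sketch of introspective sort: quicksort until recursion depth exceeds
  # roughly 2*log2(n), then fall back to heapsort (guaranteed O(n log n))
  # for that sub-range.
  def introsort(items):
      items = list(items)

      def heapsort(lo, hi):
          chunk = items[lo:hi]
          heapq.heapify(chunk)
          items[lo:hi] = [heapq.heappop(chunk) for _ in range(len(chunk))]

      def sort(lo, hi, depth):
          if hi - lo <= 1:
              return
          if depth == 0:
              heapsort(lo, hi)
              return
          pivot = items[(lo + hi) // 2]
          left = [x for x in items[lo:hi] if x < pivot]
          mid = [x for x in items[lo:hi] if x == pivot]
          right = [x for x in items[lo:hi] if x > pivot]
          items[lo:hi] = left + mid + right
          sort(lo, lo + len(left), depth - 1)
          sort(hi - len(right), hi, depth - 1)

      sort(0, len(items), max(2 * len(items).bit_length(), 1))
      return items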

