Kolmogorov Complexity

The complexity of a pattern, measured as the length of the shortest algorithm required to reproduce it.

 

See http://en.wikipedia.org/wiki/Kolmogorov_complexity

In general, this seems to say that complexity can be measured by the length of a maximally compressed representation of the thing. A drawback is that the minimal possible compression of any given thing is uncomputable, so it may never be known.
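As a rough illustration (my own, not part of the original argument): the true minimum is uncomputable, but any off-the-shelf compressor gives an upper bound on it, up to the constant cost of the decompressor itself. A minimal Python sketch:

   import os
   import zlib

   def description_length_upper_bound(data: bytes) -> int:
       # Length of one particular compressed representation of `data`;
       # the true Kolmogorov complexity can only be smaller (plus a constant).
       return len(zlib.compress(data, 9))

   print(description_length_upper_bound(b"1" * 1000))       # highly regular: compresses to a few bytes
   print(description_length_upper_bound(os.urandom(1000)))  # random bytes: barely compresses at all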


This seems to have been missed in some of the discussion below: Complexity isn't the shortest string for which an algorithm can be defined to decompress it; it's the shortest algorithm. Values are intrinsic in nature; an external algorithm would make the value of a string extrinsic in nature. It would not be a relevant measure of complexity, since for any string, you can define an algorithm that compresses that exact string to zero characters. Kolmogorov Complexity is far more practical: it's the shortest complete description of a value... meaning you can send any equivalent description, such as "the result of executing this algorithm on this data".

It does, however, presume that the language provides no special features for the exact value in question. E.g. in English, one can compress "3.1415926..." to "the ratio relating the circumference of a circle to its diameter" to "pi". A commonly understood language MUST be assumed for communicating the algorithm or value, and most languages will possess at least a few primitively defined values. Those particular cases should probably be considered to have an artificially low complexity as a consequence of the language. It is not theoretically possible to separate Kolmogorov Complexity from the language, so complexity will always be language-dependent. Some values will have infinite complexity in a particular language... especially coinductive or recursive values in a language that lacks support for them. These values define the limitations of the language.


Does this include the context i.e. the producing mechanism? Otherwise it is poorly defined.

It only needs to be defined up to an additive constant. Different "producing mechanisms" produce complexities that differ by at most a constant that depends only on the mechanisms and not on the data whose complexity is being assessed, provided the mechanisms are good enough. If they aren't then what you're measuring isn't KolmogorovComplexity, though it might still be interesting. :-)

One relatively elegant "producing mechanism", based on EssAndKayCombinators, can be found at http://www.cwi.nl/~tromp/cl/cl.html - the page also provides the Kolmogorov complexity of the device itself in terms of the device.


It is defined up to an additive constant because, given any "producing mechanism" (say, a Turing machine, lambda calculus or combinatory operators), you can give, before the actual program, a prefix program which emulates the other "producing mechanism".

That is, if you have program "A" (take it as a sequence of n bits), which is interpreted by the "producing mechanism" "X", then the length of the shortest program "B", interpreted by the "producing mechanism" "Y", which yields the same result as "A" run on "X", is at most n+m, where m is the length of the prefix program which emulates "X" atop "Y". m is the additive constant we're talking about :-)
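Written as an inequality (standard notation, not used elsewhere on this page): if K_X(x) and K_Y(x) denote the lengths of the shortest programs producing x on mechanisms X and Y respectively, then

   K_Y(x) \le K_X(x) + c_{XY}

for every string x, where c_{XY} is the length of a Y-program that interprets X-programs. Swapping the roles of X and Y gives the bound in the other direction, so the two measures differ by at most a constant that does not depend on x.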


Does this include the context i.e. the producing mechanism? Otherwise it is poorly defined.

Why?

I had a look and it does; it applies to Turing machines. Thanks for the answers though. As to why: it struck me that the power of the underlying computational device could vary, as displayed in quantum computing mechanisms. So I wondered if Kolmogorov complexity was defined for these others too. Hmmm, time for some googling.

In the article it doesn't restrict the description language to Turing machines, but says you can use programming languages such as Lisp, Pascal, and also lambda calculus. I assume this means only that the description language must be Turing complete.


GregoryChaitin (http://www.umcs.maine.edu/~chaitin/) has taken this idea a long, long way, ending up with highly suggestive evidence that pure mathematics (even number theory) is best viewed as an empirical subject. See TheLimitsOfMathematics.

Yes, I think that is a deep and important result that might bring Mathematicians back into the Zen fold. Emptiness is also empty :).


Here's an example of Kolmogorov complexity. Consider all bit-strings with exactly 100 bits in them. Then a string of one hundred 1s

   111111111 ... 1111
has very low Kolmogorov complexity, because it can be more compactly described as, say, the six-character string "100 1s". A string with alternating 0s and 1s

   010101010 ... 0101
also has low Kolmogorov complexity, because "50 01s" is a much shorter description of the string. Now consider flipping a fair coin 100 times in a row, and recording a 0 for heads, 1 for tails. You will end up with a random-looking string of bits such as

   011100001 ... 0110
Almost certainly, there's no way to describe this precise string in fewer than about 100 bits. [This is not necessarily so. It is not true if all programs less than 100 bits long halt or can be proven to not halt.]

You can succinctly describe the process as "flip a coin 100 times", but that doesn't encode the exact string of bits that running this process gives you.

Roughly, any 100-bit string with a pattern to the bits can be described in fewer than 100 bits. In fact, you can define a bit string to be random if the length of its shortest description is as long as the bit string itself. Much of Chaitin's (and many other mathematicians') work is exploring the logical consequences of this sort of definition.
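One way to make this concrete (my own toy example, with a "description" taken to be any Python expression that evaluates to the string, and its cost measured in characters):

   import random

   all_ones    = "1" * 100
   alternating = "01" * 50
   random_bits = "".join(random.choice("01") for _ in range(100))

   assert eval("'1'*100") == all_ones and eval("'01'*50") == alternating

   print(len("'1'*100"))          # 7-character description of the all-1s string
   print(len("'01'*50"))          # 8-character description of the alternating string
   print(len(repr(random_bits)))  # ~102 characters: for a typical random string,
                                  # quoting it verbatim is about the best you can do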

A key fact about Kolmogorov complexity is that no matter what compression scheme you use, there will always be some bit string that can't be compressed. This is pretty easy to prove: there are exactly 2^n bit strings of length n, but there are only 2^0 + 2^1 + 2^2 + ... + 2^(n-1) = 2^n - 1 bit strings with n or fewer bits. So you can't pair-up n-length bit strings with shorter bit strings, because there will always be at least one left over. There simply aren't enough short bit strings to encode all longer bit strings!
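A quick sanity check of that count, for an arbitrarily chosen n (a sketch, not a proof):

   n = 20
   shorter_strings = sum(2**k for k in range(n))   # 2^0 + 2^1 + ... + 2^(n-1)
   assert shorter_strings == 2**n - 1              # one fewer than the 2^n strings of length n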

Kolmogorov complexity is powerful stuff, although admittedly most of its applications are mathematical, or philosophical if you're a mathematically inclined philosopher. You can prove GoedelsTheorem using an amazingly simple Kolmogorov complexity argument based on the BerrysParadox. [However, saying that you can't prove complexity is greater than some constant is just a convoluted way of saying there is a program shorter than that constant which does not halt but you can't prove it. This is so because the set of non-halting programs is not recursively enumerable, so any such proof can be simplified by simply appealing to the fact that there are programs that don't halt but we can't prove it, which itself can be proven by exhibiting such a program: for each proof x, if x proves "This does not halt." then HALT.] It can also be used to derive the average-case running time of algorithms, such as the average-case running time for heapsort. The trouble with traditional average-case algorithm analysis is that you need to make assumptions about the distribution of all possible input strings to an algorithm, which typically ends up requiring heavy-duty mathematics from probability and statistics. But a technique called the incompressibility method lets you get away without needing a probability distribution over all input strings: instead you pick one representative (incompressible) bit string, and results you can derive about that one string then apply to almost all other bit strings. Sometimes (not always - not even mathematics has a PhilosophersStone) it leads to incredibly simple proofs of average-case run-time results.
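For what it's worth, here is the shape of that Berry-style argument written as a program (a schematic only, my own sketch; enumerate_proofs and extract_incompressible_claim are hypothetical placeholders for a fixed, sound proof system, stubbed out here so the sketch at least runs):

   L = 10**6   # chosen larger than the length of this program plus the interpreter constant

   def enumerate_proofs():
       # Hypothetical: would enumerate every proof of the formal system in some order.
       # Stubbed as empty so the sketch is runnable.
       return iter(())

   def extract_incompressible_claim(proof, bound):
       # Hypothetical: would return the string x if `proof` establishes "K(x) > bound",
       # and None otherwise.
       return None

   def berry(bound):
       for proof in enumerate_proofs():
           x = extract_incompressible_claim(proof, bound)
           if x is not None:
               return x   # this short program just produced x, so K(x) < bound -- contradiction
       # With a real proof enumerator, the search never halts if no such proof exists.

If the system could prove "K(x) > L" for some specific x, this short program would output that x, giving x a description far shorter than L. So, beyond a constant that depends on the system, no statement of that form is provable, even though most strings really do have complexity above any given L.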


I think the above is not quite correct. There is no compression algorithm that will compress all possible inputs to smaller size, but there does exist for any arbitrary input a compression algorithm that will compress it. Consider the bitstring 010001100101000111. Define a compression algorithm thus:
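A minimal sketch of one such pair of routines, assuming (as the follow-up comments do) that the target string compresses to the single character '1':

   TARGET = "010001100101000111"

   def compress(s):
       # The one string we care about becomes a single character;
       # everything else gets a '0' prefix so nothing ever collides.
       return "1" if s == TARGET else "0" + s

   def decompress(s):
       return TARGET if s == "1" else s[1:]

   assert decompress(compress(TARGET)) == TARGET   # 18 characters in, 1 character out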

This algorithm very efficiently compresses 010001100101000111, but is not very good at other bitstrings. -- AndyPierce

(This decompression algorithm contains the original string. In transmitting the compressed string, you must deliver the algorithm AND the '1'. You definitely didn't save any bits...)

Your point is well taken, but what the "counting argument" above shows is that for any compression scheme you can devise, there will be some bit-string that it can't compress. What you point out here is indeed true - given a particular bit string, it's trivial to find an algorithm that compresses it to a single bit. But it's not what the theorem above is about.

Be careful... although you may be correct, the example given to produce '010001100101000111' is arguably longer than '010001100101000111' itself. As an aside, would this mean that '1' has a high Kolmogorov complexity?


I disagree with the last comments.

I think I understand what AndyPierce was talking about. He forgot that the algorithm which generates the string must include all its data (yes, the algorithm generates one string only). So even if your "data" to be decompressed is "1", your algorithm will have to include everything the "1" means. The algorithm must contain all its data, and it's the size of the algorithm itself that matters.


Does '1' have high Kolmogorov complexity? Good question! It can't be compressed to any smaller number of bits, so I guess you would have to say it is random in the Kolmogorov sense. But it's a degenerate case, so you shouldn't draw any major conclusions from it.

'1' has very low Kolmogorov complexity (at least in most languages). It can be encoded in very few bits. This measure of complexity isn't about percentage compression! It's about absolute size.
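To make that concrete in the toy "Python expression" description language used above (again my own illustration): the shortest description of "1" is just its 3-character literal, which is a terrible compression ratio but a tiny absolute complexity:

   assert eval("'1'") == "1"
   print(len("'1'"))   # 3 characters: no shorter description exists here,
                       # but 3 is still a very small absolute complexity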


Makes me wonder.... It's not about mapping 1 possible to 1 possible; it's about absorbing an occasional bit into an evolving randomized stream. One would hope it was not too random, because you'd find it difficult to get anything absorbed back. "Not not red" is not necessarily red. Be careful when understanding proof by negation.

I'm rather skeptical that KolmogorovComplexity has anything to do with "evolving randomized streams". KolmogorovComplexity is about describing a value; values are things that can be communicated in a language, which requires that they be representable in a signal. Usable signals in our universe are physically bounded in space and time. Many infinite values can be described in finite space-time via coinductive expansion, such as the set of all natural numbers or the Fibonacci series or the digits of pi... but infinite random streams (of bits) are not among them. As such, infinite random streams cannot be values in any language that we can physically utilize. For any language designed by denizens of our universe, evolving randomized streams will have an infinite KolmogorovComplexity.


ComplexityMetrics describes possible drawbacks of KC as applied to images and biology.


CategoryMath CategoryComplexity CategoryMetrics


