Kolmogorov Quotient

I'm proposing the concept of Kolmogorov Quotient as a calculable number that represents the amount a high-level language reduces the complex. That is, the amount of expressivity or succinctness of a programming language. This idea of "simplification" is a factor of text-wise reduction (fewer characters needed to express a complex concept, à la AlgorithmicInformationTheory) and another, less easy to quantify concept of maintainability. Fleshing out this latter concept, it is clear it has to do with how easily one can establish programmer consensus for the given task (i.e. how many programmers of the language would put it back the same way you've expressed it or otherwise agree on the best implementation for a given problem?).

It is a Quotient so that higher Kolmogorov numbers for a given language denote a reduction in the complexity of solving the problem in the given language.

How do you propose to calculate the Kolmogorov Quotient?

Well, once the basic premise/methodology above is agreed to, it is only a matter of a rough constant addendum of difference for any specific implementation. (That is, as long as the implementation is the same across all measurements, the number should be valid and comparable.) But it could go something like this: Pick a language "close to the machine", like C or Assembly, and measure the amount of bytes of machine code it generated to implement a standardized suite of common, independent, programming tasks(*) (base_language_count). Then code the exact same functionality in the target language (no external libraries allowed) and count the number of bytes of source code (test_language_count).

KQuotient = (base_language_count / test_language_count) / number_of_equivalent_programs. - MarkJanssen

number_of_equivalent_programs is the number of programs fluent programmers of the language agree that are the best and equal solutions to the problem. (In case it's not clear, equivalent programs are those whose output is the same for all inputs.) This accounts for the "Perl problem" (I hope), because although fewer characters are required for many tasks, they would have 50 different implementations.

This should always be greater than 1.0. {why?} I'd guess Python or Ruby has the highest Kolmogorov Quotient.

[As for "why?": if your ProgrammingLanguage isn't reducing the complexity of programming an operation beyond the machine code (or even C) equivalent, there's no point in creating it.]

Programming languages are not necessarily about reducing the complexity of performing an operation beyond the machine code equivalent. Sometimes they're about providing a way to express problems more easily. Sometimes the former and the latter are the same thing. Sometimes they aren't.

So you actually calculated your KQuotient here, or are you guessing?

Python? Really? It's a rather conventional imperative programming language, with roughly equivalent expressivity to any other. What about Haskell? Or Prolog? Or even CoffeeScript?

[Sorry you jokers. A shorthand on top of CommonLisp has to win this one (trivial macro to rename the long names to really short names).]

Lisp cheats (edit: on a VonNeumann achitecture): it uses recursion and hides all the actual machine code cycles it takes to run the program. Also: making macros to cheat at source length would count against you on the other heuristic: maintainability or "programmer consensus".

Huh?

....Unless you're running on an actual Symbolics machine. But no-one does.

Show me the machine code for a recursive factorial function on standard hardware. I bet it will be larger than the iterative version. If you're not going to limit yourself to such hardware, the KQuotient concept is probably inapplicable or, at minimum, would have to be reformulated for SymbolicComputing?.

On a modern architecture with instructions for procedure calls, the machine code is not going to be significantly larger than the iterative equivalent. It might be slightly slower and less efficient. Of course, any recursive function can be converted to an iterative equivalent, but a solution is often much easier to express recursively. Isn't expressiveness, rather than efficiency, the essence of your KolmogorovQuotient?

"On a modern architecture..." -- un huh, recursive functions require arbitrarily large stacks, iterative procedures never do. Show me a CPU ("modern architecture") that has an unbounded stack. None. Thank you. You have to emulate stacks in software, so you have to include that code in your count. Next time, I'm going to ask you for the AckermannFunction, 'ya bastid. (I chose a bad example above, as the factorial function can easily be written to use TailRecursion. <snap>....Now I'm going to have to make sure the test suite includes a non-primitive recursive function....)

There are architectures (which is not necessarily limited to the CPU, but is typically dependent on the operating system as well) that allow stack sizes up to the limit of available RAM. Of course there can be efficiency issues with recursive solutions, but in cases where that is not an issue, recursive solutions may be more expressive than their iterative equivalents. That's why we use them.

Intuitively, I doubt your KQuotient would be useful as a general measure of language expressiveness, because you can almost always find a problem for which language <a> is more expressive (has a higher KQuotient) than language <b> and another problem where language <b> has a higher KQuotient than language <a>. It might, however, be a useful way of comparing languages given a particular problem.

I don't think that will be that much of a problem -- I believe one will be able to find a more-or-less agreeable set of "suites" to test languages with. The key for a fair comparison though will be to separate out the functions which require graphical I/O (computation towards the user) vs computation towards the machine (like scientific computing).

Anyway, it's obvious that among well-known, mainstream languages, SQL, Prolog and APL are tied for first place, Haskell is second. Python and Ruby are so far behind they're not even in the race, along with C, C++, C# and Java.

I believe this list is exhaustive.


I'm not sure what good KolmogorovQuotient will ever be. But comparing code sizes can be entertaining, and there was a decent analysis from the shootout a while back: http://blog.gmarceau.qc.ca/2009/05/speed-size-and-dependability-of.html

Legend:

Well, as you can see, you've proven my point: Python, Ruby, Perl, and Javascript have the smallest code size for the same functionality. But that is just one factor of the KQuotient (maintainability, the other). I suppose measuring the concept of "maintainability" has to be fleshed out more, but I think you could assign each symbol a special number indicating it's significance, then measure NxN symbol/letter difference between different but equivalent implementations with these special multipliers (something like a[M]x[N]) and then...

[And Python, Ruby, Perl and Javascript are all slow. Ocaml, on the other hand...]

Haha, yes, but you see once we decide on a language, we can engineer the hardware to make it fast. Until then it would be a PrematureOptimization to focus on speed because the world really doesn't know what it wants computers to do yet.... ;*)

{That's an optimistic assertion given the historical failures to distribute specialized hardware or sufficiently smart compilers.}


Related, see this article on programming language expressiveness: http://redmonk.com/dberkholz/2013/03/25/programming-languages-ranked-by-expressiveness/

In particular, the following graph shows more expressive languages on the left, less expressive on the right:

Of course, some of the above are not programming languages per se, or at least not general-purpose ones. E.g., Augeas is a language for manipulating configuration files, and VHDL is an electronics description language for specifying hardware.

Interesting, thanks.


See KolmogorovComplexity, LinesOfCode


AprilThirteen


EditText of this page (last edited December 1, 2014) or FindPage with title or text search