As Fast As Cee

An apparent holy grail of performance, to be as fast as CeeLanguage, as often cited by users, developers, and advocates of higher-level languages (e.g. LispLanguage, FunctionalProgrammingLanguages, JavaLanguage). Quite often, a failure to hit this target is pinned upon the lack of a SufficientlySmartCompiler.

This page covers a lot of ground, but it mostly boils down to this:

One major issue with this page is that it fails to note that the core C language is nearly a lowest-common-denominator sort of programming. Some high level languages can achieve c-like performance when you program *like* c. For example, several Lisp implementations, OCAML (somewhat less true, it has some good tricks of its own), and some simply can't. On the other hand, these high level languages can easily do things that there is no natural way to do in C. If you spent the time and 100k-lines of C code to provide some of these features to your C environment, it isn't clear that it would be faster than another languages native implementation. See, for example CLOS, or Haskell's patterns. So, as usual, it really depends on what you want to do. It is also worth noting that for *really* cpu intensive tasks, C is often not good enough. Hence the continued use of Fortran, and inner loops written in ASM, etc. Especially where vectorization is concerned, if your c compiler isn't sufficiently smart, you may still be able to blow it out of the water with a dozen lines of ASM.

I think this page can be summarised as follows:


Of course, AsFastAsCee is a very simplistic and almost-useless benchmark. Some languages are faster than C in some domains; FortranLanguage is faster than C for numeric applications (many optimizations performed by Fortran compilers are disallowed by C, though C99 narrows the gap). Many statically-typed FunctionalLanguages can do things faster than C as well, especially in the absence of SideEffects. Proponents of other languages will cite these as examples; OTOH outside the specific problem domains these languages suffer compared to C in either performance, usability, or both.

Please cite some concrete examples of functional languages that are faster than C. The only one I've seen that's anywhere near as close - within 2x - is ObjectiveCaml, and only if you resort to using imperative features and manually remove bounds checking and so on.

Many Lisps often sit within 2x of C also (which is not the same as being faster!) so ObjectiveCaml is hardly unique in that. The book On Lisp (http://www.paulgraham.com/onlisp.html) features some Lisp code that it claims is faster than C on page 24. However, it fails to name implementations of either language, and a comparable C version, so.

Functional languages can hardly beat C in terms of pure speed for very optimized functions. However, for common, unoptimized code, which makes the bulk of programs, the speed is comparable. One should think of C being to functional programming languages what assembly language is to C. Therefore, a more interesting point of comparison, both in terms of speed and functionality, would be C++. Things that modern functional programming languages like ObjectiveCaml take advantage of are : static strong typing which allows some compiler optimization, absence of memory fragmentation and improvement of locality in cache memory by the garbage collector, and highly optimized use of the call stack.


On the other hand, the speed of C comes at a cost. It's not very abstract, giving the programmer plenty of room to tune every last data type or memory allocation to the specific needs of an algorithm. This flexibility can make the program run faster at the cost of programmer time in the best case. It can also make no appreciable impact on performance at the expense of programmer time, or it can even make the program slower.

More abstract languages offer heavily-optimized, widely used data structures and algorithms right out of the box. Few C programmers can whip up a hash table implementation as fast as Perl's, for instance.

There are two ways to view this one, from the C side, one-offs can benefit from knowing the distribution of your numbers, sometimes mod over an array is optimal. From the Perl side, few could write better hash tables than perl has for the same functionality that Perl's hashes offer. Otherwise, we'd have a patch to perl.

I'd rather avoid the premature optimization and steal perl's hash tables. I can always tune the performance after I can measure it. Higher-level languages may permit the implementation of smarter, more efficient algorithms in less programmer time, which should also inform this debate. The difficulty of implementing faster but highly complex algorithms can discourage their implementation in C.

Moore's Law has changed the game significantly in terms of language optimization.

Many programs on servers and end-user PCs are I/O bound, so languages that can use smarter I/O mechanisms have a big advantage. And of course, being AsFastAsCee is a RedHerring for many applications anyway. A CrudScreen never needed to be AsFastAsCee, nor do many other types of programs. The C bigots who loudly insisted that performance is the only thing that matters ought to be ashamed; the bigots of other languages ought to be ashamed when they fall into this rhetorical trap.


Hmmm. I wrote two pages of Perl that processed several GigaBytes? of input from a pair of major datasets, compared them using a variety of RegularExpressions, compiled a set of comparative totals and statistics from the data, and which executed in about 35 seconds. I'm not sure I could have written that, such that it ran any faster, in CeeLanguage. Granted, it was a Sun E-10k running SolarisOs, but by any standards, grinding through two sets of 13+GB (that's 26+GB total) files and making sense of it in under a minute on anything is good exercise for the eyebrows. And it's an interpreter. [Still shaking my head]

I grant that Perl is, itself, written in CeeLanguage, but I have to say that it has leverage that goes beyond that. One might say "more than the sum of its parts." -- GarryHamilton

Perl has a highly optimized regular-expression parser. It was made for doing processing of the type you describe. You probably could've written code in C that would have run just as fast, but it would've taken you 10 times as long to write. -- TimKing

There exists an implementation of perl-compatible regular expressions in Common Lisp that is actually faster than the CeeLanguage implementation embedded in perl. That this is possible is due to the fact that a running Common Lisp program has access to the compiler, so it can compile the regexp into machine code, something c code cannot do. See http://weitz.de/cl-ppcre/#performance But see TickCee

Perl also has highly optimized I/O buffering. Again, made for doing this type of processing.

It sounds highly likely that your processing was limited by I/O bandwidth; not CPU speed. An interpreted language like Perl won't do too badly if most of its time is spent waiting for the disk.

HorsesForCourses. If your application dictates the speed of C (or C++ or Fortran or whatever), use that. If not, and you're more productive in Perl or Python or Smalltalk (or even Java), use that instead. Rather simple, really.

And don't forget the strategy to AlternateHardAndSoftLayers. Typically only a tiny fraction of an application, the innermost process bound loops, really need to be AsFastAsCee.


When people talk about relative speeds, they are usually talking about efficiency of use of cpu, not about IO, and C often is faster, compared with e.g. Perl, Python, Ruby, Pascal, Ada, etc etc etc.

Lost me there when you included Pascal. Although I can often get the speed of Pascal by including all of my C source files into one big source file and declaring everything static, it's by no means certain: pointer references and 8 bit values are sometimes optimised poorly.

Most of the latency involved with I/O is spent - doing the I/O. That's an OS function. A language design can have an effect, of course - depending on what I/O primitives are available. In most case, stdio is plenty fast.

That said, the C standard library provides little support for "advanced" I/O features (no pending on multiple file descriptors, no networking support, no asynchronous or nonblocking I/O, etc.) - but those are very OS-specific, and most languages are less flexible than C in this regard.

They're OS-specific because the C standard library doesn't provide an interface to them; not vice-versa.

And there are many useful de facto standard libraries for these things - Berkeley Sockets, POSIX, Win32, etc... that are widely available. You will notice that most programs that need to do high-performance I/O (web servers, RDBMS's, etc.) are implemented in C/C++.

The C stdio library does burn some cpu, and compared with a more specialized facility, can be noticeably slower. Long ago I needed to read very large hexadecimal files, and speed was important. The original code used scanf. A hand-coded solution was vastly faster. Sometimes use of IO library functions is actually cpu-bound rather than IO-bound; this is, of course, not typical.

This particular example--where scanf is slower than the handwritten code--is probably not a result of scanf being an unspecialised facility; more likely, it's a result of most of the information that would be determined at compile time via a sane API (say, type information, parameter locations, etc.) being delegated to a fairly complex system of runtime string processing, stack frame manipulation, and possibly even callback handling, depending on the implementation. Then again, one could argue that the stunted compiler-in-a-can that is scanf is simply what you get (deserve) for using unspecialised facilities designed for pre-ANSI C.

If you are doing something in Java, try JNI if there are performance problems in some function. I've always found that it takes less time to RefactorTheCode? in Java, rather than to RecodeInCee?.

C is not faster than SML when doing NumericalComputing?.

For doing IO you should always consider using AsynchronousProgramming, so that your application can continue to do work rather than waiting for I/O.

Here's one approach: Read the file in Java. Put it in a queue using a thread, then use another thread to process it. You will be surprised that if the file is big enough, the queue will use all the memory and the program will stop. So, IO can be so fast in Java that the processing can get behind. The same is true in C++.

Java never claimed it would be faster than C, but now it is on several benchmarks, with the remarkable exception of Swing.


CeePlusPlus can be just as fast as C. Yes, you could write C++ code that's slower and bigger than C code, but you also could write C++ code that's as small and fast, if that's your goal, if that's what's important to you. Additionally, this C++ code can be cleaner and more maintainable.

I recently wrote a boot ROM for an embedded system, wrote it in C++. Even threw in an exception for good luck. (C++ exceptions add no performance overhead unless you actually throw one.) I had no performance or code size problems. That's because the low-level code, the part doing all the real work, compiled just as well as it would have were it written in C.

And if the C code you're comparing to is not small and fast (instead preferring other values), you could probably outperform it with a number of other languages.

-- TimKing


See also OptimizeLater.

As always, the real solution is in the design and coding, not the language. A program in <high-level-LanguageOfChoice> could be as fast as it needs to be, to within a constant multiple, if the design is correct. If a program is primarily I/O intensive, then fast I/O (which is usually a factor of the system calls, not the language or the libraries) will be the key, whereas computation-intensive code will depend on how you design the code.

Note, however, that correct design includes the necessary post-optimization; if the HotSpots? are re-written in CeeLanguage or even AssemblyLanguage, then the remainder can be written at whatever LevelOfAbstraction is appropriate. AlternateHardAndSoftLayers

In any case, performance is not the only criterion that a program should be judged by, even if it is the one most obvious to casual users. Correctness, usability, and maintainability count for as much, if not more, in the long run. -- JayOsako


As a long time C/C++ supporter, let me provide my observations.

  1. The speed of the language rarely affects the speed of software.
  2. I/O speed, network hops, disk accesses, etc. are usually far more critical to run time speed.
  3. Physical memory size, i.e., avoiding disk cache, is usually more important.
  4. Particularly in desk top PCs, any performance improvements made in code today could be more easily obtained by using next year's hardware. The increase in CPU speed will improve performance more than any code level tuning.

I like C/C++ a lot, but I really couldn't justify C/C++ as a project language based on speed considerations, except, perhaps in some very limited environments. -- WayneMack

Anecdotal opinions like this would be much more helpful when the application area is indicated, because these sorts of observations and conclusions can vary sharply depending on what you are doing.

To provide a general answer one must rely on general observations. I base my observations on many years of embedded design work with C, assembly, and microcode sometimes with CPUs running less than 1 MHz where I was actually tweaking code for performance because the hardware was barely sufficient for the job. I also designed the hardware and knew the trade-offs made when designing the hardware. I have done C++ work with 2 and 3 tier client server designs primarily with MS Windows starting with Windows 3.1 with occasional ports to Unix environments. More recently, I have been the project manager for a 3 tier design incorporating Visual Basic, C++, and SQL scripts with some COBOL and Korn shell thrown in. My team has also done some preliminary investigations of first going to a web architecture with ASP and more recently with Java.

In the set or performance problems I have seen, the problem has never been due to excess machine instructions generated by an "inefficient" language. Most often, performance problems are related to the handling or transfer of "bulk" data. The majority of language constructs (i.e. math operations and branching) are common across languages, often the syntax itself is similar, and the underlying assembly language to support these operations is the same. Often the only differentiator is memory use, both in quantity and efficiency of memory management, and this usually only affects speed if the application is running near the limits of physical memory. This observation seems born out as with little effort, one can find a benchmark for any language showing that it is "fastest."

In summary, I would state that when selecting (one or more) languages for a new project, language "speed" ranks very low in importance. For most operations, the differences in languages are minor and often masked by processor speed and other design criteria.

-- WayneMack

Does this apply when designing a language too, i.e. is speed something that a language designer/compiler writer should worry about? Basically, I want to know if I have to worry about really fucking up my language speed-wise through language design decisions?

The research I've done on ContinuationImplementation, KeywordParameterPassing, ImplementingMultipleDispatch seems to indicate that most features, no matter how advanced, have fairly small performance penalties. Continuations can be had for a 5-10% slowdown across the board, keywords are just an array lookup, MultiMethods can be done in quasi-constant time, though the constant is fairly large (an array lookup for each active selector in the method). I'm wondering if this squares with your experience, and if there're any particular "gotchas" when implementing a compiler, things that could make it run multiple factors slower.

But it seems like speed considerations do come into play with a lot of language implementors, mostly concerning the cache. Dylan introduced sealing so that the appropriate method could be selected at compile time, eliminating the run-time dispatch entirely. Dispatch-DAGs for multimethods are preferred because dispatch tables require a lot of non-local memory accesses, potentially blowing the cache. One of the main reasons ParrotCode uses a register architecture is that stack machine have a higher memory load, and hence cause more cache misses.

Any advice for would-be compiler writers? -- JonathanTang

Purely my opinion, but focus on the big picture of programmer productivity. Do not ignore compiler speed, but don't give it undue priority either. Correctness is highly important; a single miscompiled statement will probably waste far more of my time than even the slowest compiler. Make sure compiler error messages are clear; I've wasted countless hours tracking down silly errors that were not clearly reported. Make sure the compiler has a way to escape endless loops; I have a preprocessor from a major vendor that never returns if it is given certain typos. Lastly, define a good benchmark. Don't load the benchmark up with worst cases, make sure it reflects typical use; otherwise you will spend time tweaking unimportant cases.

I remember the days of start the build and go have lunch. For the size of projects I am currently involved in, compiler speed is not a significant differentiator, it is usually the speed and ease of use of other parts of an IDE that cause me irritation in development. -- WayneMack


Combining the speed of C and a simpler language

One reason C is so fast is that many man-years of effort have gone into writing efficient optimizing compilers for it; some new language would have to duplicate at least a significant fraction of that effort.


Speed is not always the issue

Incidentally, Fortran is known to regularly wipe the floor with C in numerical computation; see comment above.

Of course, most programming tasks do not need the speed of C; for those, a higher-level language is usually a better choice. For the subset that do need high performance (and for which the designers are willing to sacrifice a high-level language), C is a reasonable choice. ForthLanguage is another; however there are many more C developers and C implementations than Forth implementation available.


Some claim that it is possible to tell the difference (in performance) between an application written in CeeLanguage (or ForthLanguage or AssemblyLanguage, or some other low-level language) and one written in a HighLevelLanguage. It was proposed to implement a similar low-level language (but one easier to program in than C, which contains lots of pitfalls). Such a thing is certainly possible (and has been done; for example ModulaTwo, etc.) but a considerable amount of effort has gone into writing efficient optimizing compilers for C; some new language would have to duplicate at least a significant fraction of that effort.

It was then re-emphasized that many programming tasks do not need the speed of C; for those, a higher-level language is usually a better choice. For the subset that do need high performance (and for which the designers are willing to sacrifice a high-level language), C is a reasonable choice. ForthLanguage is another; however there are many more C developers and C implementations than Forth implementation available.

Regarding optimizing compilers: DEC tried to grant all of its languages the same optimization attention by turning all of their compilers into front ends that produced code for a virtual machine. This code was then optimized and compiled to a specific physical machine. In this way all of their C optimizations were available for all of their languages. Microsoft has chosen the same strategy with .NET.

HP has done the same for many years, and very successfully; their RISC code optimizing back end has been world class for something like the last decade.

The GNU compiler suite does something similar, but more ad hoc.

[The comparison to DotNet seems to be questionable. CommonLanguageRuntime code is targeted towards a VirtualMachine that is executed directly, not translated to the instruction set of the target processor (though JustInTime compilation is done). What is being spoken of above is an IntermediateLanguage (or an AbstractMachine), which is intended for subsequent translation into something else. While the AbstractMachine can be implemented; that's not its intent.]

.NET's "JustInTime" compiler actually compiles the CLR code when the application is installed or run the first time. The virtual machine (or abstract machine if you want to call it that) doesn't execute the code; the physical translation persists. This is very much like DEC's GEM line of compilers, with the final compilation step delayed until installation.


And now, a humorous interlude.

 Some languages ain't fast
 Some languages ain't slow
 Some languages are just half-fast


One domain where core language speed still matters a lot is video games. C/C++ is used in almost all video games. (One company, NaughtyDog?, uses a Lisp dialect for their PS and PS2 games, but it seems they may be switching back to C for their PS3 games.) See LispInJakAndDaxter.

If speed is of over-riding importance, why don't they use AssemblyLanguage?

I think the summary at the top really says it all. You are right, and everyone is arguing past each other.


Given the pipelining and multiple execution units of modern x86 processors I'd say that there are few instances where AssemblyLanguage is warranted. Here are some:

As to writing the innermost loop in assembler I've seen the Wind River PowerPC compiler condense a relatively small block of three nested whiles into the assembler equivalent of one while. It was impressive, and doing something with the innermost loop there would have made the end code much slower. - OlofForshell


DecemberZeroFive

CategoryComparisons, CategoryPerformance


EditText of this page (last edited October 22, 2014) or FindPage with title or text search