Is Dynamic Typing Sufficiently Efficient

The operative question is: Efficient for what?

Having digested (I think) the various differences between statically and dynamically typed languages, there is one important point that I want addressed: performance. I work in a domain where performance is very important. In fact, most of my day-to-day design decisions hinge on balancing abstraction and performance (not that a gain in abstraction always implies a gain/loss in performance).

Could someone answer the following question honestly: Could Quake have been implemented in Smalltalk? I fear that the answer is 'no', since the run-time overhead of checking dynamic types would become a bottle-neck.

Doom and Quake had to push the available hardware to its limits. People played Doom on i486 boxes! Yet, despite the need to push pixels as fast as possible, these games still exhibit the 80/20 or 90/10 rule, or whatever you want to call it; their execution time is spent in a few hotspots. My guess would be that it's in the clever code which draws the actual pixels, figuring out what color to use by projecting the screen coordinates into the texture map data. This task, which is repeated for every pixel of the viewport, is so time-critical that no high-level language may be good enough; some parts of it may have to be done in assembly language. In fact, the Quake source code, I seem to recall, contains drop-in assembly-language replacements for some of the C code. These days, people buy special video cards that do the texture mapping in hardware, so the programming language is irrelevant; you use some API to talk to the hardware and that's it.

There is much more to a game like Quake than just pushing pixels; the higher-level routines could well be written in a high-level language, and even benefit from it; there is no reason to use the same programming technique for managing the higher levels of the game as is used for the pixel-level calculations. That being said, the pixel-level calculations could be written in a dynamic language, but they would have to be strapped down with tons of declarations that would enable the compiler to generate code that manipulates unboxed integer types.

The people who developed these games were comfortable with C, and so that's what they used everywhere. There is probably a connection between their mentality and their success in completing games like this at the time when they did. There is a psychological reason why you could not use an intelligent programming language for such a project: the hackers who work with such languages could not stay interested and motivated long enough to crank out a boring first-person shooter. -- KazKylheku

It's unlikely to become a bottleneck in any case. The cost of dynamic typing is largely distributed across the code and process. It tends to slow the whole process pipeline down a little, whereas a bottleneck causes work to build up or halt before it and prevents work from happening at the ideal rate after it. Dynamic typing gets you UniformlySlowCode, where the slowdown depends largely upon the complexity of the type-system. Faster processors tend to run UniformlySlowCode at a higher rate in proportion to their processing speed; given that processor speed has increased exponentially and multiplied a great many times, it's very much the case that Quake could (now) be implemented in SmallTalk. Heck, it could probably be run in Smalltalk on a virtual computer running an emulator...

Essentially, dynamic typing is a coefficient cost. It doesn't affect big-O cost at all. Thus, in domains where the big-O cost is overwhelming (e.g. AI, Automated Theorem Proving, Travelling Salesman, sorting very large lists, etc.), the answer is: definitely yes. However, in domains where the coefficient cost is of extreme importance (e.g. realtime systems, including video display), the answer is: usually yes, except when hardware resources are limited - but it's often better to use static typing to reduce this cost as much as possible so you have time to do other things. Technically, big-O cost will always outstrip a coefficient cost except for O(N) and lower, but in practice the coefficient cost matters up to about O(N^3) because, quite often, the set of operations is small or looks more like O(M*N^2) where N is smallish and M is largish.

Computer display has a soft realtime constraint (X frames-per-second), thus so does any processing that must occur between frames. Realtime processes must have a cost-per-frame that falls strictly under the frame width. Any higher than that and you can't make any hard guarantees. Faster processors let you fit more, but better is to shrink the cost in each frame so that even more realtime work can be done. One very easy way to do this is to use static typing where you once used dynamic typing.


Does anyone have experience comparing the performance between similar dynamically typed and statically typed programs? (Have you tried the ComputerLanguageBenchmarksGame? It's the ultimate LanguagePissingMatch.) Dynamically typed programs in a real-time domain? etc. I am not interested in the constant case studies being quoted (especially in XP forums) about how using Smalltalk made your accounting application easier to develop/faster/safer etc. Even Kent's book on XP seems to assume this domain, and does not address issues that could be pertinent to other domains (such as mine).

If performance were the only requirement, I would write everything in assembler. But I also work in an industry (the game development industry, to be exact) that is highly competitive and where technology evolves very quickly. Thus, statically typed languages seem to be the ideal compromise. Could someone in favour of dynamic type systems (or a devil's advocate) please address this shortcoming...

And please don't give me the "in the future computers will be faster, so it won't matter" BS. That's just a lazy attitude, and I don't have time to wait.

-- Matt

AlternateHardAndSoftLayers because PrematureOptimization is evil. There are those who believe Quake could have been written this way. (It was - see below.)

High-quality, rapid development is much more important than imaginary speed requirements. If the customer wants the program to run at a certain speed, then this should be put into a story and have an automated test to match. Our job is to do the simplest thing that can possibly make the tests pass, because to do any more is to steal from the customer.


AlternateHardAndSoftLayers because PrematureOptimization is evil. There are those who believe Quake could have been written this way.

Probably because it was? I asked Google about QuakeC, and it points to http://www.gamers.org/dEngine/quake/spec/quake-spec34. QuakeC was the scripting language for Quake. In fact, it still is, but I doubt enough people play it to keep it in the present tense, as it's been superseded. (CounterStrike. WormsArmageddon.)

However, even the latest Quake engines (QuakeThree? and its derivatives) use a similar concept - a byte-coded scripting language running in a VM - for the "game logic" portion of the system, quite analogous to QuakeC, only faster.


Ok, here are some real figures from http://www.math.purdue.edu/~lucier/615/software/ concerning matrix manipulation code written in the dynamically typed SchemeLanguage.

"The software is relatively fast. For example, the sparse-matrix/vector multiply code runs in 10 cycles per flop on our Compaq DS20 clone (dual 500 MHz 21264s with 4 megabyte caches) after compiling the C code generated by Gambit-C ... The same code written in C, with the same matrix structure, the same compiler, and the same compiler options, has exactly the same performance."

The whole static/dynamic typing distinction is really a smoke-and-mirrors show. MrSpidey statically types SchemeLanguage programs and the VortexCompiler statically types SmallTalk programs.

-- NoelWelsh


You may also want to look up the "Stalin" Scheme compiler. It supposedly generates very fast code.


My pre-conceptions about the performance of dynamically typed languages stem from not having personally done performance comparisons. However, I would like to know HowDynamicTypingIsImplemented in most systems. This would go a long way towards giving me a deeper understanding of what the base-line performance trade-offs may be.


There is a comparison of Lisp, Scheme, Java and C++ available at http://www.flownet.com/gat/papers/lisp-java.pdf.


In all honesty, Quake could have been written in Smalltalk, but we need a bit of revisionist history first. Back when Quake was written, the majority of the execution time was spent in software rendering. Really, this took 90% of the time. The game was doing things like z-buffering in software, which is amusing in retrospect. Fast forward to today. Given a modern video card, you could upload an entire Quake level, completely ignoring all visibility information, and render it in a handful of milliseconds. This rendering, BTW, happens entirely on the video card, so the mainline program keeps ticking along while it occurs.

Yes, there's more to a game than graphics, but once the hard rendering stuff is out of the way, then the door is open to use whatever you want. Remember, it's typical for modern 3D games to lean heavily on interpreted scripting languages as it is.


See SelfLanguage, StrongTalk.


To explore the question in detail, we need to know what the runtime costs associated with DynamicTyping are (compared to StaticTyping with DynamicDispatch - i.e. a C++ virtual function call - and StaticTyping with StaticDispatch - i.e. a function call in a procedural language, or a non-virtual call in C++). In general, costs are incurred when one tries to perform an operation on an object - access one of its fields/attributes or call one of its methods. Method call is probably the more important concern, as accessing fields (directly) from outside the object is generally ConsideredHarmful in most ObjectOriented languages:

All of this assumes execution on a real CPU, not a VirtualMachine (DynamicTyping is less expensive on a VM due to the VM's own overhead, and many VMs can implement the dynamic operations as a single virtual instruction).

In the case of static typing with static dispatch, it's simple. The function address is known to the compiler/linker and can be hardcoded. For PositionIndependentCode?, an additional indirection may be required, but not much else. The overhead is generally a few cycles, and is O(1).

In the case of static typing with dynamic dispatch, it gets a bit more complicated, and we have two cases to consider: languages with MultipleInheritance and those without. Without MI, it's a simple lookup in a VeeTable; since the compiler knows the offset at compile time, it's an O(1) operation. With MultipleInheritance, it's still an O(1) operation, though the call overhead is noticeably larger (an additional indirection is required). Several books (including ProgrammingLanguagePragmatics and DesignAndEvolutionOfCpp?) explain the mechanism in detail. (It should be noted that the dirty hacks, uh, mechanisms to pull this off are a leading cause of the FragileBinaryInterfaceProblem in languages such as CeePlusPlus.) It should also be noted that typically there is no way to handle failure (i.e. DoesNotUnderstand); typing errors must be caught by the compiler or not at all.
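
For concreteness, here is a minimal C++ sketch of the two static cases just described (the Shape/Circle names are invented for illustration): the call to twice() is static dispatch through a hardcoded address, while the call to area() goes through one VeeTable indirection.

 #include <cstdio>
 
 struct Shape {                                      // static typing, dynamic dispatch
     virtual double area() const { return 0.0; }     // called through the VeeTable
     virtual ~Shape() {}
 };
 
 struct Circle : Shape {
     double r;
     explicit Circle(double r) : r(r) {}
     double area() const override { return 3.14159 * r * r; }
 };
 
 static double twice(double x) { return 2.0 * x; }   // static typing, static dispatch:
                                                      // address hardcoded by the linker
 int main() {
     Circle c(1.0);
     Shape* s = &c;
     std::printf("%f\n", twice(s->area()));          // one indirection, then a direct call
     return 0;
 }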

In the case of DynamicTyping, a lookup needs to be performed, and it is assumed that failure (DoesNotUnderstand) is a possibility. In most languages with DynamicTyping, mapping methods onto a sequence of integers (as is done in C++) to facilitate array lookup is not possible - many such languages assume no compile-time coupling between modules. (For C++ to call an object's member function, the compiler must have the entire class definition available.) Instead, "lookup by name" generally needs to be performed. One common arrangement is a HashTable. This requires O(1) time in the average case, though it can be as bad as O(n), where the class has n methods. Furthermore, the time to perform the hashtable lookup is several cycles longer than the corresponding VeeTable lookup; that overhead is inherent in HashTable implementations.
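
A rough sketch of such "lookup by name" dispatch, written in C++ only for concreteness - the Object, Method, send, and doesNotUnderstand names are invented here, not taken from any particular runtime:

 #include <cstdio>
 #include <string>
 #include <unordered_map>
 
 struct Object;                                   // forward declaration for the Method type
 using Method = void (*)(Object&);
 
 struct Object {                                  // per-class in a real runtime, per-object here
     std::unordered_map<std::string, Method> methods;
 };
 
 void doesNotUnderstand(Object&, const std::string& name) {
     std::printf("doesNotUnderstand: %s\n", name.c_str());
 }
 
 void send(Object& receiver, const std::string& name) {
     auto it = receiver.methods.find(name);       // hash the name, probe the table
     if (it == receiver.methods.end()) {          // failure is always a possibility
         doesNotUnderstand(receiver, name);
         return;
     }
     it->second(receiver);                        // indirect call through the found pointer
 }
 
 int main() {
     Object o;
     o.methods["greet"] = [](Object&) { std::puts("hello"); };
     send(o, "greet");
     send(o, "missing");                          // falls through to doesNotUnderstand
     return 0;
 }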

Many modern VirtualMachines implement caches (SelfLanguage was the first to do this, see PolymorphicInlineCaching?) to speed this up. The VM JIT-compiles the bytecodes to native code, and the lookup code is placed inline with the function's machine code.
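
A minimal sketch of such a cache for a single, monomorphic call site - the names and structure are invented for illustration, and a real VM would emit the fast-path test directly into the JIT-compiled machine code:

 #include <cstdio>
 #include <string>
 #include <unordered_map>
 
 using Method = void (*)();
 using MethodTable = std::unordered_map<std::string, Method>;
 
 struct InlineCache {
     const MethodTable* cachedClass  = nullptr;   // receiver class seen last time at this site
     Method             cachedMethod = nullptr;
 };
 
 Method dispatch(InlineCache& ic, const MethodTable& cls, const std::string& name) {
     if (&cls == ic.cachedClass)                  // fast path: one pointer compare
         return ic.cachedMethod;
     Method m = cls.at(name);                     // slow path: the full hash lookup
     ic.cachedClass  = &cls;                      // remember the result for the next send
     ic.cachedMethod = m;
     return m;
 }
 
 int main() {
     MethodTable point = {{"draw", [] { std::puts("draw point"); }}};
     InlineCache ic;
     dispatch(ic, point, "draw")();               // slow path, fills the cache
     dispatch(ic, point, "draw")();               // fast path, no hash lookup
     return 0;
 }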

The above analysis only considers SingleDispatch languages; MultipleDispatch makes things even more interesting.

Which brings us to <what happened to the rest?>


With MultipleInheritance, it's still an O(1) operation, though the call overhead is noticeably larger (an additional indirection is required).

I don't think so, at least not as commonly implemented in CeePlusPlus. The common implementation for multiple inheritance is multiple vpointers per object; depending on the (static) type of the pointer, it points to a different offset into the object, and to a different vpointer. It does, however, mean that casts, including implicit casts, are no longer no-ops.
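
A small illustration of that point - with the usual C++ object layout, converting a derived pointer to the second base adjusts the address, so the cast is not a no-op (the A/B/D classes are invented for the example):

 #include <cstdio>
 
 struct A { virtual ~A() {} int a = 1; };
 struct B { virtual ~B() {} int b = 2; };
 struct D : A, B { int d = 3; };
 
 int main() {
     D d;
     A* pa = &d;      // typically the same address as &d
     B* pb = &d;      // adjusted to point at the B subobject, with its own vpointer
     std::printf("%p %p %p\n", (void*)&d, (void*)pa, (void*)pb);
     return 0;
 }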


I think some people are glossing over some of the real benefits of static dispatch over dynamic dispatch. Static dispatch allows expanding code inline, and that is generally not possible with dynamic dispatch. The effect is dramatic: the entire STL of CeePlusPlus is based upon this, and when used with a good compiler, it tends to be comparable to hand-written assembly. (That is the power of templates: compile-time polymorphism.) It's not just the added benefit of expanding inline; it's also that expanding inline allows further optimization opportunities: constant propagation, more inlining, etc.
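
A small sketch of that power: std::sort is instantiated at compile time for this exact element type and comparator, so the comparison can be expanded inline, whereas C's qsort must call through a function pointer on every comparison and cannot be inlined the same way.

 #include <algorithm>
 #include <cstdio>
 #include <vector>
 
 int main() {
     std::vector<int> v = {5, 3, 1, 4, 2};
     std::sort(v.begin(), v.end(),
               [](int a, int b) { return a > b; });   // comparator expanded inline into the sort
     for (int x : v) std::printf("%d ", x);
     std::printf("\n");
     return 0;
 }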

I have timed this myself. I rewrote a simple stack-based expression evaluator, an important part of my company's product, in C++ and Java. I used vector and all of the common C++ abstractions, and did not have a single explicit new or delete in my source. I used ArrayList and all of the common Java idioms in writing the Java version. The Java version ran ~4.5x slower. Moving from ArrayList to built-in arrays, I saw a speed-up of 2x, making it still ~2x slower than C++. This difference can only be the dynamic dispatch. (The remaining 2x probably includes bounds checking, garbage collection overhead, etc.) (Note that moving from C++ vector to C++ new'd and delete'd built-in arrays, I saw runtime performance stay about constant. That is a power of C++ and static dispatch: some abstraction without performance penalties.)

Yes, the 80-20 rule applies, and a majority of my company's product could be written in Java and a small portion written in C++ without suffering any real performance penalties. That is, however, a red herring. Java is slower. Dynamic dispatch is slower. To claim that "oh, but you can write the part that matters in a language with static dispatch" is itself admitting that dynamic dispatch is slower at runtime. However, the opposite extreme is just as bogus; you don't have to write it all in assembly to get a comparatively fast product. AlternateHardAndSoftLayers


Hardware Bias?

Static languages tend to fit existing CPUs better. This is because benchmarks are usually for commercial apps written in C/C++ (which is influenced by Fortran's types), and other static languages tend to use the same base types that C/C++ does. Chips that supported dynamic typing may do better, relatively speaking. For example:

In short, the base types of the hardware tend to match static languages rather than dynamic ones. Other dynamic-language features besides types may also be considered, but tend to vary widely per language. -top

I agree that there is a 'hardware bias' for certain language features. Much could be done to enhance hardware and OS support for garbage-collection, laziness, data-flow programming, persistence, vector computations and wide-memory operations, concurrency, and content-addressable memory. But I do not believe you can reasonably blame the relative performance deficit of common dynamic languages on the hardware - not with the above HandWaving conjecture. ProfileFirst. Run a simulation. I suspect memory-to-CPU bandwidth, locality-of-reference, and the overhead from conditional statements, will defeat many of the benefits you're imagining. And I wouldn't be surprised if the features you're suggesting would fail to help as much as you imagine from a return-on-investment standpoint, even if used effectively.

If you're aiming to improve performance of dynamic languages, look into late-bound static analysis and code-generation (like JustInTimeCompiler or OpenCl or CUDA, perhaps LowLevelVirtualMachine). Then avoid features that make the language difficult to analyze or optimize (like MetaObjectProtocol and RuntimeReflectionIsaDesignSmell). Static typing is just one form of static analysis that lets you take better advantage of limited hardware resources.

Can't you start any conversation without insults ("HandWaving")? Grow some people skills, for Pete's sake, Mr. Wavy.

                        if( evidence weaker than CONJECTURE on EvidenceTotemPole) {
                          label = HAND_WAVING;
                        } else if( reasoning_is_fallacious ) {
                          if( in_bad_mood ) { AssumeMaliceNotIgnorance(); label = SOPHISTRY; }
                          else              { AssumeIgnoranceNotMalice(); label = STUPIDITY; }
                        } 

If you want respect, then find a different forum for your ideas. You've long ago lost the respect of most everyone on WikiWiki (even if you think you're in the right), and you do much to reinforce those opinions. If you want respect, you won't earn it by whining about other people's "people skills", because that's just hypocrisy. Grow some research skills, TopMind. Pursue your ideas rather than spewing half-formulated conjectures and WeaselWords across the pages of WikiWiki. If you're going to say that certain ill-defined new ComplexInstructionSetComputer operations "may do better relative speaking", you should have an argument for it. You do not. What you have is WishfulThinking backed by HandWaving.

I question your popularity assessment; but more important, repetitious rudeness is NOT going to "fix" me (make me behave in the way you want). I disagree with your general criticism of me, and rudeness won't change that. I am not evil and wrong in my mind. If by chance I am delusional and really am bad and wrong without knowing it, insults STILL won't fix that. I think YOU are the jerk who has a wrongheaded approach to evidence; the Ted Nelson of mass bloated type-a-trons.

Your faith in yourself is noted, as is your hypocrisy or delusion (delusion if you believe calling me a 'jerk and wrongheaded' will 'fix' me, hypocrisy otherwise). I have no delusional expectation that insults will 'fix' you, but since you're suggesting I should focus on useful contributions perhaps I should write an anti-TopMind script.

For somebody who pretends to be "logical", why do you require insults if you know they will not fix the allegedly fucked-up toppie dude? Why not spend energy on the subject rather than spend it on repetitiously complaining about how bad/dumb/evil/delusional I allegedly am? YOU have a problem.

Logic is in the reasoning for an argument. Wisdom is in prioritizing which arguments are worth the effort of reasoning about. Don't confuse the two. One can make perfectly logical arguments that are utterly wasted effort. I must admit that the fact I bother engaging you at all doesn't say anything positive about my wisdom. Anyhow, it was your choice to reply to my above comment by focusing all your attention on a perceived insult, "HandWaving" - which, you should note, other persons did not consider an insult - rather than focusing on the greater bulk of my reply, which was very subject oriented. If your goal was to focus on the subject, you FUBAR'd that opportunity with your futile whining.

I was simply raising the issue of chip primitives matching language primitives. Why you perceive that as a bad thing escapes me completely. I suspect you have Asperger-like mental/social problems and don't weigh social issues the same as "normal" people.

If you're "simply raising an issue", then phrase it as a question or describe your thoughts. Don't make assertions. You made assertions - about the reasons CPUs are fit static languages, about what may help - without evidence. Oh, and since you feel adding off-topic suspicions to the end of on-topic claims is productive: I suspect you fit in quite well with a certain class of "normal" people - UnskilledAndUnawareOfIt, arrogant, think they know how others should do their jobs, and inconsistent in thinking much of themselves despite their self-esteem being so far into the gutter that they're greatly affronted by any perceived insult - the whiny, pointy-haired bureaucrats and middle-managers.

If you wish me to reword it, then simply say so, perhaps with a sample reworked sentence or the like to suggest a direction to go in. Whiners are a dime-a-dozen. Learn to be a fixer.


In his defense, I must agree that hardware has a "bias" towards static languages. I could easily see dynamically typed languages gaining a boost with new hardware instructions. And this is coming from a C++ guy. -Impartial Observer

Which hardware instructions, in particular? And why do you believe they'd aid a dynamic language more than a static one?

Sadly, I'm not the most familiar with hardware, so what I said may be your "Hand Waving". However, could it not benefit from some specialized instructions for calling virtual functions, or maybe specific string comparison functions? Maybe I am wrong. My little understanding tells me otherwise, just as a gut feeling. It's just that all of hardware today is predicated on static branching, static function calls, etc. What if you had a chip which natively supported dynamic function calls like how C++ does in software? That might help a lot. -Impartial Observer

Funny thing about intuition and alimentary titillation: they become far more accurate in the presence of actual knowledge and experience. I'm familiar with hardware (written and tested hardware drivers, written a compiler, taught assembly language to undergrads) though it is not my interest. I'm moderately experienced with implementing languages - both dynamic and static.

When interning for LSI Logic (www.lsi.com) I tested firmware drivers for RAID cards which had interesting features for 'wide memory operations' that operate on a region and range - i.e. memory copy, compare, rotate, bit-shift, add, or, and, xor - for regions of memory. The RAID card had its own dedicated CPU, Memory, and OS, but these wide-memory ops were actually implemented by a separate processing unit (not a full CPU) sitting near the memory itself. Such a feature would be useful for string equality/gt/lt comparisons... though not so much for parsing.

But it's worth noting that (a) Dynamically typed languages usually already have fast string-comparisons once they get around to deciding they need to compare two strings. I.e. they call a 'static' implementation under-the-hood. (b) Implementations for dynamically typed languages do not typically perform string-comparisons for determining type information or selecting functions. They more often perform some sort of symbol-interning (http://en.wikipedia.org/wiki/Intern_%28computer_science%29) reducing string-comparisons to simple hash or pointer comparisons. As a consequence, I don't expect that speeding up string-comparisons would help dynamically typed language implementations all that much.
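
A minimal sketch of symbol interning - the intern() helper is invented for illustration: each distinct name is stored once, so later "comparisons" are single pointer comparisons rather than character-by-character string comparisons.

 #include <cstdio>
 #include <string>
 #include <unordered_set>
 
 const std::string* intern(const std::string& name) {
     static std::unordered_set<std::string> table;   // one global symbol table
     return &*table.insert(name).first;              // same name -> same address
 }
 
 int main() {
     const std::string* a = intern("drawCircle");
     const std::string* b = intern("drawCircle");
     const std::string* c = intern("drawSquare");
     std::printf("%d %d\n", a == b, a == c);         // prints "1 0": pointer compares only
     return 0;
 }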

Similarly, I can think of many other paired memory+CPU features that would be nice for supporting DataflowProgramming, PromisePipelining, and EventDrivenProgramming, and lazy and lenient evaluation styles (especially with concurrency). The VirtualMemory? design already in CPUs would be excellent for PersistentLanguages and for realtime, concurrent GC - but the common OS's don't currently offer the necessary interaction with the vmem to do this efficiently in a language running atop the OS, so some OS features wouldn't hurt. But, much like the above, I don't see these features helping dynamic languages more than static languages. And then there's many features that really require typing to be used at all... i.e. CPU support for per-core memoization tables would be very nice (i.e. memoizeK might put a jump in one register and up to K one-word arguments in K other registers/memory offsets, then have at least one result value. CPU support would allow an already-memoized application to be tested easily) - but such a feature requires some sort of effect-typing (i.e. for "purity") at the language layer to automatically leverage at the CPU level... though even dynamic languages could leverage it for certain 'pure' actions (like vtable lookups).

'Virtual functions' lack any real CPU presence; a virtual function to the CPU only means a pointer to a function. One might be able to parallelize a lookup a bit - i.e. add support for 'hashtables' of some sort - but we need to face issues of AmdahlsLaw. To determine whether a faster 'virtual function lookup' could help, you need to profile: (i) determine how much percentage of overall time is spent in these hash lookups (since that's the maximum theoretical improvement), (ii) determine how much of that lookup cost is in the CPU as opposed to costs on the cache BUS or obtaining the arguments (since this determines the maximum benefit you can actually obtain by a new CPU instruction - which may be less than you could obtain with a larger architectural change or new 'memory' instructions).

Without ProfileFirst or some research, even I couldn't say exactly how much you could theoretically benefit from enhanced hash lookups. My 'gut feeling', though, is that it isn't as much as you're imagining.

As far as HandWaving goes, I don't consider brainstorming and WishfulThinking to be HandWaving by themselves. It's assertions as a consequence of WishfulThinking that I consider HandWaving. When you say "X might help a lot", but you can't defend "X would help a lot" any better than you could defend "X would not help at all", then it's clear that you're making wild stabs in the dark. Prayer to the great, invisible spaghetti monster (http://en.wikipedia.org/wiki/Flying_spaghetti_monster) might help a lot, too. Right? (Surely we should raise the spaghetti monster "issue" for serious discussion.) I consider it dishonest (and sophistry unless there is self-deluding dishonesty involved) to make stabs in the dark without making explicit effort to be totally transparent about it. (How would you appreciate it if you went to a forum where car-repair 'professionals' congregate, and you read advice or suggestions without it being clear they were just random WishfulThinking and thoughts? Misleading a potential audience is not a prerogative or privilege that should be permitted.)


Gee, reading the communication (or lack thereof) that is taking place on this page causes me to raise the question: "Is this how professionals communicate?" I'm glad that I am very much an amateur in this field. There have been some comments lately that, I would say, speak to the difficulty of this conversation. Based on the probability of one comment being a 90% (or greater) correct health diagnosis, I recommend to any party that can do so to quit this discussion ASAP and save your own health and sanity.

Is this how professionals communicate? No, not among peers. Experts in one area are quite often average in other areas. Most experts know the limits of their knowledge. Problem is, a certain individual who 'contributes' to the WikiWiki is not only UnskilledAndUnawareOfIt: he's often willfully ignorant of many topics outside his expertise, but decides to contribute regardless of his ignorance. He thumbs his nose at prior research, and does no research of his own. The above... is not a discussion between peers. Peers, even rivals, will often have mutual respect. The individual I speak of is worthy of respect when it comes to his own little niche expertise. But his repeated 'contributions' outside that niche have lost him the respect due a professional and peer.

And your suggestion to abandon the discussion ASAP is undoubtedly a wise one. Perhaps I shall do so.

Because you know that would make me happy, and you cannot have that happen.


Q: Could someone please provide an example of "and the overhead from conditional statements" (above) in a common programming language(s)? Thanks.

A: You can't "see" overhead from conditional statements by examining program code. The best such program-code could do is help explain what conditional statements 'are', which typically includes if-statements and if-expressions. At the CPU-level, conditional statements are commonly BEQ (branch if equal), BNE (branch if not equal), BLT (branch if less than), BLE (branch if less or equal), etc. - though names will vary based on assembly language.

At the CPU-level, conditional statements are relatively expensive. To understand why, you need to know how most modern CPUs 'pipeline' instructions (http://en.wikipedia.org/wiki/Instruction_pipeline). For example, an 'add' might take multiple CPU cycles, but pipelines would allow many such operations to occur at once, resulting in a low average cost. The WikiPedia article can explain it better than I can. Problem is, when code might branch, the CPU pipeline has a couple options: (a) stall until the pipeline tells you which branch to take, or (b) make a guess, process on one branch, and cancel that work and restart if the wrong branch was taken. Seeing as (b) is more productive on average, it's the common choice.

To help make good guesses, the CPU keeps some form of 'branch predictions' table (http://en.wikipedia.org/wiki/Branch_prediction). Sometimes programmers can also help out a bit via profiling and compiler techniques, or by taking advantage of 'default' branch predictions (e.g. 'if' conditions compile often to 'default: predict branch-false' because the "if else if else if else if else" pattern tends to 'fall through' multiple conditions).

A language with DynamicTyping will typically involve more branches than a language with StaticTyping. These branches occur when operating on the language's elements, since tests are required to validate operands, to determine whether a conversion is necessary, to decide which operation the '+' operator means (i.e. is it concatenate-string or is it add-integer or is it add-bigint-to-smallint or is it add-doubles...), to perform conversions, and so on. These branches aren't visible in the program code: they exist as part of the language runtime and implementation. A language with StaticTyping - or even one with SoftTyping - can eliminate many of these hidden branches by taking advantage of the type analysis.
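
A minimal sketch of those hidden branches, using an invented tagged-value representation: the dynamically typed '+' must test the operands' runtime tags before it can pick a machine operation.

 #include <cstdio>
 
 enum class Tag { Int, Double };
 
 struct Value { Tag tag; long i; double d; };             // naive tagged value, purely illustrative
 
 Value makeInt(long i)      { return {Tag::Int, i, 0.0}; }
 Value makeDouble(double d) { return {Tag::Double, 0, d}; }
 
 Value plus(Value a, Value b) {
     if (a.tag == Tag::Int && b.tag == Tag::Int)          // hidden branch: int + int?
         return makeInt(a.i + b.i);
     double x = (a.tag == Tag::Int) ? (double)a.i : a.d;  // hidden branch: convert left?
     double y = (b.tag == Tag::Int) ? (double)b.i : b.d;  // hidden branch: convert right?
     return makeDouble(x + y);
 }
 
 int main() {
     Value z = plus(makeInt(2), makeDouble(0.5));
     std::printf("%f\n", z.d);    // a statically typed add compiles to a single instruction
     return 0;
 }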

Of course, the branch-prediction is still in effect - and there's no reason to suspect the predictions will be worse than in static languages - but having more branches means that the predictions table is under equivalently greater stress. And then there's the simple instruction-cost of these extra branches... which can be a non-trivial percentage of overhead.

I'm generally against overloading operators such as "+" (see ComparingDynamicVariables for alternatives). Such is not a necessary requirement of a dynamic language. I'd still like to see further examples of necessary or common increases in internal test-and-branch.

Removing operator overloading by introducing even-more-overloaded primitive functions like those described in ComparingDynamicVariables is not going to help eliminate excessive test-and-branch internally. Try to spend some time implementing it. And if you are truly intent on seeing examples of common increases in internal test-and-branch, you are certainly free to spend some of your time comparing JIT code for Perl, Python, and JavaScript to C/C++ LowLevelVirtualMachine (LLVM) code (http://www.llvm.org/demo). You can do your own research. Do not ask me to spoon-feed you.


Function Calls

Most (?) dynamic languages implement function calls by looking up the name in a hashmap to get the function pointer. That's a shitton of branches compared to a static function call. Even with a virtual lookup cache, that's a lot of branches compared to 0.

There are at least 2 possible improvements. One is to implement the look-up on the chip, perhaps even using some parallel processing to "fan out" the hash traversal. The second is to replace the name with the actual address in a parsing step when producing P-code (or something similar). True, the function definition may not exist yet during load; but a slot can be reserved even before the definition is encountered (initialized with an "undefined" flag).

The name-to-address translation would mostly be during code loading and pre-processing, not run-time. Thus, function calls don't need to use names and could use raw address pointers at run-time. Most of the heavy-use dynamic languages I've used over the years use something similar to P-code rather than operate on raw text (although I don't know their function name handling techniques). Thus, pre-processing is common. Note that static languages commonly support dynamic link libraries or something similar, which often require a few levels of indirection.
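
A minimal sketch of that load-time idea (all names invented for illustration): call sites are compiled down to a slot index, so the run-time call is one array index and one indirect jump, and a slot can be reserved - flagged "undefined" - before the definition is encountered.

 #include <cstdio>
 #include <string>
 #include <unordered_map>
 #include <vector>
 
 using Fn = void (*)();
 static std::vector<Fn> slots;                        // slot number -> function address
 static std::unordered_map<std::string, int> names;   // name -> slot, used at load time only
 
 int slotFor(const std::string& name) {               // called while parsing/loading
     auto it = names.find(name);
     if (it != names.end()) return it->second;
     slots.push_back(nullptr);                        // reserve the slot: "undefined" flag
     int slot = (int)slots.size() - 1;
     names[name] = slot;
     return slot;
 }
 
 void callSlot(int slot) {                            // the run-time call: no name lookup
     if (slots[slot] == nullptr) { std::puts("call to undefined function"); return; }
     slots[slot]();
 }
 
 int main() {
     int greet = slotFor("greet");                    // call site resolved before...
     slots[slotFor("greet")] = [] { std::puts("hello"); };   // ...the definition is loaded
     callSlot(greet);
     return 0;
 }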

{Increasing "fan out" of hash traversal is feasible; both static and dynamic languages could take much advantage of that sort of pseudo-content-addressable memory (i.e. find the first K words equal to W between pTable and pTable+RANGE, then use the offset(s)). That said, it ain't worth the complexity-expense unless the speedup is a large factor by putting it all in CPU, so must be compared to simply unrolling a lookup loop in L1 cache and the sort of multi-branch prediction described above. The second concept - lookup when compiling code - is already performed by many language implementations, but does not much aide for locating methods under a particular object or if any other form of polymorphism is involved.}

Well, then cut down on polymorphism. In other words, one's style of programming and/or the features of the language may affect whether efficient or inefficient constructs are used. I'll agree that dynamic languages in general have "more opportunities" for sluggish areas, but that does not mean that using dynamic languages or techniques by themselves result in poor performance; or that performance tuning tricks couldn't be relocated to hardware.

{Polymorphism is a dynamic technique, is it not? And it was a cause of a lost optimization opportunity, was it not? That's a rather common pattern with dynamic techniques. Which dynamic languages or techniques are you imagining that don't have a performance cost? And what sort of hardware features are you imagining?}

{As far as "poor performance": that's not at issue - it was covered in the opening line. IsDynamicTypingSufficientlyEfficient must be asked in context. Only for some cases - rendering for state-of-art video-gaming, for example - is it not efficient enough on an absolute scale. And then there are a few more cases where a tenth of a percent is worth your salary, especially in scientific computing, rendering farms, statistics analysis. Sometimes PerformanceMatters. More often, it does not - predictability of performance (i.e. 'realtime'), or scalability, or ease of distribution, or ease of development often matters more, and dynamic typing doesn't hurt the former two and evidence suggests it helps the last two. When predictability of performance is the concern, it's "concurrency" (including LazyEvaluation) and "GarbageCollection" that are the big culprits.}


It may be possible to substitute addresses for method names in a similar way to functions, at least those with an "easy path". Consider this:

 function X(){
   a = new A();
   b = new B();
   if (2==readInput()) {
     Y(a);
   } else {
     Y(b);
   } 
 }
 function Y(myObject) {
   myObject.foo();
 }

I don't see any way "foo" can be statically bound to a RAM address. Either an IF (perhaps internal) or a name look-up seems necessary even in a static version.

{Neither name look-up (no name compares) nor an 'if' is needed in static languages. But a LayerOfIndirection (through a VeeTable) is required. In C++, for example, there is one indirection in memory, but no lookups or if-conditions. Instead of being bound to a RAM address, "foo" is bound to a particular offset in the object's VeeTable. The 'particular offset' approach isn't very feasible with DuckTyping, since it would mean every method table would need to have the offset for "foo" even if the method doesn't exist for that object/class.}
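
For concreteness, here is the X/Y example rendered in C++ with a shared base class - a sketch, not the only possible translation: "foo" is bound at compile time to a fixed offset in the VeeTable, so Y needs neither a name lookup nor an 'if', just one indirection.

 #include <cstdio>
 
 struct Base { virtual void foo() = 0; virtual ~Base() {} };
 struct A : Base { void foo() override { std::puts("A::foo"); } };
 struct B : Base { void foo() override { std::puts("B::foo"); } };
 
 int readInput() { return 2; }                 // stand-in for the original readInput()
 
 void Y(Base& myObject) { myObject.foo(); }    // compiles to: load vpointer, call [vpointer + k]
 
 void X() {
     A a;
     B b;
     if (2 == readInput()) Y(a); else Y(b);
 }
 
 int main() { X(); return 0; }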

The VeeTable lookup would be at least as costly as a conditional, it would seem.

{That depends on context. But it isn't often that a single conditional can replace a single VeeTable lookup, so the comparison might not be too relevant.}


Re: "I work in a domain where performance is very important."

Have you considered that perhaps you are not using the TheRightToolForTheJob? If existing dynamic languages are too slow, then perhaps a compiled language may suit it better.


See: PerformanceMatters


CategoryPerformance


SeptemberZeroNine

