Well Factored Code Leads To Better Optimizations

Does WellFactoredCode lead to better optimizations?

I think that sometimes it depends on which kind of optimization you are pursuing. If you are a game developer, you may be trying to squeeze every cycle from the CPU, and thus a not-well-factored code may be better than a well-factored one (See also MostGamesProgrammersDontGrokObjectOrientation). But if you are an application developer, you are looking for other kinds of optimization: scalability, mantainability etc. In these cases, having a WellFactoredCode is more a requirement than a desire.

If I were looking at a piece of code with a view to performance optimizing, I'd certainly hope to start with well-factored code. Identifying that a 400-line function is a CPU bottleneck is all very well, but which bit? Furthermore, since optimization and readability/maintainability are often mutually exclusive, being able to optimize the hell out of a small function whilst keeping the rest of the code well factored is always a bonus. -- JayBell

Many processors are sensitive to stack depth. Deeper stacks, which you have with well-factored code, is significantly slower.

That's a strange comment. Did you mean simply that function calls are slower than not doing function calls? Because actually, processors that are literally sensitive to stack depth are quite rare. Basically, those that have a fixed size hardware stack that supports only e.g. 8 levels of calls, as is seen in some microcontrollers - sure. But I feel like that's not what you're talking about. It's been a while since I thought about register windowed architectures; hmm... are you talking about some cpu that has enough registers to cache all shallow calls in registers but not, of course, for arbitrarily deep calls? Does the Sparc do that? I've forgotten. In any case, none of this adds up to "many processors". For almost all processors, the stack depth is completely irrelevant to speed.

My game developer friends tend to the same opinion, and would consider the overheads involved in jumping around between lots of small functions to be a waste of time. Some (many?) optimizations lead to less well factored code, which is why we're urged to avoid optimizing until we really need it and try to keep it contained.

The question then is whether to get the game running slowly in a well-factored system and then (via small refactorings like InlineMethod) move to something which is very poorly factored but far more efficient. Most of my friends would say the extra step would be a waste of time, better to keep it poorly factored but efficient all the way through the process. Has anyone tried highly optimizing a well-factored piece of code?

Certain optimizations are more destructive in terms of readability/maintainability than others. For example, rewriting a piece of C code in assembler will pretty much paint you into a corner, whereas replacing a dumb algorithm with a more intelligent one needn't. With games programming, monitoring performance is something you do often and early, not just at the end of the project. If one of your performance targets is to run your game at 60Hz at all times, you need to be sure that you're within manageable reach. Expecting to achieve a 10-20% speed improvement via optimization is not unrealistic. If you're after 200%, then you need to optimize... NOW! -- JayBell

Has anyone tried highly optimizing a well-factored piece of code?

Yes indeed. We had something that was taking 210 msecs when it desperately needed to take no more than 50 msecs. There were like 12 method calls, deeply nested. I inlined a few methods and low and behold we were below 50 msecs. I and several others were quite astonished it was that easy. All sorts of draconian solutions had been proposed. But the inlining did with about 10 minutes worth of effort.

The nice thing about InlineMethod is that your code remains well-factored!

One area I've seen where good refactorings can have significant performance benefits is with distributed systems. Many newcomers to technologies such as CORBA, DCOM, Java RMI, etc. will "design" their network interfaces quite poorly, and then refuse to change those interfaces after initial design. By following principles such as the LawOfDemeter, one can often refactor the interfaces in ways that dramatically decrease the number of network round-trips.

The types of refactorings that are beneficial to distributed systems are not the same as those that are beneficial to in-process code. Rather than having lots of small methods, one needs to think about how to simplify the interface to minimize the number of cross-process calls required to perform any given transaction.

This sounds like refactoring is an optimization in itself

Clean code lends itself well to high-level optimizations, such as being smarter in dealing with I/O bottlenecks.

Peephole optimizing too early can obstruct high-level optimizations that would have provided much larger gains. (BlindAlley)

Sounds like we are generally agreed that WellFactoredCode is certainly no barrier to optimization... but is it easier to optimize, or does it lead to better optimizations?

[D.C. I do not agree that the following is better deleted; was that an accident?]

InlineMethod is certainly the most obvious optimizations for WellFactoredCode, and can yield amazing results, particularly if your target hardware has a tiny I-cache (stand up, Mr PlayStation 2). Once you've made it past this fairly obvious milestone, is WellFactoredCode going to be any easier to optimize?

Inlining is also not controllable in C++. The compiler may or may not inline your code. Some later change can turn off the inlining and nobody will know what happened. In these cases you need to unfactor your code.

Using gcc, extracting a method tends to lose you the optimizations that can be done across the method boundaries due to all kinds of reasons. However, if you mark the function "static" (file scope) suddenly gcc can optimize it as if it was inlined - even if it was called from multiple places within this file. gcc often inlines the entire function, or at the very least bypasses the usual stack handling that's required. I personally think that compilers should be able to InlineMethods on their own during optimization.

Old Smalltalkers who read about KentBeck before XP came may remember he used to advise: MakeItWorkMakeItRightMakeItFast, in that order. First make it work. Second, refactor it correctly. Third, after profiling!! make it fast enough by doing the one most bang-for-the-buck optimization, then the next, then the next, etc.

He gave all sorts of reasons why you would want your code well factored before you tried to optimize it. One big reason being it isolated the slow code into a single place.

Having read this and the MostGamesProgrammersDontGrokObjectOrientation, there's a picture forming why some (especially games:) people consider well-factored code an obstacle to optimization.

Basically, factoriing introduces artifacts that run exactly counter to extreme optimization goals. A well-factored program usually has small routines, and small(ish) data structures. Unfortunately, those tend to get scattered all over creation. Each method invocation is costly not only for the call, but also because it blows out the instruction cache. Loading data from several location ruins cache coherency for the data cache.

Those don't necessarily matter for most application - it's only noise. If my main algorithm takes up 50ms, (as per the above example) saving an additional 0.5ms doesn't matter. Saving half a millisecond in a game, per frame, promotes you to optimization god status. Games, at least at their core, are all about utilizing CPU and bus as close to 100% as possible. In that case, well-factored code prevents you from achieving the goal.

Still does not answer the question whether well-factored code leads to better optimizations in this particular domain. If optimization was only rearranging and inlining code, then I'd tend to agree. What makes it hard to go from well-factored to raw performance is the fact that you need to rearrange data structures. (Which begs the question if game engines weren't better served with a language that is concerned with data flow... But that's another topic)

I still hope to be proven wrong, and I'll keep experimenting. If anybody has any hard data on this, a few pointers would be much appreciated.

Many modern games use an architecture with AlternateHardAndSoftLayers, so worrying about method size is not really relevant. Optimization is required only in those parts of the code that move huge amounts of data around or are very CPU intensive: graphics and audio processing and, perhaps, physics simulation. Even there, performance gains are often better achieved algorithmically: BSP trees, octrees, hidden surface culling, dynamic level of detail reduction, and so on.