Distribution Is Optimization

Original statement (false):

Distribution is an optimization technique: you distribute a program across multiple computers because you want it to scale, or you want the user interface to be responsive, or for some similar reason.

Therefore, the decision to distribute should be made according to the RulesOfOptimization, and never up front.

Revised statement: (controversial)

Distribution is sometimes an optimization technique. Examples include scalability and responsiveness. When distributing for these reasons, the decision should be made according to the RulesOfOptimization, and never up front.

Distillation of discussion:

Distribution is not just an optimization technique. There are legitimate needs for distribution. See below for examples.

When distribution is done as an optimization, the traditional approach is to design this into the architecture up front. The DistributionStories page includes anecdotes about this approach succeeding and failing. No one has yet contributed a story about distributing an application after first developing it under a non-distributed architecture.

See also:

Arguments in favor:

Distribution definitely seems like it would be optimization. The impression I get is that people do it up front because the decision to make software distributed can have such a radical effect on the program's design that the idea of adding distributedness at a late stage is too painful to contemplate. But, then, I have little-to-no experience actually working on distributed apps; the experiences of those who have would be useful to hear here. -- FrancisHwang

Arguments against:

There are many different reasons for distribution, and optimization is only one of them. A common reason for distribution is to allow communication between people and systems that are geographically distributed or managed by different organizations. For example, the airline booking system allows travel agents around the world to book seats on flights of any major airline. Distribution is a necessity, not an optimization, in this application. Banking networks have similar constraints.

This example does not require distribution, but only a client-server network. Distribution is something different. This example could also be used to describe dumb terminals connected to a time-sharing mainframe.

For me, a "client-server system" is a distributed system. The server logic is centralized, but the system as a whole (which includes the clients) is distributed. Your mainframe example is also a distributed system, no matter how dumb the clients are. And anyway, your example is extreme. Most client-server systems have at least some small intelligence on their clients. And if the client can talk to the server via a protocol, it's already intelligent enough for me. Hmm... maybe centralization is just an abstraction: everything, if inspected closely enough, is a communicant, distributed system?

A similar reason for distribution is to share limited resources among people. This is an economic optimization, not an engineering optimization. E.g. (1) not every university can afford their own supercomputer, so they share a supercomputer and send jobs to it over the network; (2) It is a waste of money to attach a printer to every PC in an organization, so put one or two printers on the network and send print jobs to them from PCs using a distributed printing protocol such as NetBIOS.

Again, this is not an argument for distribution, but for client-server resource sharing.

RedundancyIsInertia - If you like your services to have good uptime, it's not a bad idea to have a bunch of servers that do the same things.

Interesting points:

Sequentialization is an optimization that eliminates the overhead of distributing the task. Most tasks are inherently parallel, so the simplest description should be parallel in nature. -- DaveWhipp (Tongue only slightly in cheek)
Is seti at home an optimization or the only way to solve the problem it wants to solve with today's technology?

: They are optimizations. SetiAtHome could've been done by buying a massive supercomputer; instead, they decided to use other resources. You could design ATMs to not be distributed, with one centralized access point that everybody would have to share. (Arguably, this is a function bank teller windows used to fulfill.) In these cases, of course, optimization seems extremely necessary. So perhaps we're dealing with a special kind of optimization that doesn't have to follow the RulesOfOptimization?
: SETI could not afford a supercomputer powerful enough to process data as fast as SETI at home. Therefore, the use of distribution was based on economic factors, not optimization factors.
: SetiAtHome could also have been accomplished on one workstation; it simply would require millions of years of processing time. Distribution in this case is an optimization - it offers an increase in speed without increasing the core functionality.
: A SetiAtHome implementation that can only run on a single machine is not unoptimized - it is broken, since it is simply not a solution. Or, if you have a powerful and expensive machine it is just a solution that is too expensive (for today's machines, at least) and brings nothing new.

Is having ATMs distributed across physical space an optimization or is having only one ATM not the best solution?

: ATM networks are not an optimization. Would a bank's customers travel from all over the globe to the single ATM machine to get cash? Of course they wouldn't. Therefore, the distributed nature of ATM machines is a functional requirement of the system, not an optimization. Similarly, a bank's customers want to be able take cash out of ATM machines owned by other banks. Thus the banking network federates the systems owned and managed by different, competing organizations. Banks will not allow their competitors access to their systems, and will not run their business systems on machines owned by another bank. Therefore, distribution is a functional requirement.

Apart from number crunching apps, such as SETI, or key factoring apps, distributed systems are rarely built to take advantage of parallelism. Most are built to connect geographically distributed people and resources or to share a few expensive resources among many people. Thus the choice of distribution vs centralization is usually a result of functional requirements and economic factors.

: I disagree. Many Enterprise Java applications are distributed specifically for "scalability." It's a choice made specifically for performance reasons, not as a result of functional or economical requirements.
: Many EJB applications are built because EJB is a trendy buzzword, or because the system needs part of what EJB provides (usually transactional persistence), or to ensure that application code is independent of the underlying database and transaction monitor. Distribution is then a side effect of using EJB not the reason for using EJB.

Are there any real-world cases of software evolving over such a large hump such as moving from being undistributed to being distributed? Part of me feels like this is too much to ask out of EvolutionaryDesign. Another part of me feels that if you have the right scaffolding in place (UnitTests, etc. etc.) then this is not only possible, but desirable. -- FrancisHwang

: Okay. I was responding to the page title and argument. It is generally considered, and agrees with my experience, that it is often easier to start from scratch than distribute a mature monolithic system. This depends on the quality of the system, however. I tend to write systems with a view to later distribution which obviously makes it a lot easier. This process can be applied retroactively as an architectural refactoring, but not many shops employ (or even recognize the existence of) coding architects (grumble moan ;)). -- RichardHenderson
: I agree that the conventional wisdom is that distribution must be planned in advance. Is this actually true, though? Have you tried to distribute a well-factored single-process system? If so, what was the driving force behind that decision, and how did it turn out? -- JimLittle

I specialize in architectural refactoring of big systems. The forces, among others, can be: performance, contingency, scaling, geographical cohesion. I did it, with some pain. I watched a couple of other groups fail, but then they didn't hire me :). They then tried to write it from scratch and still had problems. The key issue they ignored is physical failure. See DistributedTransactionsAreEvil. -- RichardHenderson

Richard wrote:

I tend to write systems with a view to later distribution which obviously makes it a lot easier.

Can you say more about what that means, and it particular how such a system might differ from a system that was simply well-factored?

What could distribution possibly have to do with factoring?

To my mind, writing systems "with a view to later distribution" could mean one of two things:

You're adding "we might need this hook six months from now" code - which means you're spending time on something the client doesn't need today. YouArentGonnaNeedIt.
You're writing code that does only what it needs to do, but is extremely well-factored, so when the call to distribute code comes down six weeks or months or years from now, the process will be as painless as possible.

Or perhaps I'm missing something in that original sentence.

There are several reasons to distribute a system that have nothing to do with optimization:

FaultTolerance / FaultIsolation
Re-use of existing equipment and applications, or use of off-the-shelf systems
Integration of systems from multiple vendors or development teams
Maintenance requirements
Security requirements
Taking advantage of differences in operating systems or hardware (for example, use a real-time OS for the real-time stuff, and a general purpose OS for the user apps).
Access from lots of geographic locations

I'm sure someone will argue that all of these are optimizations in some way, but they don't match the common definition of "optimization" which is "making something faster than the simplest implementation by using a different algorithm". Connecting diverse systems using the network is simpler than is creating a new monolithic system with all the functionality of the existing systems.

I have worked on several distributed systems over the years, and hope for future scalability is, in my mind, the stupidest reason to go through the difficulties of designing a distributed system. That is often a case of YouArentGonnaNeedIt.

Perhaps, some of the confusion comes from the negative association with optimization one gets from a maxim like OptimizeLater. There are cases in which optimization may be part of what the client asks for, from day one: Nobody in the SETI project needed to work to prove the requirement for optimization-through-distribution.

So optimization isn't always a bad thing. Seems to me that the forces behind OptimizeLater are that:

Programmers need to guard against their understandable impulse to make code faster, because doing so might distract from whatever else might give the customer more value
Technical implementation is hard, and it's often impossible to predict how slow software will be until you've implemented it, and then profiled it.

The first one is negated if the customer asks explicitly for optimization, i.e., optimization is customer value. The second one is negated if you already know enough about the implementation to know you'll need to optimize. For example, the SETI people can say, right at the beginning, "We don't know how long it'll take to compute this all on one workstation, but we know it'll take way too long. Distribute it."

Actually, SetiAtHome is probably a bad example, since one of the goals (and possibly the goal) was to increase awareness, hence funding, for SETI, and you don't do that by running it on your supercomputer. Their main goal was to involve the public and show them some pretty pictures on the computer, which is apparent from the fact that they kept sending the same data blocks for months.

CategoryOptimization