Handling Infinite Work Loads

{I suggest thinking of a better name for this topic. Perhaps HighScaleWebSystems or something similar. "Workload" implies human labor.}

Infinite work streams are the new reality for most systems. Web servers and application servers serve user populations so large that it is realistic to expect an endless stream of new work. The work never ends: requests come in 24 hours a day, 7 days a week, and could easily saturate servers at 100% CPU usage.

Traditionally we have considered 100% CPU usage a bad sign. To compensate, we create complicated infrastructures to load-balance work, replicate state, and cluster machines.

CPUs don't get tired, so you might think we would try to use them as much as possible.

In other fields we try to increase productivity by using a resource to the greatest extent possible.

In the server world we try to guarantee a certain level of responsiveness by forcing artificially low CPU usage. The idea is that if we don't have spare CPU, we can't respond to new work with reasonable latency or complete existing work.

Is there really a problem with the CPU being used 100% of the time? Isn't the real problem that we use CPU availability and task priority as a cognitive shorthand for architecting a system, rather than understanding our system's low-level work streams and using that information to make specific scheduling decisions?

We simply don't have the tools to do anything other than make clumsy architecture decisions: load balancing across servers and guessing at the number of threads to use and the priorities to give them.

We could use 100% of CPU time if we had fine-grained control over scheduling: the ability to decide which work runs, when, and in what order, and to push back on new work rather than accept it.

The problem is we don't have this level of scheduling control. If we did, a CPU could run at 100% because we would have complete control of what work runs, when, and in what order. There's no reason not to run the CPU at 100% when we know that the things we want to run are the things that are running.
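
To make that concrete, here is a minimal sketch (the class and method names are hypothetical, not from any framework mentioned here) of an application-level scheduler: work is submitted with an explicit priority, and a single worker loop always runs the highest-priority task next, so the CPU stays busy on exactly the work we chose, in the order we chose.

  import java.util.concurrent.PriorityBlockingQueue;

  // Hypothetical sketch: an explicit work scheduler that keeps the CPU busy
  // while the application, not the OS, decides what runs next.
  public class WorkScheduler {

      // A unit of work with an application-assigned priority.
      static class Task implements Comparable<Task>, Runnable {
          final int priority;          // lower value = run sooner
          final Runnable body;

          Task(int priority, Runnable body) {
              this.priority = priority;
              this.body = body;
          }

          public int compareTo(Task other) {
              return Integer.compare(priority, other.priority);
          }

          public void run() {
              body.run();
          }
      }

      private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>();

      // Callers submit work with a priority instead of spawning their own threads.
      public void submit(int priority, Runnable body) {
          queue.put(new Task(priority, body));
      }

      // A single worker loop: the CPU runs at 100% only on the work we chose,
      // in the order we chose it.
      public void runWorker() throws InterruptedException {
          while (true) {
              queue.take().run();
          }
      }
  }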

It is interesting that frameworks like Struts and JSP rarely even talk about threading. I don't think application developers really consider the threading architectures of their applications. Popular thread-local transaction management mechanisms, for example, assume a request will be satisfied entirely in a single thread. That's not true of architectures designed to handle large workloads.
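
A small illustrative example of that assumption breaking, using nothing beyond standard Java (the class name and transaction id are made up): a transaction context stored in a ThreadLocal is visible on the request thread but silently disappears once part of the request is handed to a worker thread.

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Illustrative sketch (not from any particular framework): thread-local
  // transaction context works only while the whole request stays on one thread.
  public class ThreadLocalTransactionDemo {

      // The "current transaction", keyed by thread -- the common pattern.
      private static final ThreadLocal<String> currentTx = new ThreadLocal<>();

      public static void main(String[] args) throws Exception {
          ExecutorService worker = Executors.newSingleThreadExecutor();

          currentTx.set("tx-42");  // bound to the request thread only
          System.out.println("request thread sees: " + currentTx.get());   // prints tx-42

          // Hand part of the request to another thread, as a staged or
          // event-driven architecture would do...
          worker.submit(() ->
              System.out.println("worker thread sees: " + currentTx.get())  // prints null
          ).get();

          worker.shutdown();
      }
  }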

Scaling a system requires careful attention to architecture, but with current frameworks application developers have very little say in how their applications run.


It is quite common to see a mainframe running at 100% utilization, not because it is overtaxed, but because it is efficiently balancing interactive and background/batch work. The above text couches this idea as somewhat of a novelty, but this is nothing new. I doubt most mainframe sysops would agree with "traditionally" as used above.

Agreed. Even on clusters of low-end VMS systems (not close to mainframes in terms of cost or performance) we expected to see them utilize all of their CPU all of the time. If they weren't, that meant we were wasting money on idle hardware. And we did have most (perhaps all) of the tools listed above. I suspect the author of the top section comes from a personal computer background.

[Mostly unix servers and embedded systems.]

Perhaps it's a Unix mindset then. There are domains where CPU time is highly valued and there's always something the CPU can be doing. I remember stories about supercomputers calculating more decimal places of pi as a low-priority task, and about companies being reluctant to share their discoveries because doing so would reveal that they weren't using their supercomputers efficiently.

I guess you can do protein folding or SETI@home, but that's not quite the same thing.

That's not what I said. The pi calculation was an example of how CPU time is coveted in some domains.


On many systems, the vast majority of processes are IoBound, so high CPU utilization is frequently a sign of a runaway process rather than of useful work being performed.

And 100% CPU utilization is a problem for machines with interactive users--it tends to slow the UI (especially a GraphicalUserInterface) down to a crawl. Perhaps fixable with appropriate priority settings, but many machines are not suitably tuned.

In short, it all depends on what the machine is being used for.

Currently you have no way to decide whether to take a network packet from a machine you can put back pressure on, or to dedicate that CPU and those resources to UI handling instead. That's the larger point. My take on these issues can be found at http://www.possibility.com/epowiki/Wiki.jsp?page=ArchitectureDiscussion and http://www.possibility.com/epowiki/Wiki.jsp?page=AppBackplane
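
As a rough sketch of the back-pressure half of that choice (hypothetical names, not taken from the pages linked above): a bounded queue for incoming network work lets the application refuse packets when it would rather spend the CPU elsewhere, pushing the load back toward the sender.

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  // Hypothetical sketch of back pressure: a bounded queue for network work.
  // When the queue is full we refuse the packet, which pushes the load back
  // onto the sender instead of starving higher-priority work.
  public class NetworkIntake {

      private final BlockingQueue<byte[]> pending = new ArrayBlockingQueue<>(1024);

      // Called from the network layer. Returns false to signal "not now",
      // letting flow control (or the peer) hold the work upstream.
      public boolean offerPacket(byte[] packet) {
          return pending.offer(packet);
      }

      // Called by the scheduler only when it decides network work should run,
      // e.g. when no UI work is waiting. Returns null if nothing is pending.
      public byte[] nextPacket() {
          return pending.poll();
      }
  }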

Again, "currently" depends on your view. IBM's workload manager is a very mature scheduler. IBM's "sysplex distributor" dynamically assigns connections to be processed by different mainframes in a sysplex, depending on their current workload state, so you have dynamic transaction load balancing across multiple systems. I'm sure this degree of efficiency is true of all big iron; I'm only familiar with IBM. UNIX has a lot of catching up to do.

Granted, a lot of the efficiency has to do with the availability of background/batch work that you can afford to defer temporarily in favor of interactive work. UNIX systems are often single-purpose, so they don't have the benefit of using batch CPU as a cushion.

Currently being in the PC, Unix, and embedded world.


How big is this problem? What data-centric web systems are really large enough for these concerns to be a worry? eBay, maybe? Some kinds of systems can simply be partitioned by "accounts" or something similar, but only if there is not a lot of sharing between accounts. See WebStoresDiscussion.

Another type of traffic problem is "burst" conditions, where everybody arrives at the same time because of some deadline.

As far as web servers and similar systems go, the problem is that they are not just processing data streams. They are soft RealTime systems: a request must be handled within a reasonable amount of time (a few seconds at most) or visitors/users/customers will get fed up and leave. Given finite CPU resources, once you hit 100% utilization, performance will degrade as the number of requests continues to grow: either you have to take longer to serve each request (which will drive away users as the site becomes unusably unresponsive), or you have to start rejecting requests (which will drive away users as more of their attempts to navigate the site fail entirely). Further, the request load generally varies with time of day, and peak load can be several times greater than off-peak load; to handle the peak load adequately, you'll need more CPU resources available than are necessary at off-peak times. This will tend to leave you with "underutilized" CPU at non-peak times unless you've got batch processes to run (like the pi stuff above).
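
A back-of-the-envelope illustration of that degradation, using a simple M/M/1 queueing model (an assumption introduced here, not a claim from the discussion above): average response time is serviceTime / (1 - utilization), which grows without bound as utilization approaches 100%.

  // Back-of-the-envelope illustration (simple M/M/1 queueing model):
  // average response time grows without bound as utilization approaches 100%.
  public class ResponseTimeVsUtilization {
      public static void main(String[] args) {
          double serviceTimeMs = 50.0;  // hypothetical time to handle one request alone
          double[] utilizations = {0.50, 0.80, 0.90, 0.95, 0.99};

          for (double rho : utilizations) {
              // M/M/1 mean response time: T = S / (1 - rho)
              double responseMs = serviceTimeMs / (1.0 - rho);
              System.out.printf("utilization %2.0f%% -> avg response %.0f ms%n",
                                rho * 100, responseMs);
          }
      }
  }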

I should note that being "past 100%" is not in and of itself guaranteed to give unacceptable performance. A given system may be able to sustain good response times well beyond the load point that fills all CPU time, but for some given load it will drop below the acceptable level and fail. If you're only barely adequate at that 100% point, then it's going to be sooner rather than later.

[Of course, a CPU running at greater than 100% efficiency isn't possible. What happens when the CPU stops idling is that specific tasks take longer to complete--resulting in degraded service or outright failure, depending on circumstances.] Don't tell the folks into OverClocking... ;)


See http://www.possibility.com/epowiki/Wiki.jsp?page=HandlingInfiniteWorkLoads

