Very Lightweight Threads

Threads are termed 'heavy' or 'light' based on two primary factors.

The first of these factors is the cost of the context switch between threads. In addition to the cost of changing the processor context (registers, program counter), this includes the other costs of changing the computational context: largely interrupts (common to timer-based thread models), scheduling elements (deciding which context to switch to), and costs associated with caching (including both processor caches (cache of RAM), the TLB, branch predictions, and paged memory (eqv. to cache of HDD)). The context-switch cost effectively dictates the maximum frequency at which threads may be profitably utilized; it can essentially be subtracted from the time-slice each thread receives.

The second of these factors is the memory cost of maintaining state for halted computations... or, put another way, the memory cost for maintaining some (generally massive) continuations. This cost includes a duplication of all registers and any data-space (often a stack or stacks, a heap, etc), PLUS all memory associated with tracking the thread-state (e.g. a list of all threads, trees in the scheduler, etc.). The memory cost imposes an absolute limit on the number of threads the system can effectively handle since any physical system will possess a finite amount of memory.

'Light' threads are those that have low costs in both these categories, allowing a system to handle many of them efficiently at a high frequency. Threading models currently claimed to be 'light' are generally user-mode threads that occur within a process by some voluntary mechanism (e.g. yielding control), which bypasses a 'kernel' context switch, a TLB flush, most memory paging, and allows for scheduling optimized to the application. One can reasonably assume that 'VeryLightweightThreads' would be cutting costs even further, pushing to the extremes of what is possible... e.g. 'light', but even more so. VeryLightweightThreads could be used on (sufficiently powerful) systems where you want hundreds of thousands of threads... e.g. for actor languages.

There are other factors than 'weight' that should go into any threading model. These include safety, security, perceived 'smoothness' by users, and fulfillment of any realtime guarantees (which is usually also for 'safety', though of a different sort). However, this page is dedicated to VeryLightweightThreads, so discussion here shall remain on where costs can be cut, which are necessary and unnecessary, and possible approaches.

Scheduler Costs

For VeryLightweightThreads, the amortized cost of scheduling must be O(lg(N)), ideally O(1). Intelligent scheduling can reduce the context switch cost a great deal, and provide better 'smoothness', but wherever possible this must be done in O(lg(N)) time and, where that isn't possible, it can be done in a O(N*lg(N)) action spread over N scheduler-actions (e.g. performing a partial-sort action at each scheduler event).

The reason for O(lg(N)) is simple. If the cost is higher then, as you add more and more threads, the cost of context-switching will increase a great deal... thus the more you have to do, the less you can get done. It's a negative feedback loop if fed by a regular stream of new tasks. At O(lg(N)), this cost increases linearly for every doubling of processes... which is, at least, quite manageable. As a general rule of thumb, when dealing with scaling rules in computation systems, O(lg(N)) load on any given system is the way to go.

Safety and Security Costs

One of the greatest costs of the modern threading model is the cost of context-switching between kernel and user contexts (not just modes), plus the cost of maintaining separate data-space for each process (esp. when it is time for one process to communicate with another). Both of these were designed into current systems to provide a modicum of security and safety; the current process model can be done without explicit processor support, but the processor support prevents low-level language code (like C) from stomping on the other processes in the system.

In moving towards VeryLightweightThreads, it may be better (or even necessary) to search for alternative mechanisms to provide safety and security. In particular, getting rid of the current process model can significantly reduce context-switch costs. The TLB would no longer need changing, and the cost of maintaining state for process threads could be reduced by use of shared state (especially for common code and service access). As an additional benefit, the cost of communications could be reduced to inserting a pointer to a reference-counted value-object into a message queue of another process or thread, even for messages to 'kernel' services. Of course, if we have such a thing, we must still provide a few guarantees: (1) either individual process-threads cannot crash or, if they crash, they will not take out other processes. (2) it is possible to guarantee security of other process-threads, such that in a multi-user system it is not possible for one user to violate the privacy or corrupt the actions of another user (at least through the threading model). In examining this problem, it would be excellent to also seek and find solutions to the problem for more than just the threading model; security and safety can't just exist at any given level.

I propose tying typing and security as an aspect to every communication. That is, every message on the system has a known type (not just string of bytes) and known privacy information, and every message service on the system has attached security stuff (like trust, authorization, capability)... even in messages to and from such things as RAM and HDD. These can be known implicitly, at compile-time, and be turned into dynamic code only where it cannot be proven that it is safe statically (e.g. in IPC communications, privacy and safety might be doable without encryption between some processes or threads).

A potential approach is the use of a high-level language as the only 'official' language of the machine... something that can still be compiled down for the processor, but also something that allows checking prior to running for security and safety violations (e.g. through advanced typing systems: typing privacy, typing security, typing safety). It might be possible to use a type-system to find even many forms of malware. Sanity checking of a language statement (of any sort) requires that the language represent both intent and action so a prover can ensure that the intent matches the action, and that the intent itself is sane; for type-safety, the 'intent' is that a variable be usable in certain ways, and the action is the use. For 'privacy', the intent is that messages only be read by 'trusted' processes according to some predication on 'trust', and the action is sending a message. One can expect that it would require a rather sophisticated language to handle privacy, security, AND safety. However, once it is installed in the system, one can make guarantees about the whole system using that language before execution, and eliminate a great many dynamic checks hand-written by programmers. This does, however, require a common and safety-conscious language and compiler... which is not something programmers are likely to enjoy unless the language is capable of (insert your favorite language here) syntax.

By the pressure to be rid of the current process model and the associated very 'heavy' process threads, VeryLightweightThreads would tend to encourage an ExoKernel or 'no kernel' approach to OS design where typically 'kernel' services are just services like any other, though they are likely the only services with initial capability over certain computer components (like the RAM or network card).

Processor Context Switch Costs

One easy means of cutting down on processor context-switch costs is to make context switching voluntary wherever possible. By doing so, you can also ensure that the computation has the opportunity to save data - exactly and only that data necessary for continuing the computation - prior to yielding control. If a common compiler is used, it can insert these automatically at optimum 'stopping spots', perhaps due to outside constraints (e.g. realtime constraints). However, the advantage of this change is reduced somewhat for processors that have fast context save/load commands. It can be of great gain for processors with hundreds of registers of which you only want to save four or five: not only can it reduce memory costs per thread (on average); if the number of instructions can be minimized, it would reduce the total latency to save or load just a few registers instead of many.

CategoryThreading?