Cooperative Threading

A limited form of multi-threading that relies on each thread's cooperation in yielding the CPU. A thread switch cannot take place in response to an external event. This makes CooperativeThreading useless for real-time work in many implementations, because real-time response requires that a low-priority thread can be preempted so that a higher-priority thread can be dispatched to service the external event.

Frequently implemented with CoRoutines, but not identical to them.

However, despite popular belief, CooperativeThreading can take advantage of multiple processors: as many cooperative threads run as there are processors, and each stays on its processor until it yields. But of course, synchronization then requires real locks; the cooperatively threaded software can no longer assume that a thread owns a critical section simply because it does not yield.
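
For instance (a minimal C11 sketch; bump_counter and the spinlock are invented for illustration), a counter shared by two cooperative threads pinned to different cores needs a real lock, even though neither thread yields inside the critical section:

 #include <stdatomic.h>

 /* Minimal C11 sketch; names invented for illustration. Two
    cooperative threads pinned to different cores both call this.
    Neither yields inside the critical section, yet "not yielding"
    no longer protects the counter, so a real lock is needed. */
 static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
 static long shared_counter = 0;

 void bump_counter(void)               /* runs on core 0 and core 1 */
 {
     while (atomic_flag_test_and_set(&counter_lock))
         ;                             /* spin: the other core may hold it */
     shared_counter++;                 /* critical section */
     atomic_flag_clear(&counter_lock);
 }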

This is the AchillesHeel of cooperative threading: programmers get lazy and start making assumptions about the concurrency. Those assumptions not only break if the software has to be ported to preemptive threading or SMP, but they can break even under the single-processor cooperative model. When software becomes sufficiently complex, it's hard to be sure that some function you are calling does not contain an internal yield somewhere! If you assume that the function call did not yield, but it in fact did, then the state of the world may have been changed by another thread.
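
Here is a sketch of the trap; yield(), log_message() and the queue functions are invented names, but the bug pattern is real:

 /* Sketch of the hidden-yield trap; all names are invented. */
 extern void yield(void);
 extern int  queue_length;
 extern int  queue_pop(void);

 static void log_message(const char *msg)
 {
     (void)msg;                        /* unused in this sketch */
     /* ... writing the log blocks, and blocking yields ... */
     yield();                          /* hidden yield in a "simple" call */
 }

 void drain_one(void)
 {
     if (queue_length > 0) {
         log_message("popping");       /* another thread may run here... */
         queue_pop();                  /* ...so the queue may now be empty */
     }
 }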

If you ever have to deal with a cooperative threading package, it's best to treat it like preemptive threading and use proper synchronization everywhere.

It's noteworthy that traditional Unix kernels are cooperative; when you are running kernel code, another process won't be dispatched unless you explicitly suspend by adding yourself to a wait queue and calling the scheduler. Only user space code can be preempted! Linux works like this too. When SMP was introduced to Unix kernels, it was done by extending the semantics of cooperative threading to multiple processors. Once you get that right (you have to do real locking to make it work), you can start adding real preemption while you are at it. This is what the relatively recent (2003) preemptive patches for Linux are all about.
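
The classic sleep idiom looks roughly like this (a simplified, Linux-2.6-flavoured sketch; data_is_ready() is an invented placeholder). Nothing else is dispatched on this processor until schedule() is called explicitly:

 /* Simplified Linux-2.6-flavoured sketch; data_is_ready() invented. */
 static DECLARE_WAIT_QUEUE_HEAD(data_ready);

 int wait_for_data(void)
 {
     DEFINE_WAIT(wait);
     for (;;) {
         prepare_to_wait(&data_ready, &wait, TASK_INTERRUPTIBLE);
         if (data_is_ready())          /* re-check after queueing */
             break;
         schedule();                   /* the explicit yield */
     }
     finish_wait(&data_ready, &wait);
     return 0;
 }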

In Unix kernels there can arise some pretty tricky cooperative threading issues. For example, code that looks like it obviously does not yield may do so anyway! Why? Because it hits a page fault accessing some user space data, for instance, and the handler for that page fault will suspend the process until the data is swapped in. Oops, a hidden yield in a simple memory access instruction! This is why explicit interfaces are used to validate and copy user space data. Once it is copied into kernel space you can work with it without the risk of interruption.
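
A sketch of that discipline (Linux-flavoured; my_ioctl, my_req, state_lock and apply_request are invented names): the copy, which can fault and sleep, happens before any lock is taken:

 /* Linux-flavoured sketch; my_ioctl, my_req, state_lock and
    apply_request are invented names. */
 static DEFINE_SPINLOCK(state_lock);

 long my_ioctl(void __user *arg)
 {
     struct my_req req;

     if (copy_from_user(&req, arg, sizeof(req)))
         return -EFAULT;               /* may have slept on a page fault */

     spin_lock(&state_lock);           /* safe: no user access after this */
     apply_request(&req);              /* works on the kernel-space copy */
     spin_unlock(&state_lock);
     return 0;
 }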

Linux and other Unix varieties have had preemptive kernels for quite some time now.


I have implemented "Cooperative Threading" using multiple cores, and almost none of the above points apply. I am not implying that these comments are incorrect for some specific implementation, but the title names a concept rather than a specific implementation.

One good point for CooperativeThreading (CT) is that it can be made to incur very little switching overhead, and it doesn't necessarily need much locking. If you control when context switching takes place, you can avoid switching while blocking code is executing or at an inconvenient place in the code. You don't need to save registers, and in my case I have no problems with the execution stack because each program has its own stack on the heap.
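
As a generic illustration of the technique (not DavidClarkd's VM, which avoids even register saves by interpreting byte code), here is a minimal POSIX ucontext sketch of cooperative switching with a stack allocated on the heap:

 #include <stdlib.h>
 #include <ucontext.h>

 /* Generic POSIX ucontext sketch: one extra thread with its own
    heap-allocated stack, switched cooperatively. */
 static ucontext_t main_ctx, thr_ctx;

 static void thread_body(void)
 {
     /* ... do some work ... */
     swapcontext(&thr_ctx, &main_ctx); /* explicit yield back to main */
     /* ... resumes exactly here on the next dispatch ... */
 }

 int main(void)
 {
     getcontext(&thr_ctx);
     thr_ctx.uc_stack.ss_sp   = malloc(64 * 1024);   /* stack on the heap */
     thr_ctx.uc_stack.ss_size = 64 * 1024;
     thr_ctx.uc_link          = &main_ctx;           /* where exit returns */
     makecontext(&thr_ctx, thread_body, 0);

     swapcontext(&main_ctx, &thr_ctx); /* dispatch the thread */
     swapcontext(&main_ctx, &thr_ctx); /* resume it after its yield */
     return 0;
 }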

A problem with CT is that a program can simply fail to yield to another program (so all programs stop): it can loop continuously and prevent switches, or it can block in the operating system. If a program behaves badly, this can affect other programs, as they normally share the same memory address space.

I solve most of the problems of using CT by running byte code on my own VM (I also control what code is created at compile time) and by using stacks defined on the heap. Each program's execution state is kept in a table (which may hold pointers to globally accessible memory or to local memory), so a context switch is no more than changing an index from one row to another. Each executing program has its own execution stack and local memory. Global resources (big memory, disk, etc.) must be synchronized using locks, but my system looks after these automatically: there is no mechanism for programmers to set locks, and I make sure that locks cannot deadlock or race, so many problems with locks are simply avoided. All code that can block is put in a separate SMP thread rather than in CT threads. A comment above says that "programmers get lazy and start making assumptions about the concurrency", but concurrency is automatic in my system, with no programmer involvement whatsoever, so they can be as lazy as they like.
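
A hypothetical sketch of the table scheme described above (all names are invented): the entire context switch is just moving an index from one row of the table to another:

 /* Hypothetical sketch of the table scheme; all names invented. */
 #define MAX_PROGRAMS 8

 typedef struct {
     int  pc;                          /* byte code program counter */
     int  sp;                          /* top of its own stack */
     int *stack;                       /* execution stack on the heap */
     int *locals;                      /* per-program local memory */
 } Program;

 static Program table[MAX_PROGRAMS];
 static int     current;               /* the "context" is just this index */

 extern void step(Program *p);         /* execute one byte code op */
 extern int  switch_point(Program *p); /* VM-chosen convenient point */

 void run_vm(void)
 {
     for (;;) {
         step(&table[current]);
         if (switch_point(&table[current]))
             current = (current + 1) % MAX_PROGRAMS;  /* the whole switch */
     }
 }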

For random binary programs (developed by different programmers) that need to run concurrently, CT isn't a very safe way to implement concurrency, but in special circumstances (like my system) CT is significantly more efficient and easier to implement than "Pre-emptive Multi-Tasking". Each has its place, and I use both.

-- DavidClarkd


CategoryOperatingSystem CategoryRealTime CategoryConcurrency

