Cooperative Threading Operating Environment

Portable operating environments are desirous because they make it possible to run programs within those environments that can be run on any compatible host operating system. The JavaVirtualMachine is an example of a successfully ported operating environment for running programs.

A future operating environment could benefit from a CooperativeThreading model for multitasking rather than to piggyback on the host system's threading model and default to preemptive multitasking. Tasks would take the place of threads in the process model in the environment, and task switching would be a manual process implemented by the application developer.

The benefits I can envision in the cooperative multitasking model are:

In the coming years, common computers will be loaded with two or more CPU's running in a SymmetricMultiProcessing? (SMP) or other parallel organization. Each processor contains its own memory cache(s), whose cost to load is expensive. If a program can frequent memory represented by the cache, its performance will be improved. Operating systems allow for native threads to have an affinity with one CPU. By binding all tasks in a single process to a single native thread which has affinity to a single CPU, tasks have a better chance to hit the CPU cache. Host thread context switching within the CPU could be reduced as well since the number of threads the operating system would need to schedule would be reduced. Inter-task communication would be cheap as well, and would not require CPU cache synchronization.

The disadvantages to such a model would be things like:

I believe there are solutions to all of these disadvantages to be found through innovations in computer language development.

-- DavidWoldrich

My initial thought was that multithreading is already pretty hard and this adds yet another burden to the programmer. We know from early Windows and Mac applications that co-operative multitasking doesn't work. On the other hand, we're talking multithreading, not multitasking, and if this truly reduced synchronization issues, it might be a very good thing. The biggest problem I see is that it would result in a lot of "yield()" clutter as well as obscure liveness defects. --JimShore (arguing already!)

EH? How dare you argue! Grumble grumble ... *ahem* As I said, I think new language/vm developments would be able to solve many of the disadvantages that are apparent on the surface of CooperativeThreading. I think for argument's sake that I will give an example of a language/environment idea I was kicking around: a ZombieTaskException?. If you have an errant task that never yields or can be in a state where it will fail to yield in a reasonable amount of time, the environment will throw an unchecked ZombieTaskException? at the task. This would force programmers to design systems around that hard/harsh requirement and not be so lazy with their yield()'ing. Obviously, there are more peculiarities that arise with this ZombieTaskException?:

Am I wrong to seem so confident in believing that those last bullets could find solutions in trivial language, deployment, and tools innovations? No, I think that I am not wrong. I mean, how could I be wrong if JimShore is weighing in? My gut tells me that the more Jim says I'm wrong, the more right I must actually be. :) (just kidding!!!) That first bullet would be a tougher nut to crack, but I think we can work it out! -- DavidWoldrich

I've written my own cooperative threading (in assembly) so I know it can work, but I'm not clear why I would want to do that anymore. Can't the OS just schedule threads from the same process on the same CPU to get the advantages you discuss above? -- EricHodges

I don't see anything new being suggested on this page whatsoever, except for the suggestion that some kind of new features are needed in computer languages. If you've got suggestions in that area, please make them.

The rest of it perhaps arises from insufficient exposure to history. Every conceivable combination has been implemented and widely deployed.

For instance, whether threads all share one OS process versus each having their own OS process. Both of those models are widely deployed today. Linux, for instance, went from one model to the other in recent history, and you can do both kinds under current versions of Linux.

Or the key idea of doing cooperative threading; this has been a central part of much of the real time programming I've done over the years under different OSes, most recently with pthreads. You can use scheduling flags to choose preemption versus run to completion (under a real time OS, anyway). Etc.

I've also implemented my own thread packages and had them behave cooperatively any way that I like, and be as lightweight as my creativity allows, recently when pthreads fundamentally didn't do what I needed, and previously before pthreads were available.

So let's clarify this, just exactly what new are you suggesting? Or are you attempting to extract some DesignPatterns from existing practice? Or what?

-- DougMerritt

I mean only to explore a tried and true concurrency model mixed with some innovations and the ideas that we've popularized in the last couple years of language and environment development. My experience with explicit scheduling in programming versus preemption has shown me that I can get similar results from both models. I have often wondered if there are threading AntiPatterns we could find in programs written today that we could use as examples of what not to do when designing the "next big thing". I guess the thing I am most frustrated with is that many languages do little to abstract the host operating system's concept of threads. And, threads by themselves don't do much for you to help implement concurrent designs. The new thing I am suggesting is that cooperative multitasking might be an ideal operating model for a next-generation operating environment. -- DavidWoldrich


It strikes me that the whole business of pre-emptive task management is based on the idea that there is some gadget out there that 1) can "interrupt" you, 2) knows what time it is, 3) knows about physical events through the application of hardware magic.

I've been in the polled -vs- event-driven arguments. What gets missed is that the processor, in cooperation with other hardware, detects interrupts in between other things it's doing and then, at a time convenient for the CPU diverts program control to some pre-designated vector. Simply put, the CPU actually polls its interrupt lines to see when it ought to interrupt the executing program.

From the programmer's point of view, this is an unregulated, unexpected, unpredictable event, so the vector that receives control is obliged to preserve the integrity of the context it wakes up in.

In any language that's not directly executing against the "BareMetal?" of the CPU, it is possible for the language interpreter to accomplish an analogous effect by executing "pre-emption" in between other things it's doing at a time convenient for the interpreter while preserving the illusion of surprise for the programmer.

The challenge is deciding how surprised you want to be, and engineering the interpreter to cooperate.

-- GarryHamilton

ErlangLanguage does something like this, or did last time I googled for their scheduling algorithm. The scheduler runs every preset (1000?) number of function calls. Since Erlang is pure-functional, uses tail calls, and has the language primitives implemented as built-in functions, this usually gives a good approximation of running time. And since anything that needs to be atomic is implemented as a built-in function, and scheduling always occurs between functions, there's no need for surprises.

I'm considering something similar with a language I'm developing. I'm planning to use a CheneyOnTheMta algorithm, but run the scheduler on each trampoline bounce (in other words, after each minor garbage collection). Thus, I get threading with essentially zero runtime cost. It basically allots running time based on how much garbage a thread produces, which isn't too bad since all allocation with CheneyOnTheMta is on the stack.

The problem is always trading off fairness for efficiency. To be fair, you need to check whether it's time to context switch at every safe point. But that can eat up CPU cycles unless the granularity is sufficiently course that most running time is spent on program execution, not scheduling. And if you implement optimizations for efficiency (like - in the above example - implementing inner loops as regular function calls instead of CPS transformations, so they can be inlined), you risk seriously skewing the fairness of the scheduler.

Hardware support would help a lot. Didn't some old processors have a test & set instruction that could be used to implement locks very simply, that was lost with the CISC->RISC transition? If the hardware could somehow be made aware of where the "safe" points in the program were, it could just drop a flag on the clock interrupt and the OS would context-switch at the next opportune moment. -- JonathanTang


Garry, you expressed my sentiments perfectly. I hate the surprises of hardware interrupts, and really you hit the nail on the head with that observation. For the programmer of a preemptively multitasked application, there is a lot of perfection expected. How many times have I read threading best-practices discussions list as recommendation #1, "Don't use threads (if you don't have to)!" Something stinks about threads if people recommend not to use them.

To continue your ThreadedSurprise? thought, I would say that in a CooperativeThreadingOperatingEnvironment there are many opportunities for preemption to be effectively used in the environment's kernel. IO, interrupt handling, and other host service handlers might be best serviced by waiting threads. Communication between the environment and applications could be achieved effectively with message queues, (one of the places where CPU cache data would always need to be synchronized.) But in application logic, my opinion is that threads are too complicated a model for the benefits they bring. In other words, I propose a surprise-free programming environment for application developers. Are there any parallels that could be made with CooperativeThreading and the AspectOrientedProgramming paradigm?

-- DavidWoldrich


Let me step out from my programmer shoes. As a user I was extremely frustrated by both Windows 3.1 and, on occasions, MacOsClassic. I wish no software engineer in his right mind would try to expose users to such frustrating experiences ever again. In my humble opinion there's no technical justification for not having pre-emptive, multi-threaded and real time OS (I mean OS, not real time toolkits) for the poor users, other than the spaghettis we inherited from yesteryear. Something to remember when your windows is unresponsive because of a bad CD or your Linux is equally unresponsive because of a runaway program that you can hardly wait to kill but you can't. Screw the suffering of programmers, as a users we demand satisfaction, and after so many decades of improvements OSes are far from satisfactory. I don't really know, but as a user I find that the current kernels can spend pretty significant amounts of time without ever bothering to react to an interrupt, and it so happens that I am the guy with the need for that interrupt to succeed. I just pressed Ctrl-Alt-Del for crying out loud.

That the programmer may suffer from the resulting non-determinism is another argument, I am not willing to buy as a user. EwDijkstra told you three decades ago, that programs should be designed non-deterministically to begin with. So screw the programmer, screw the interpreter, screw the OS designer: if my 2 GHz PC is more unresponsive than a stubborn donkey, it is clear that they've all done a pretty bad job. -- CostinCozianu

Agreed (wow). As programmers, it's very easy to forget that the primary goal should be user satisfaction, not programmer convenience. Threading is hard because non-determinism is hard, but moving the scheduling responsibility onto the programmer isn't going to fix that. And programmers are notoriously bad at predicting where their programs will be spending the majority of their time. Cooperative multithreading usually leads to long periods of unresponsiveness.

I think a lot of the dangers of threads can be removed by taking mutable state out of the equation (see SharedStateConcurrency). Erlang is a significant advance here, though hardly the be-all-and-end-all of concurrency. If the bulk of the computation has ReferentialTransparency, it can be interrupted at any point. Communicating by value-messages (see MessagePassingConcurrency) eliminates the possibility of corrupted/unsychronized data. You still need to worry about deadlocks and possibly messages received out of order, but you have to worry about those regardless of how much language support you have. There's no substitute for good design. -- JonathanTang

The bad old days of Windows and Mac cooperative threading should not even come up in any discussion, because no one would be insane enough to propose that, so AssumeGoodFaith.

Seriously. In the 1980s I refused to buy a PC or a Mac while that was the state of affairs. Instead I bought an Amiga because it had preemptive tasking, and a PC once I could run Linux, with preemptive tasking, because it was and is that important. It's not like this is a new lesson; people who had a Unix or other preemptive experience before PCs appeared never took them seriously in the first place.

So let's ignore that boogeyman here. -- DougMerritt

Yes, boogeymen and bad juju about stalled processes due to unreadable CD's locking badly written interrupt handlers due to faulty operating system internals (which I agree is terribly frustrating) aside, I think we can still get that corpse called CooperativeThreading to rise outa the grave to pour us a cup of coffee before falling down dead again. In my opinion, Windows 3.1 and MacOS 9 had more problems to them than just CooperativeThreading, and they have since been seen as the black eyes for CooperativeThreading. But, I think this confirms my observation about threading represented in any form: if there is nothing offered by the host system but the thread itself to implement concurrency, then not much in the way of productivity is gained (and why bother?). Truly, preemptive scheduling does more for the application developer, but it still falls far short of what I am looking for.

We now routinely discuss as "crosscutting concerns" things like logging and transactions. Is concurrency a crosscutting concern, but one that might be impossible to compile statically with preemptive threading? Maybe by having a CooperativeThreading operating environment coupled with a fancy AspectOriented compiler, we could abstract the concurrency to some level that was manageable (in the human brain sense), comprehensible, and responsive to users. Seems like the effort to debug threads in current systems and get them to behave perfectly would be reduced greatly if we had language support and a matching clueful debugger. I'm just putting ideas out there, not sure if they are possible. I am thankful that so many valuable opinions have been voiced thus far.

Thank you! -- DavidWoldrich

So you mean abstracting the threading into the language definition, so that the compiler inserts the necessary "yield" statements into the finished code? This has been suggested on NeoKernel, and I asked my OS Design professor about it. The problem is that compilers are not much better at predicting where a program will spend the bulk of its time than programmers are.

Consider a program with an inner loop that goes from 0 to some user-specified ceiling. If the ceiling is low, then the bulk of programming time is spent elsewhere, and a yield in the loop might lead to starvation. If the ceiling is high, then omitting the yield will freeze other threads until the loop exits. The only safe solution would be to insert yields on each iteration, or at least a counter that's checked to see whether it's time to yield. But either of those can lead to big performance losses if the loop body is small.

-- JonathanTang

Well, EwDijkstra was of the well-informed opinion that non-determinism is of the essence and cannot be retrofitted in a deterministic design. So I am rather skeptical that cooperative threading or aspect orientation will be able to address this problem.

The ways to make non-determinism manageable by the brain of the poor programmers, are many and varied. The following formalism may or may not help: CommunicatingSequentialProcesses, PiCalculus, PetriNets, coloured petri nets, TLA, and quite a few others that make your head hurt. According to one of the experts (I'll try to remember who exactly ) we do not have a definitive answer to what's the best way to systematically and predictably construct non-deterministic programs, with desired behaviour. However various approaches are showing promising results, and programmer's talent and intuition has to play a role just like it plays for deterministic program. I liked a lot the UnityLogic? described by JayadevMisra? in the book "AdisciplineOfMultiProgramming?", but I can only use it informally through some very nice design patterns and solutions I found in that book. --Costin


Given your further explanation, I think you're looking for the same thing I've been searching for for years. I'm in the midst of implementing several such mechanisms right now.

The most controlled mechanism, one which will rarely cause you grief, is CoRoutines (a page I largely wrote :-). They've been raised to a high art in IconLanguage, and not so much so anywhere else, although available here and there. They're useful enough that it causes me pain to not have them, e.g. in C/C++/most languages, a little like it would be painful to not be allowed to do recursion in C, even though I don't do that much recursion in C, unlike in Lisp. But being able to when you need to is a big deal. CoRoutines aren't as useful as recursion, on the other hand; their power is significant but limited. Icon's "co-expressions" help widen the scope of where they're handy.

Erlang has it right to some extent; its mechanism is considerably more powerful. Consider UnixShell pipelines. As long as you're doing ProducerConsumer? sorts of things, you won't get into trouble, and they're quite powerful. But they're not ConcurrencyComplete?.

The mechanisms that Costin mentions above seem to me to be more powerful but also too low level, but I need to investigate more (e.g. I don't remember if I ever heard of Unity logic before).

Costin has actually said quite a bit about concurrency here that is pretty much all worth reading; we should come up with a list of such pages.

Scheme architects and fans talk about continuations a lot, and you can do pretty much anything with them, but I have yet to run across non-toy examples. I mean, you can do anything with bits, too, but strong demos are helpful. :-) Anyone know where I could look to see powerful good practice with continuations, as opposed to the much more common theoretical comments?

In other words, ok, I've implemented cactus/spaghetti stacks...now what? :-)

-- DougMerritt

The KillerApp for continuations is the web. Every time you have a couple hidden input tags and a link that uses them, you're invoking a continuation. Any WebApplication can be thought of as a series of function calls (links) and parameters (HTML input elements) that call each other using CPS. There was a big discussion of this a while back in the LL1 mailing list archives, but I can't remember the date.

It's apparently a big enough deal that continuations have been faked (or have been planned to be faked) in the following languages:

CallWithCurrentContinuation lets you write the bulk of your application's flow logic as a series of procedure calls, which generate the appropriate links to insert into the HTML. Save the continuation on the server (it requires a memory-resident app, CGI won't cut it), and then pass a reference to the continuation with each link. From a user-interface perspective, the big advantage is that hitting the back button or accessing the pages out of time order won't screw up the server. -- JonathanTang

I think I recall a discussion of a web approach this way (PaulGraham or LambdaTheUltimate or somewhere...). References to code would be welcome...again, I get it at the very abstract level, but I've never seen non-trivial examples. -- DougMerritt


Multithreading is one of those areas that seems impossible to get right. (Witness how early versions of Java went through several synchronization models. In fact, doesn't the current beta introduce yet another synchronization model?) At the same time, there are some things that just can't be done without multithreading or some hacked-up equivalent. Sometimes you just want your program to do two things at once.

I would love to see a programming model that made threading easy--even preferred--yet made liveness and safety a non-issue, to the point where it was simply impossible to deadlock or race.

Cooperative threading for a single program isn't the disaster that it is for an operating system. I don't think it solves the problem, but maybe it could under the right conditions. I think it's an interesting idea to explore.

A similar idea would be to have a language that was not quite TuringComplete, such that the maximum runtime of any routine could be statically calculated in a reasonable time. Such a language could require that all threads exit (not yield) within 10ms. It would result in an unusual event-oriented programming style and would solve the liveness problem of a cooperative model. I've kicked this idea around for a long time and never gotten around to actually trying it. It's hard to say if it would actually make programming easier, and something more would be needed to address safety.

--JimShore

That wouldn't do it as stated. First off you need to consider loops and recursion; to keep such things from going over the time limit, you'd have to make each new iteration a new thread. Or every BasicBlock?, to use the ultimate building blocks of control structures.

Then you'd have to consider what to do about a thread waiting on an external event such as a socket in a web server. If you made a special exception for that case, then you'd have to worry about the same thing done asynchronously, where it computes something endlessly in a loop but checks for incoming socket data once in a while.

If you apply reductio ad absurdum, and make every instruction its own thread, then every pure computational thread/instruction would exit far faster than 10ms...but would be functionally identical to the current situation, where, yes, every instruction exits very fast, and the scheme would have gotten nowhere at all.

The idea of having language support has enticed me for a long time, but one has to figure out the underlying model first; one that clearly would work if done carefully by hand, and only then automate that manual solution into the compiler.

-- DougMerritt


I've read discussions about assigning a time cost for every instruction executed in the environment's VM. VM kernel calls would all be counted at some fixed cost that would account for the worst case time spent outside of the user code. With my ZombieTaskException? idea, the VM could accumulate time costs and 'know' when a task is going over the limit. Unfortunately, loops and recursion make it impossible for a compiler to warn a developer that a block or set of code would be a candidate for Zombie. So, this ZombieTaskException? is going to be a regular source of those surprise concurrency bugs that I am seeking to avoid.

So, perhaps CooperativeThreading (with a governor as hard as a ZombieTaskException?) is too simple an execution model on it's own for authoring reliable multithreaded applications.

A blend of PreemptiveThreading? and CooperativeThreading might be the answer. Groups of cooperatively scheduled tasks could be bundled together and preemptively scheduled. This would mean that the developer could have control over individual tasks that could even starve the rest of the group if need be. But, the environment's VM could preemptively schedule thread groups at it's discretion. I think this scenario is only moderately better than the preemptive scheduled thread mess I see today. Perhaps there would be a useful way to abstract the grouping and let the compiler do it or let the environment 'discover' the task groupings at runtime, or perhaps the the queueing of messages between the thread groups could be automated somehow.

-- DavidWoldrich

I don't think I've ever seen a pre-emptive threading model that didn't allow you to cooperatively release the CPU and unschedule the thread (sleep). Doesn't that give the best of both worlds? What problem are you trying to solve? -- EricHodges

The problem is that in a preemptive environment, you can never be sure that your thread won't be interrupted and have some other thread stomp all over its data. In a pure cooperative model, there's no need for sychronization primitives. Any time you have an operation that needs to be atomic, you just refuse to yield, and you can be sure that nothing will happen in between yields. It makes the program much easier to reason about.

Unfortunately, it tends to make for a rather trying user experience. So you want some sort of pre-emption. But once you've added any kind of pre-emption, you need to worry about threads being interrupted in the middle of an operation. Add shared state, and you need locks around any reads or writes to that data. -- JonathanTang

In a pre-emptive environment I can't be sure my thread won't be interrupted, but I can be sure some other thread won't stomp all over its data. I have synchronization for that. It's genuinely hard to figure out the best way to design that, but I wouldn't call it a "mess" as David does above. It's difficult to code because it's a difficult problem (as Costin has pointed out).

What is the goal? I've done cooperative threading. I've written my own cooperative thread manager. It's pretty easy, easy enough that I could write one to use in a pre-emptive environment if I ever saw the need for it. I haven't use cooperative threading in the last 8 years because I can achieve the same results with synchronization and sleep. Is the goal to ignore the difficult problem of which operations should be atomic? Is the goal to get the machine to figure that out for me? -- EricHodges

The issue is that current approaches to synchronization result in defects. The mechanics are deceptively simple, proper usage is very hard, and testing nearly impossible. Result: buggy code. Jonathan states the core point very clearly: CooperativeThreading makes the program much easier to reason about, but it leads to liveness issues. Dave feels that the liveness problem can be addressed and proposes ZombieThreadExceptions as one mechanism. I'm not convinced but would like to see discussion. So far, unfortunately, most of the contributers have jumped to dismiss the idea, often misinterpreting it in their rush to disdain. --JimShore

I think the goal is to be able to reason about multithreaded programs as easily as you can reason about singlethreaded programs. CooperativeMultitasking? allows this, but at a cost that's too high for most users to bear. Pre-emption requires locks to maintain data integrity, and the more locks, the more potential for deadlock. I don't think the goal is attainable; concurrency is fundamentally hard, and new programming models aren't going to change that. But I think we can do better than we currently do.

Erlang's a good start. If each computation has ReferentialTransparency, then the scheduler can preempt any of them without worrying about corrupting data (because there's no state shared between threads). All data sharing uses MessagePassing, which goes into a "mailbox" asynchronously. So they only time you pick up new data is through receive(), which blocks unless there's data available. There's still the possibility for deadlock, but you don't have to hand-lock variables, and if you sketch out the whole design of the system, you can avoid the cyclic dependencies that result in DeadLock.

EeLanguage also claims to have a concurrency model where it's impossible to deadlock, because all communication between threads happens through an asynchronous "promise/fulfill" mechanism. But I'm not really sure how it works, because if you don't have the data for the computation, what do you do? You kinda have to block until someone fulfills their promise, and as long as you can block without fulfilling promises, there's still the potential for deadlock. -- JonathanTang


This discussion on Concurrency-Oriented Programming on LambdaTheUltimate might have a strong bearing on what you're interested in: http://lambda-the-ultimate.org/classic/message9289.html

-- DougMerritt


Regarding CooperativeThreading. While Windows 3.x and MacOsClassic clearly demonstrate that it is inappropriate as a means for scheduling independent applications (a pre-emptive model is clearly superior for that), especially in a multi-user environment; for scheduling multiple threads of computation within an application, most of its objectionable features go away.


A useful distinction to make, at this point in the discussion, is the distinction between what I call ConcurrencyInTheSmall? and ConcurrencyInTheLarge?. (These names are obvious rip-offs of ProgrammingInTheSmall? and ProgrammingInTheLarge?). What suffices for one does not suffice for another. The following defitions are woefully imprecise, for which I apologize.

ConcurrencyInTheSmall? refers to the synchronization of access to simple data structures like message queues, simple shared variables, etc. This sort of concurrency usually has the following properties:

ConcurrencyInTheLarge? refers to the synchronization of access to complex data structures, including the global state of some application--such as locking an entire database. This sort of concurrency usually has the following properties:

ConcurrencyInTheSmall? is pretty much a solved problem in computer science; and programmers have numerous tools at our disposal (semaphores, mutexes, monitors, spinlocks, etc.) to deal with low-level synchonization issues. ConcurrencyInTheLarge? is an unsolved problem, and is very difficult. EwDijkstra and TonyHoare spent much of their professional lives (Hoare is still working on the problem) working on this problem, and no general solution has been found. When Dijkstra made the remark alluded to above, I surmise he was referring to ConcurrencyInTheLarge?, and reacting to the tendency of many programmers to design algorithms which depend upon being able to atomically observe/mutate the entire state of an application--an approach which a) does not scale, and b) does not at all model the RealWorld.

There are good partial solutions to ConcurrencyInTheLarge?, CommunicatingSequentialProcesses (and other forms of MessagePassingConcurrency) is one example--it also has the advantage that it scales to distributed systems (whereas SharedStateConcurrency doesn't). ReferentialTransparency is always a good way to avoid the problem when you can use it; however the real world is one of constantly changing state, so the functional paradigm doesn't always apply. LindaTupleSpaces is an interesting way to model SharedStateConcurrency in a distributed system, though it still seems to be an area of research.

It is an AntiPattern, IMHO, to extend the structures and techniques developed for ConcurrencyInTheSmall? and expect them to be suitable solutions for ConcurrencyInTheLarge?. But that's what many languages and middleware, supposedly for large-scale distributed systems, try to do. (JavaLanguage, many RDBMS's, etc.). It is also an AntiPattern, however, to do the opposite--to proclaim semaphores, multithreading, and such to be unsuitable for all applications save for the lowest level of the OperatingSystem; and demand that even trivial concurrency tasks be solved with things like CommunicatingSequentialProcesses.

--ScottJohnson

Very nice.


CategoryConcurrency


EditText of this page (last edited August 2, 2006) or FindPage with title or text search