Von Neumann Bottleneck

In a machine that follows the VonNeumannArchitecture, the bandwidth between the CPU (where all the work gets done) and memory is very small in comparison with the amount of memory. On typical modern machines it's also very small in comparison with the rate at which the CPU itself can work.
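To make the imbalance concrete, here is a minimal C sketch (the array size and the use of clock() are illustrative choices, not anything from this page): summing a large array performs one addition per eight bytes fetched, so the measured rate reflects the CPU-to-memory bandwidth, not how fast the CPU can add.

 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 /* One add per 8-byte load: the CPU spends most of its time
    waiting on memory rather than computing. */
 double sum(const double *a, size_t n) {
     double s = 0.0;
     for (size_t i = 0; i < n; i++)
         s += a[i];
     return s;
 }

 int main(void) {
     size_t n = (size_t)1 << 25;   /* 32M doubles = 256 MB, far larger than any cache */
     double *a = malloc(n * sizeof *a);
     if (!a) return 1;
     for (size_t i = 0; i < n; i++)
         a[i] = 1.0;

     clock_t t0 = clock();
     double s = sum(a, n);
     double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

     /* Effective bandwidth = bytes moved / time. */
     printf("sum=%g, %.2f GB/s\n", s, (double)(n * sizeof *a) / secs / 1e9);
     free(a);
     return 0;
 }

On typical hardware the printed bandwidth is far below what the ALUs could consume, which is the bottleneck in miniature.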

The obvious solution is parallel processing. It has many problems of its own.

Why isn't the obvious solution to increase the bandwidth between the CPU and memory?

The other obvious solution (which doesn't require abandoning a VonNeumannArchitecture), which I'm surprised hasn't been mentioned on this page, is moving memory into the same package as the processor, either in explicitly separate address spaces (large memory files, special "processing" memories, etc.) or via caching.

That's not abandoning the VonNeumannArchitecture, that's increasing the bandwidth between the CPU and memory. The architecture still meets Von Neumann's requirements even if the CPU and memory are in the same physical device.

A partial solution I've seen in actual use (on RAID cards, at a memory layer between the RAID card's CPU and the disk drives) is to add some intelligence to the memory for wide-memory operations (such as zeroing, XORs, and copies). The RAID card's CPU can thus issue a few higher-level commands to the memory beyond just get and set, mostly to optimize around the VonNeumannBottleneck. One could presumably extend this to support small microprograms at the memory, in the same sense as GPU shaders, though one would probably wish to guarantee termination. On the RAID driver, some sort of locking scheme was in use: when the CPU requested a memory address in the read or write region of a program running over the memory, it would wait on that request until the program completed, and if the CPU tried to change the memory-program it would also wait until completion.
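A hedged sketch of what such a command interface might look like, in C (the opcodes, field names, and overlap rule below are illustrative assumptions, not the actual RAID card's API): the CPU posts zero/XOR/copy descriptors against ranges of the wide memory, and any plain get or set that touches an in-flight command's source or destination range waits until that command completes.

 #include <stdint.h>
 #include <stdbool.h>

 /* Hypothetical command set for an "intelligent memory" on a RAID card. */
 typedef enum { MEM_ZERO, MEM_XOR, MEM_COPY } mem_op;

 typedef struct {
     mem_op   op;
     uint64_t dst;    /* destination offset in the wide memory */
     uint64_t src;    /* source offset (unused for MEM_ZERO) */
     uint64_t len;    /* bytes covered by the operation */
     bool     busy;   /* set while the memory is executing it */
 } mem_cmd;

 /* Would a plain get/set on [addr, addr+n) collide with an in-flight
    command? If so, the CPU's request must wait for completion --
    the locking scheme described above. */
 static bool must_wait(const mem_cmd *c, uint64_t addr, uint64_t n) {
     if (!c->busy) return false;
     bool hits_dst = addr < c->dst + c->len && c->dst < addr + n;
     bool hits_src = c->op != MEM_ZERO &&
                     addr < c->src + c->len && c->src < addr + n;
     return hits_dst || hits_src;
 }

The payoff is that only a small descriptor crosses the CPU-memory link per operation; the bulk zeroing, XOR, and copy traffic stays inside the memory device, which is exactly the traffic the VonNeumannBottleneck penalizes.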


digressing even further

Actually, most CPUs manufactured these days *do* integrate all of their RAM onto the same chip as the CPU. (Well, at least they did in 1997 http://en.wikipedia.org/wiki/Microprocessor#Market_statistics -- is there a more up-to-date reference?)

Most microcontrollers include all the program memory and the RAM (usually not DRAM) on the same piece of silicon as the CPU. As of 2009, there are several microcontrollers that include on the order of 1 MByte of Flash that can be read and re-written by the software.

But they still have a von Neumann bottleneck.


Other research material: hyperthreading, pipelining, multiple ALUs, prefetch. -- Matt Pettit (UK), 2004


Is it true that the graphics synthesizer for the next-generation PlayStation used "Computational RAM" http://david.carybros.com/html/vlsi.html#cRAM ?


The other "obvious solution" is not to throw hardware at the VNB, but to rebuild programming from scratch with another paradigm. See OmnigonInternational for less tips here...

See CanProgrammingBeLiberatedFromTheVonNeumannStyle

