HarryChesley and JimBesemer were assigned to write image processing programs for the Illiac IV. The Illiac was a huge, SIMD machine developed by the U. of Illinois. At the time it was the fastest computer on the planet - a bit TOO fast, as it turned out.
The basic architecture was an array of 64 processors which executed a single program in parallel. There was a global bit mask which provided for "turning on and off" individual processors. There was a fairly constraining method for passing data back and forth between processors: a family of complicated "shift" instructions provided for processors passing the contents of a single register left or right by one or by eight processors. Other than that the instruction set was unremarkable. The architecture had a provision for wiring four of the basic configurations together, for 256 parallel processors, though I'm not sure if this was ever done.
(A few comments, I'm Dennis Ernst, and after graduating from Purdue in 1975, I was recruited by Charlie Pfefferkorn (a former Purdue professor) to work on the Illiac IV. The execution of the instructions to turn off individual processors, shifting data, looping, and other control functions was handled by the Control Unit(CU).
The instruction streams for the CU and the PEs were intermixed, which solved some synchronization problems while creating others. Yes, the data shifter was fairly low bandwidth, but most people got around it because most operations were overlapped. The synchronization instructions for the four quadrants never got used because the other three were never built. BTW, the result would have been a 256 processor machine)
Of course the whole thing was experimental. We'd get daily e-mail messages saying things like, "we increased the clock rate to 3.2 Mhz; report any problems to sysops." To make things even more interesting, there was a from-scratch, experimental, new OS and a from-scratch, experimental, new programming language to deal with. The language folks didn't even pretend to be ready for prime time so assembly language was the order of the day. And the OS added so little to the equation that it, too, could be ignored, other than taking up some low RAM and supervising the starting and occasionally the successful termination of your program.
(There was a high level compiler call Glypner(sp?) which ran on a Burroughs B6700. It was Algol like (what else would you expect from Burroughs). I never used it, and I'm not sure when it first became available. When I arrived, they just done a complete rebuild of the machine. A number of systematic problems were dealt with during that time. The machine used wire-wrap and card sockets. One of the problems was the discovery that copper in the traces at the fingers at the card edges can migrate through the gold plating over it. When the copper atoms arrive at the surface they oxidize and that substance is a non-conductor. They now put a layer of nickel between the two metals to stop migration. Of course, all they could do is remove the oxidation at the surface. This continued to be a problem throughout the life of the machine. Another problem was a plastic curing problem in the card sockets. Over time, the plastic would soften and the fingers that pressed against the card would have less force and (with the oxidized contacts on the card) produced an intermittent connection. They fixed that by bending the fingers in the card sockets. Another problem was accidental contact between wirewrap pins. Most of the time this wasn't a problem because an oxidation buildup on the wire-wrap wire prevented shorts. Unfortunately, the I4 was located at NASA Ames next to the BIG wind-tunnel. It used 6 6000 HP motors (the called PGE before running it to warn them) to move a lot a air. In the process, it produced a considerable amount of infra-sonic vibration (you could feel it in your gut when it was running), and that caused the wirewrap pins to vibrate and clean off the oxide insulation layer and short out. So the I4 would usually fail when the wind-tunnel was fired up. Enough for now. DE)
What good is a fast computer if it doesn't have fast I/O? The I4 main memory was connected to a super fast array of hard disks, providing several gigabytes of fast disk access. (Capacity was on the order of what I have today on the PC I'm using to compose this text, though the I/O MAY have been faster.) Algorithms were significantly impacted by the disk I/O strategy, so submitting a job involved specifying how your data would be physically laid out on the disk drives. Of course these same physical drives were shared by all users, so for the most part the entire farm was only a fast scratch area. Consequently, copying user's data to and from the disk array dominated the time consumed by most jobs, particularly programs under development. There was a provision for specifying how you wanted your data laid out in relative terms and there was a complicated scheduler which tried to identify different jobs which could safely be overlapped, but that was only successful if the programs actually did what the layout specs said. Most users defined their data in absolute terms.
Jobs were composed on and submitted from a PDP-10 running Tenex. When your job was accepted for execution, all your input files would be copied to the indicated sectors on the disk farm and your program would be loaded and executed by the CPUs. If your job was successful your output files were copied back to the PDP-10. Ultimately, all developers ever saw was a heavily bogged-down Tenex command interpreter plus occasionally listings describing what happened to your job. Rather anti-climactic.
Harry's and Jim's initial task was to implement a two-dimensional FFT, as this was the basis for many image transformations. A 2D FFT consists of taking a 1D FFT of each of the rows and of each of the columns. For the columns, this was easy on the I4: each CPU did it's FFT in parallel on contiguous data in it's own address space. But this couldn't be done on the rows. Turns out the simplest solution was to transpose the array, but even that turned out to be an extremely hard problem, given the limited inter-CPU communication capabilities. Never really got it to work.
About this time the department got a PDP-11 and moved all of their image processing to it. It took about an hour to do the 2D FFT on a 512x512x24 bit image. Although that was lots slower than the theoretical maximum throughput on the I4, it was lots faster than the I4 turned out to be in practice, especially via remote control over the Arpanet.
I guess the lesson is: a cycle in the hand is worth 2,000 in the net.
About the time we got off of it, the I4 became stable enough to actually begin billing customers for the computer time they used. The job execution strategy was as follows:
Even though I've written programs for it, I've only ever seen pictures of it. Still, it's size is impressive. Each CPU comes in a half-height 19" rack - on wheels. There was a feature (the power supply?) which was hidden behind a garage door built into the cabinets.
Years later, leafing through a National Geographic in a Doctor's office, I inadvertently came across the final days of the Illiac IV. I would have missed the photo except the distinctive garage door caught my eye. Nearby in a huge heap were filing-cabinet sized CPUs, as if piled there by a bull-dozer. Tons of useless scrap. The microprocessor revolution had won.
There used to be (late '80s) a CPU board from the Illiac IV displayed in a glass case in Loomis Hall at UIUC - the undergraduates taking Physics 106 and 107 had to pass it to get out the door. I wonder how many people like me ever noticed it...
I was recently at an Apple reunion at the Computer History Museum in Mountain View and ran across one of the actual Illiac IV processors on display. Although I'd programmed the machine nearly 30 years earlier, that was the first time I'd seen it in person. (HC)