A processor optimized for DigitalSignalProcessing--numeric compuations, in particular those involving relatively simple mathematical functions performed on large datasets, often streamed in an out (such as digital filters, convolutions, etc.)
Typical attributes of a DSP:
- Multiple pipelines and HarvardArchitectures
- Early DSPs had segmented internal memories, optimized for different functions (one memory for data, another for coefficients, etc). In general, this is done to allow multiple memory operands given to a single instruction to be fetched simultaneously.
- Single-cycle MAC (MultiplyAccumulate) instructions, as well as specialized instructions for other computations.
- Often, special read-only memories for certain mathematical functions (a sin x table in ROM, for instance).
- See below for more
In the past, the difference between a DSP and a "normal" CPU was tremendous--DSPs were very "application specific" processors, with limited memories, simple or no MMUs, and register sizes tailored to meet applications. (The 56k series was a 24-bit processor; as 24-bits was and still is commonly used in audio processing). With the increasing number of gates made available by
MooresLaw, the CPU and the DSP are rapidly converging. The vector processing units found in many modern processors (
AltiVec, MMX, etc.) are quite good at many DSP applications (especially couples with superscalar architectures and large on-chip caches that can emulate the special-purpose memory architectures of traditional DSP chips); and modern DSPs are rapidly replacing the "special purpose" architectures with more general purpose computing structures. Many DSPs are now capable of hosting a modern protected
OperatingSystem.
[from DigitalSignalProcessing]
On a basic level, a bona fide DSP chip must, as a minimum
requirement be able to optimally perform the convolution
summation used to compute the output of an FIR (finite impulse
response) filter.
N-1
y[k] = SUM{ h[n]*x[k-n] }
n=0
y[k] = FIR filter output
x[k] = FIR filter input
h[k] = impulse response of FIR filter (also FIR coefficients)
To do this in N instructions, one must be able to in ONE
instruction:
- Do a MultiplyAccumulate (MAC). In the Motorola DSP56K, a 24 by 24 bit to 48 bit multiplication (with the 48 bit result added to a 56 bit accumulator) is implemented as a single MultiplyAccumulate (MAC) instruction.
- Fetch both the delayed sample, x[n-k] and the FIR coefficient h[n] for the next MAC. To do both, this requires at least 2 Data memory spaces: one for coefficient, one for the signal. The 56K has another space for the program instruction opcodes. All 3 memory spaces can be accessed in parallel (simultaneously). This is called a "HarvardArchitecture".
- Have "guard" bits (greater than MSB) to be able to handle overflow from the additions. The 56K has 8 guard bits. This lengthens (sign extends) the 48 bit product to 56 bits to be added to the 56 bit accumulator.
- Be able to saturate the results to the maximum positive or negative values if the result (with the guard bits) exceeds the range. Saturation is an error but it's better than the 2's complement wrap around a normal CPU will do.
- Besides being able to saturate to the "rails" based on the guards bits, the DSP's ALU should be able to round to the nearest LSB of the high order word based on the low order word of the accumulator.
- To address the delayed samples in memory, the ability to do modulo arithmetic on the pointer to x[k-n] is necessary. This implements a "circular queue" or "circular buffer" or "first-in-first-out" (FIFO) buffer which, in the audio DSP world, is just called a "delay line". The DSP programmer should not have to worry about when a pointer increments beyond the boundary of a circular queue requiring the pointer to be adjusted or wrapped around. The DSP should make that modulo adjustment to the pointer automatically. In many applications, it is not sufficient to limit the modulus to powers of 2
In addition, for the
FastFourierTransform (FFT), the ability to do bit reversal or bit reversed addressing is necessary along with some other handy instructions that make for efficient coding of FFT butterflies.
It is suggested getting a good DSP textbook like Oppenheim & Schafer "DiscreteTimeSignalProcessing" (ISBN 0137549202 ) to understand the mathematical needs of digital signal processing and the DSP User's Manuals for some particular DSP chips to see how these chips satisfy those mathematical needs. This might help one understand the DSP architecture better.
(contributed by Robert Bristow-Johnson rbj@audioimagination.com )
DigitalSignalProcessorFamilies