Why Use DSP?: An Introductory Course in DSP System Design, Part 2
Apr 1, 1998 12:00 PM
David Skolnick and Noam Levine
If you have read the first part of this series (see S&VC, December 1997) or are already familiar with some of the ways a DSP can work with real-world signals, you might want to learn more about how digital filters can be implemented with a DSP. This article will introduce you to modeling filter transform functions, relating the models to DSP architecture and experimenting with digital filters. This series seeks to describe these topics from the perspective of analog system designers who want to add DSP to their design repertoire. Using the information from articles in this series as an introduction, designers can make more informed decisions about when DSP designs might be more productive than analog circuits.
Modeling filter transform functions
The previous article compared analog and digital filter properties and suggested why one might implement these filters digitally (using DSP); this part focuses on some of the mechanics of digital filter application. Digital filtering is used because it provides a closer approach to ideal filter approximations, allows adjustment of filter characteristics via software and offers compatibility of filter response with sampled data.
The two best-known filters described in the previous article are the finite impulse-response (FIR) and infinite impulse-response (IIR) types. The FIR filter response is called finite because its output is based solely on a finite set of input samples; it is non-recursive, and its transfer function has only zeroes in the z-plane (any poles lie at the origin). The IIR filter, on the other hand, has a response that can go on indefinitely (and can be unstable) because it is recursive; its output values are affected by both input values and previous output values. It has both poles and zeroes in the z-plane. Figure 1 shows the typical filter architectures and summation formulas that appeared in the December article.
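For readers without Figure 1 at hand, the two summations can be written in a common textbook form. The symbols here (h, b, a, N, M) are assumptions and may not match the figure's notation; in particular, the sign convention on the feedback terms varies between texts.

```latex
% FIR: output depends only on the current and previous N-1 inputs
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]

% IIR: output depends on inputs and on previous outputs (recursive)
y[n] = \sum_{k=0}^{M} b_k\, x[n-k] + \sum_{k=1}^{N} a_k\, y[n-k]
```

The second sum in the IIR equation is the feedback path that makes the response potentially infinite and, for some coefficient choices, unstable.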
To model these filters digitally, one might take two steps. First, view these formulas as programs running on a computer. This step consists of breaking down the formula into the mathematical steps and identifying all of the additional operations necessary for a computer to perform, such as handling instructions and data or testing status, in order to implement the formula in software.
Second, take those operations and write them as a program. This step can be a fairly arduous task. Fortunately, you can purchase a great deal of canned software, often available in a high-level language (HLL) such as C, which somewhat simplifies the job of programming. From the point of view of learning, though, it may be more instructive to start with assembly language; assembly language algorithms are also often more useful than HLL where system performance must be optimized. At the level of abstraction of some high-level languages, the program may not look much like the equations. For example, Figure 2 shows an example of an FIR algorithm implemented as a C program.
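As a rough illustration of a summation formula turned into a program, here is a minimal C sketch of a direct-form FIR filter. It is not the listing from Figure 2; the function name, the argument names and the use of double-precision values are all assumptions made for readability.

```c
#include <stddef.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k].
 * Input samples before the start of x are treated as zero.
 * All names are illustrative, not taken from the article's figure. */
void fir_filter(const double *h, size_t ntaps,
                const double *x, double *y, size_t nsamples)
{
    for (size_t n = 0; n < nsamples; n++) {
        double acc = 0.0;                       /* accumulator for one output */
        for (size_t k = 0; k < ntaps && k <= n; k++) {
            acc += h[k] * x[n - k];             /* one multiply/accumulate */
        }
        y[n] = acc;
    }
}
```

Feeding the filter a unit impulse (1, 0, 0, ...) returns the coefficients themselves, which is a quick way to check an implementation. Note how the inner loop is nothing but the multiply/accumulate step repeated once per tap.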
Many analysis packages support algorithm modeling. Now, continuing the discussion of the process, after these filter algorithms have been modeled, they are ready for implementation in DSP architecture.
Relating the models to DSP architecture
For programming, one must understand four sections of DSP architecture: numeric, memory, sequencer and I/O operations. This architectural discussion is generic (applying to general DSP concepts), but it is also specific as it relates to programming examples later in this article. Figure 3 shows the generalized DSP architecture that this section describes.
Numeric section: Because DSPs must complete multiply/accumulate, add, subtract, and/or bit-shift operations in a single instruction cycle, hardware optimized for numeric operations is central to all DSP processors. It is this hardware that distinguishes DSPs from general-purpose microprocessors, which can require many cycles to complete these types of operations. In the digital filters (and other DSP algorithms), the DSP must complete multiple steps of arithmetic operations involving data values and coefficients to produce responses in real time that have not been possible with general-purpose processors.
Numeric operations occur within a DSP’s multiply/accumulator (MAC), arithmetic-logic unit (ALU) and barrel shifter (shifter). The MAC performs sum-of-products operations, which appear in most DSP algorithms, such as FIR and IIR filters and fast Fourier transforms. ALU capabilities include addition, subtraction and logical operations. Operations on bits and words occur within the shifter. Figure 3 shows the parallelism of the MAC, ALU and shifter and how data can flow into and out of them.
From a programming point of view, a DSP architecture that uses separate numeric sections provides great flexibility and efficiency. There are many non-conflicting paths for data, allowing single-cycle completion of numeric operations. The architecture of the DSP must also provide a wide dynamic range for MAC operations, with the ability to handle multiplication results that are double the width of the inputs and accumulator outputs that can mount up without overflowing. On a 16-bit DSP, this feature equates to 16-bit data inputs and a 40-bit result output from the MAC. One needs this range for handling most DSP algorithms, such as filters.
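The word widths above can be checked with a small C model. This is only a sketch: C has no 40-bit integer type, so int64_t stands in for the hardware accumulator, and the function names are hypothetical.

```c
#include <stdint.h>

/* Sum of products with a wide accumulator. Each 16 x 16-bit signed
 * product needs up to 32 bits; int64_t stands in for the 40-bit
 * hardware accumulator (an assumption of this model). */
int64_t mac16(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* 32-bit product, wide sum */
    }
    return acc;
}

/* Returns 1 if v fits in a signed 40-bit accumulator, else 0. */
int fits_in_40_bits(int64_t v)
{
    const int64_t limit = (int64_t)1 << 39;
    return v >= -limit && v < limit;
}

/* Returns 1 if v fits in a signed 32-bit word, else 0. */
int fits_in_32_bits(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Accumulating just four products of 32767 x 32767 already exceeds a 32-bit result, while the extra accumulator bits leave headroom for a few hundred such worst-case products before a 40-bit sum could overflow.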
Other features of the numeric section can facilitate programming in real-time systems. Numeric operations produce a variety of conditional states (carries, overflows, saturation, flags and others), and the DSP can make subsequent operations contingent on them, so these states serve as variables in a program’s execution. Using these conditionals, a DSP can rapidly handle decisions about program flow based on numeric operations. The need to feed data into the numeric section constantly is a key design influence on the DSP’s memory and internal bus structures.
Memory section: DSP memory and bus architecture design is guided by the need for speed. Data and instructions must flow into the numeric and sequencing sections of the DSP on every instruction cycle. There can be no delays, no bottlenecks. Everything about the design focuses on throughput.
To place this focus on throughput in perspective, one can look at the difference between DSP memory design and memory for other microprocessors. Most microprocessors use a single memory space containing both data and instructions, using one bus for address and another for data or instructions. This architecture is called von Neumann architecture. The limitation on throughput in a von Neumann architecture comes from having to choose between either a piece of data or an instruction on each cycle. In DSPs, memory is typically divided into program and data memory, with separate buses for each. This type of architecture is referred to as Harvard architecture. By separating the data and instructions, the DSP can fetch an instruction and a data item on the same cycle, which can as much as double throughput. Additional optimizations, such as instruction cache, results feedback and context switching, also increase DSP throughput.
Other optimizations in DSP memory architecture relate to repeated memory accesses. Most DSP algorithms, such as digital filters, need to get data from memory in a repeating pattern of accesses. Typically, this type of access serves to fetch data from a range of addresses, a range that is filled with data from the real-world signals to be processed. By reducing the number of instructions needed to manage memory accesses (overhead), DSPs save instruction cycles, allowing more time for the main job of each cycle: processing signals. To reduce overhead and automatically manage these types of accesses, DSPs utilize specialized data address-generators (DAGs).
Most DSP algorithms require two operands to be fetched from memory in a single cycle to become inputs to the arithmetic units. To supply the addresses of these two operands in a flexible manner, the DSP has two DAGs. In the DSP’s modified Harvard architecture, one address generator supplies an address over the data-memory address bus; the other supplies an address over the program-memory address bus. By performing these two data fetches in time for the next numeric instruction, the DSP is able to sustain single-cycle execution of instructions.
DSP algorithms, such as the example digital filters, usually require data in a range of addresses (a buffer) to be addressed so that the address pointer wraps around from the end of the buffer back to the start of the buffer (the wrap point is set by the buffer length). This pointer movement is called circular buffering. (In the filter equations, each summation basically results from a sequence of multiply-and-accumulates of a circular buffer of data points and a circular buffer of coefficients.) A variation of circular buffering, which is required in some applications, advances the address pointer by values greater than one address per step but still wraps around at a given length. This variation is called modulo circular buffering.
By supporting various types of buffering with its DAGs, the DSP is able to perform address modify and compare operations in hardware for optimum efficiency. Performing these functions in software (as occurs in general-purpose processors) limits the processor’s ability to handle real-time signals.
Because buffering is an unusual concept, yet key to digital signal processing, a brief buffering example is useful. In the example illustrated in Figure 4, a buffer of eight locations resides in memory starting at address 30. The address generator must calculate next addresses that stay within this buffer, yet keep the proper data spacing so that two locations are skipped. The address generator outputs the address 30 on to the address bus while it modifies the address to 33 for the next cycle’s memory access. This process repeats, moving the address pointer through the buffer. A special case occurs when the address 36 gets modified to 39. The address 39 is outside the buffer. The address generator detects that the address has fallen outside of the buffer boundary and modifies the address to 31, as if the end of the buffer is connected to the start of the buffer. The update, compare and modify occur with no overhead. In one cycle, the address 36 is output onto the address bus. On the next cycle, the address 31 is output onto the address bus. This modulo circular buffering serves the needs of algorithms such as interpolation filters and saves instruction cycles for processing.
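The address sequence in this example can be reproduced with a few lines of C. This sketch models in software the update, compare and modify steps that the address generator performs in hardware; the function and parameter names are hypothetical.

```c
/* Next address for a circular buffer of `length` locations starting at
 * `base`, advancing by `step` each access (0 < step <= length, and addr
 * must already lie inside the buffer). */
unsigned next_address(unsigned addr, unsigned step,
                      unsigned base, unsigned length)
{
    unsigned next = addr + step;       /* update */
    if (next >= base + length) {       /* compare against the buffer end */
        next -= length;                /* modify: wrap back into the buffer */
    }
    return next;
}
```

Starting at address 30 with a step of 3 in an eight-location buffer, repeated calls yield 33, 36, 31, 34, 37, 32, 35 and then 30 again. On a general-purpose processor, the compare and the conditional subtract cost extra instructions on every access, which is the overhead the DAG removes.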
Sequencer section: Because most DSP algorithms (such as the example filters) are by nature repetitive, the DSP’s program sequencer needs to loop through the repeated code without incurring overhead while getting from the end of the loop back to the start of the loop. This capability is called zero-overhead looping. Having the ability to loop without overhead is a key area in which DSPs differ from conventional microprocessors. Typically, microprocessors require that program loops be maintained in software, placing a conditional instruction at the end of the loop. This conditional instruction determines whether the address pointer moves (jumps) back to the top of the loop or to another address. Because getting these addresses from memory takes time, and availability of time for signal processing is critical in DSP applications, DSPs cannot waste cycles retrieving addresses for conditional program sequencing (branching) in this manner. Instead, DSPs perform these test and branch functions in hardware, storing the needed addresses.
As Figure 5 shows, the DSP executes the last instruction of the loop in one cycle. On the next cycle, the DSP evaluates the conditional and executes either the first instruction at the top of the loop or the first instruction outside the loop. Because the DSP uses dedicated hardware for these operations, no extra time is wasted with software evaluating conditionals, retrieving addresses or branching program execution.
Input/Output (I/O) section: As noted again and again, there is a need for tremendous throughput of data to the DSP; everything about its design is focused on funneling data into and out of the numeric, memory and sequencer sections. The source of the data (and the destination of the output, the result of signal processing) is the DSP’s connection to its system and the real world. A number of I/O functions are required to complete signal processing tasks. Off-DSP memory arrays store processor instructions and data. Communication channels, such as serial ports, I/O ports and DMA channels, transfer data into and out of the DSP quickly. Other functions, such as timers and program boot logic, ease DSP system development. A brief list of typical I/O tasks in a DSP system includes the following (among many others):
Boot loading: At Reset, the DSP loads instructions from an external source (EPROM or host), usually through an external memory interface.
Serial communications: The DSP receives or transmits data through a synchronous serial port (SPORT), communicating with codecs, ADCs, DACs or other devices.
Memory-mapped I/O: The DSP receives or transmits data through an off-DSP memory location that is decoded by an external device.
Experimenting with digital filters
Having modeled the filter algorithms and looked at some of the DSP architectural features, one is ready to start looking at how these filters could be coded in DSP assembly language. Up to this point, the discussion and examples have been generic, applying to almost all DSPs. Here, the example is specific to the Analog Devices ADSP-2181. This processor is a fixed-point, 16-bit DSP. The term “fixed point” means that the binary point separating the integer and fractional parts of a number does not change its bit location during arithmetic operations; unlike a floating-point format, there is no exponent to move it. Fixed-point DSPs can be more challenging to program, but they tend to be less expensive than floating-point DSPs. The “16-bit” in “16-bit DSP” refers to the size of the DSP’s data words. This DSP uses 16-bit data words and 24-bit wide instruction words. DSPs are specified by the size of the data rather than instruction width, because data word size describes the width of data that the DSP can handle most efficiently.
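To make the fixed-point idea concrete, the following C sketch uses the common 1.15 (Q15) fractional format, in which a 16-bit word holds one sign bit and 15 fraction bits and represents values from -1.0 up to just under 1.0. The choice of 1.15 and all function names are assumptions for illustration, not details taken from the article's listing.

```c
#include <stdint.h>

/* Convert a real value to 1.15 fixed point, rounding to nearest and
 * clamping values outside the representable range. */
int16_t float_to_q15(double v)
{
    double scaled = v * 32768.0;               /* 2^15 steps per unit */
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* Convert a 1.15 value back to a real number. */
double q15_to_float(int16_t q)
{
    return q / 32768.0;
}

/* Multiply two 1.15 values. The 32-bit product has 30 fraction bits, so
 * it is rescaled by 2^15 (truncating toward zero) to return to 1.15. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) / 32768;
    if (p > INT16_MAX) p = INT16_MAX;          /* only -1.0 * -1.0 gets here */
    return (int16_t)p;
}
```

For example, 0.5 is stored as 16384, and multiplying it by itself gives 8192, which is 0.25. The programmer, not the hardware, keeps track of where the binary point sits, which is part of why fixed-point code can be more challenging to write.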
The example program in Figure 6 is an FIR filter in ADSP-2181 assembly language. The software has two parts: the main routine, which includes register and buffer initialization along with the interrupt vector table, and the interrupt routine, which executes when a data sample is ready. After initialization, the DSP executes instructions in the main routine, performing some background tasks, looping through code or idling in a low-power standby mode until it gets an interrupt from the A/D converter. In this example, the processor idles in a low-power standby mode waiting for an interrupt.
The FIR filter interrupt subroutine (the last segment of code) is the heart of the filter program. The processor responds to the interrupt, saving the context of the main routine and jumping to the interrupt routine. This interrupt routine processes the filter input sample, reading data and filter coefficients from memory and storing them in data registers of the DSP processor. After processing the input sample, the DSP sends an output sample to the D/A converter.
Note that this program uses DSP features that perform, with zero overhead, operations that would usually incur the overhead of a conditional. In particular, program loops and data buffers are maintained with zero overhead. The multifunction instruction in the core of the filter loop performs a multiply/accumulate operation while the next data word and filter coefficient are fetched from memory.
The program checks the final result of the filter calculation for any overflow. If the final value has overflowed, the value is saturated to emulate the clipping of an analog signal. Finally, the context of the main routine is restored, and the instruction flow is returned to the main routine with a return from interrupt (RTI) instruction.
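The per-sample flow described above (store the new sample in a circular buffer, run the multiply/accumulate loop, saturate the result) can be sketched in C. This is not the ADSP-2181 assembly of Figure 6: the filter length, type and function names are hypothetical, 1.15 data scaling is assumed, and the buffer wrapping and loop control that the DSP handles in hardware are written out in software here.

```c
#include <stdint.h>

#define NTAPS 4                       /* hypothetical filter length */

typedef struct {
    int16_t delay[NTAPS];             /* circular buffer of recent inputs */
    unsigned newest;                  /* index of the most recent sample */
} fir_state;

/* Clamp a wide result to the 16-bit output range, emulating clipping. */
int16_t saturate16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Process one input sample, as an interrupt routine would do each time
 * the A/D converter delivers data. Coefficients and samples are assumed
 * to be 1.15 fixed point. */
int16_t fir_sample(fir_state *s, int16_t in, const int16_t coeff[NTAPS])
{
    s->newest = (s->newest + 1) % NTAPS;        /* advance with wrap-around */
    s->delay[s->newest] = in;

    int64_t acc = 0;                            /* wide accumulator */
    unsigned idx = s->newest;
    for (unsigned k = 0; k < NTAPS; k++) {
        acc += (int32_t)coeff[k] * (int32_t)s->delay[idx];
        idx = (idx == 0) ? NTAPS - 1 : idx - 1; /* step back through history */
    }
    return saturate16(acc / 32768);             /* rescale to 1.15, then clip */
}
```

With all four coefficients set to 0.5 (16384) and a full-scale input of 32767 on every sample, the first output is 16383, the second is 32767, and from the third sample on the unclipped sum exceeds the 16-bit range, so the output stays clipped at 32767 rather than wrapping around to a negative value.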
The goal of this article has been to provide a link between filter theory and digital filter implementation. On the way, this article covers modeling filters with HLL programs, using DSP architecture and experimenting with filter software. The issues introduced in this article include filters as programs, DSP architecture (generalized) and DSP assembly language.