# The DSP Debate

Sep 1, 2002 12:00 PM

Greg Duckett and Terry Pennington

The mountebanks are at it again, hawking their goods by telling tall tales. The new tale states that for pro-audio applications, floating-point digital signal processors (DSPs) are better than fixed-point DSPs — but that’s not quite so. You might be paying too much and receiving too little for DSPs.

A lot of confusion has been stirred up recently in the audio industry about the issue of DSP internals. A popular misconception is that because a box uses 32-bit floating-point DSP chips, it’s inherently better than a digital signal processor that uses 24-bit fixed-point DSP chips. Obviously, 32 is better than 24, right? Yes — if you don’t know the facts. A 24-bit fixed-point DSP box can operate in double-precision mode, making it a 48-bit box. In this case, 48 really is better than 32, only it has little to do with size. Given today’s choices, fixed point is superior to floating point for audio. Here is why fixed-point DSP makes for superior audio:

- less dynamic range (yes, that can be a feature in DSPs used for audio);
- double-precision-capable 48-bit processing;
- programming flexibility that can guarantee proper behavior under the adverse conditions presented by audio signals;
- lower power consumption (floating-point hardware is more complicated than fixed point, and more transistors require more watts).

A truly objective comparison of DSP architectures is not easy. In fact, it may not be possible. In the end, the application and the skill of the software programmer implementing the application determine superior audio performance. But people don’t want to hear that — they want the easy answer. Everyone is looking for the secret word, the single number that defines the difference and makes the duck drop down and deliver a $100 bill (apologies to readers who have not seen the original **You Bet Your Life**, hosted by Groucho Marx). Yet the truth is there is no single number that quantifies the differences — not the number of bits, the MIPS or FLOPS rate, the clock rate, or the architecture.

## THE DIFFERENCE

Two distinct types of DSP implementations dominate pro-audio applications: one uses fixed-point processing, whereas the other features a floating-point solution (see the sidebar “Terminology”). Both produce the same results under most conditions; however, the difference lies in those other conditions.

Looking under the hood of an IEEE 32-bit floating-point processor and a 24-bit fixed-point processor reveals that each DSP design offers the same 24-bit processing precision, but precision is not the issue. The defining difference is that the fixed-point implementation offers double precision, and the floating-point device features increased dynamic range.

In floating-point processors, scaling the data increases dynamic range, but scaling does not improve precision; in fact, it degrades performance for audio applications. It turns out that the strength of the fixed-point approach is the weakness of the floating point, giving fixed point a double advantage.

The benefit is most obvious in low-frequency audio processing. That is important because most of the energy in audio lies in the low-frequency bands (music and speech have an approximate 1/f spectrum characteristic — that is, each doubling of frequency results in a halving of amplitude). The floating-point technique struggles with large-amplitude, low-frequency computations. Building a high-Q, low-frequency digital filter is difficult no matter what method you use, but nevertheless, fixed-point double precision is superior to floating-point single precision.

To thoroughly explain, in a scientific engineering manner, the advantages and disadvantages of the two techniques as they relate to broadband audio applications is a vastly complex subject that is beyond the scope of this article. An objective direct comparison involves a slippery slope of complexities, qualifications, and definitions, all of which are necessary to avoid apples-to-oranges errors. That daunting task has been accomplished by Dr. Andy Moorer, cofounder of the Stanford University Center for Computer Research in Music and Acoustics and senior computer scientist at Adobe Systems. Moorer’s research is recommended to the detail curious and mathematically courageous. The goal here is more modest: to draw a simplified illustration of the audio challenges faced by each DSP solution (see the sidebar “Let’s Be Precise About This …” for an in-depth, mathematically oriented example).

## DYNAMIC RANGE

Higher dynamic range is better, as is lower distortion and lower noise, right? A sales guide published by a hi-fi manufacturer in the mid-’70s had a “lower is better” and “higher is better” approach to equipment specifications. The author of that promotional material would be shocked to hear an audio manufacturer claim that higher dynamic range can be a problem. Nonetheless, it is a fact when examined in relationship to the ultrahigh dynamic range capabilities of the 32-bit floating-point processors found in some of the current DSP units.

Both DSP designs have a 24-bit processor for the mainstream functions. The fixed-point technique adds double precision, giving it 48-bit processing power, whereas the floating-point design adds an 8-bit exponent. The 8-bit exponent gives the floating-point architecture a dynamic range spec of 1,500 dB (8 bits gives 256 exponent values, and a range of 2^256 equals approximately 1,500 dB), which is used to manipulate an operating window, within which its 24-bit brain operates. Floating-point processors automatically scale the data to keep it within optimum range. That is where the trouble lies. The dynamic range itself is not the problem so much as it is the automatic scaling over a 1,500 dB range. Fixed point, with its 48 bits, gives you 288 dB of dynamic range, enough for superior audio, but the programmer has to scale the data carefully. Floating-point programmers leave it up to the chip, but unless they are careful, that creates serious errors and noise artifacts. All the jumping about done by the continuous signal boosting and attenuating can produce annoying noise pumping.
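The dynamic-range figures quoted above follow from the rule of thumb that each additional bit (each doubling of range) adds about 6 dB. A minimal Python sketch of that arithmetic follows; the function name `dynamic_range_db` is ours, chosen for illustration.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of largest to smallest value for a span of 2**bits, in dB."""
    return 20.0 * math.log10(2.0) * bits

for label, bits in [("24-bit fixed point", 24),
                    ("48-bit double precision", 48),
                    ("8-bit exponent (2^256 span)", 256)]:
    print(f"{label}: {dynamic_range_db(bits):.0f} dB")
```

Using the exact 6.02 dB per bit gives roughly 144 dB, 289 dB, and 1,541 dB. The 288 dB and 1,500 dB in the text are the same quantities rounded.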

Some manufacturers point to that DSP dynamic range as a plus. However, the dynamic-range specification of a DSP chip has little to do with the overall dynamic range of the finished product. The dynamic range of the box is bounded by the A/D converter on its input, the D/A converter on its output, and, to some extent, the processing in the center of the device (of which the DSP chip is a part). Even without converters, the output of both DSP types is a 24-bit fixed-point word. The dynamic range of a DSP chip is the ratio between the largest and smallest numbers it can handle. If a DSP device sporting an advertised dynamic range of 1,500 dB resides between the input converters and the output converters, its contribution to the overall dynamic range of the product is limited to the dynamic range of the converters. Is that a bad thing? Not of itself.

What’s bad about a floating-point processor with a dynamic range of 1,500 dB is that it scales its processing range based on the amplitude of the signal it’s dealing with. However, when dealing with signals of differing amplitudes (that is, real audio), the scaling may not be optimized for the mixed result. When dealing with audio signals, the installer cannot simply ignore the subtleties of system setup because he or she has a floating-point processor.

Consider the typical audio mixer scenario: at any given moment, a mixer can have multiple levels present at its many input ports. Input 1 might have a high-level sample to deal with, whereas Input 2 has a very low level, and Input 3 is somewhere in the middle of its upper and lower limits. A 32-bit floating-point DSP chip makes a determination about the appropriate window within which to work on a sample-by-sample basis but finally represents its calculations in the same 24-bit manner as its fixed-point counterpart. Even in a simple 2-channel stereo processor, signal levels between channels, though similar in average level, can be vastly different instantaneously because of phase differences. Nothing is gained by using a floating-point device in an audio application, but much may be lost. It does not have the 48-bit double-precision capability of a fixed-point solution, and noisy artifacts may be added.

## PROPER GAIN SETTING

The installer plays the final, and most important, role in maintaining the proper processing window alignment for a given installation, at least as long as converters and DSPs have finite precision and dynamic range. Improperly set gain structure can overload fixed-point processors. Floating-point DSPs give the flexibility to misadjust the system (too much internal gain) without noticeable internal clipping; however, the system still suffers the unintended consequences of the misalignment (say, in trying to mix two channels of very different audio levels), which floating-point processors cannot fix. They merely mask the problem from the installer’s view. Or even worse, they produce an audible and annoying rise in quantization noise when filters are used below 100 Hz.

In that sense, the fixed-point processors force the installer to maintain the 144 dB processing window by avoiding internal clipping through proper gain structure and setup, which makes maintaining overall quality easier than with floating-point processor-based boxes.

## DOUBLE PRECISION

The double-precision 48-bit processing is used when long time constants are needed. That occurs when low-frequency filters are on the job and when compressors, expanders, and limiters are used with their relatively slow attack and release times. If only 24 bits are available when more precision is required, the result is a problem. The function misbehaves, and the least damaging result is poor sound quality. The worst result is amplifier or loudspeaker damage due to a misbehaving DSP crossover, making double precision a must-have for superior audio.
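To see why long time constants strain a 24-bit word, consider a one-pole smoother of the kind used for attack and release envelopes. The Python sketch below is a hypothetical integer model, not any product's code: the shift-based coefficient, the `settle` and `residual_db` helpers, and the word sizes are assumptions chosen for illustration. Each update adds a fraction 2^-shift of the remaining error, and once that fraction falls below one least-significant bit, the state stops moving.

```python
import math

SHIFT = 10  # assumed coefficient of 2**-10, a time constant of about 1,024 samples

def settle(target: int, shift: int) -> int:
    """Run y += (target - y) >> shift from y = 0 until the update rounds to zero."""
    y = 0
    while True:
        step = (target - y) >> shift
        if step == 0:
            return y
        y += step

def residual_db(word_bits: int) -> float:
    """Error left when the smoother stalls, relative to full scale, in dB."""
    full_scale = (1 << (word_bits - 1)) - 1   # largest positive signed value
    error = full_scale - settle(full_scale, SHIFT)
    return 20.0 * math.log10(error / full_scale)

print(f"24-bit state stalls at {residual_db(24):.0f} dB re full scale")
print(f"48-bit state stalls at {residual_db(48):.0f} dB re full scale")
```

With these assumed values, the 24-bit state stops about 78 dB below full scale, roughly ten bits short of the word's resolution, whereas the 48-bit state stalls more than 220 dB down. A longer time constant (a larger shift) makes the 24-bit case worse.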

Floating-point evangelists like to use an example in which the processor is set up for 60 dB attenuation on the input and 60 dB makeup gain on the output. However, if you add a second input to the example, with the gain set for unity and a 0 dBu signal coming in, and then configure the processor to sum both the channels into the output and listen to the results, you will not like what you hear.

Another revealing example shows why you never hear floating-point advocates talk about low-frequency/high-Q filter behavior. The next time you get the opportunity, set up a floating-point box parametric filter for use as a notch filter with a center frequency of 50 Hz and a Q of 20. Listen to the increase in output noise, run an input sweep from 20 Hz to 100 Hz, and listen to all the unappetizing sounds that result. Audio filters below about 100 Hz require simultaneous processing of large numbers and small numbers — something fixed-point DSPs do much better than their floating-point cousins.

The real determinant of quality in audio DSP is the skill of the programmers. They must devise accurate and efficient algorithms; the better they understand the sometimes arcane math, the better the algorithm. The better the algorithm, the better the results. Fixed-point processing delivers a load of responsibility to the hands of the developer. It also delivers an equal amount of flexibility. A talented engineer with a good grasp of exactly what is required of a DSP product can fashion every detail of a given function down to the last bit. Floating-point designs don’t provide that flexibility. Their ease of programming makes them popular when engineering talent is limited, but they’re not the best choice. Programming ease means less control over the final results — and, you know, the results are what matters.

## THE RIGHT TOOL FOR THE RIGHT JOB

If fixed-point DSP devices are so good, then why do floating-point DSPs exist? Fair-enough question. They exist because DSP applications differ widely. Some of the more popular floating-point applications are found in physics, chemistry, meteorology, fluid dynamics, image recognition, earthquake modeling, number theory, crash simulation, weather modeling, and 3-D graphics. If you are designing an image processor, a radar processor, anything to do with astronomy, or a mathematics matrix inverter, the choice is clearly a floating-point solution. As always, the application dictates the solution.

That is not to say that floating-point DSPs will never have their day in achieving superior audio — it’s just not today. What will it take? For floating point to overtake fixed point, some pretty nasty requirements must be met. It must be a 56-bit, floating-point processor (that is, a 48-bit mantissa and an 8-bit exponent) or a 32-bit with double precision (requiring a large accumulator); the parts must run at the same speed as the equivalent fixed-point part; it must use the same power; and it has to cost the same. If and when those requirements are met, the choice will be made.

Another possibility is if the floating-point DSPs evolve to offer significantly more processing power for the same price (enough to overcome the low-frequency, high-Q issues in firmware) and provide a compatible peripheral chip set. That could tip the scales, even if the unit has only a 32-bit fixed numerical format.

Digital audio is a vast and complex subject with many subtleties, especially when it comes to superior signal processing. A brief article touches on some of the important issues but not all. For a more detailed exploration into the subject, check out John Watkinson’s **The Art of Digital Audio**, 3rd ed. The title says it all.

**Greg Duckett** is director of R&D/Engineering, and **Terry Pennington** is director of corporate information systems for Rane. Thanks to Ray Miller, Dana Troxel, and Michael Rollins for their technical contributions to the accuracy of this article.

## Terminology

**Double precision:** The use of two computer words to represent each number. That preserves or improves the precision, or correctness, of the calculated answer. For example, if the number 999 is single precision, then 999,999 is the double-precision equivalent.

**Exponent:** The component of a floating-point number that normally signifies the integer power to which the radix is raised in determining the value of the represented number (IEEE-100). For example, if radix = 10 (a decimal number), then the number 183.885 is represented as mantissa = 1.83885 and exponent = 2 (because 183.885 = 1.83885 × 10^2).

**Fixed point:** A computing method in which numbers are expressed in the fixed-point representation system — that is, one in which the position of the decimal point (technically the radix point) is fixed with respect to one end of the numbers. Integer or fractional data is expressed in a specific number of digits, with a radix point implicitly located at a predetermined position. Fixed-point DSPs support fractional arithmetic, which is better suited to digital-audio processing than integer arithmetic. A couple of fixed-point examples with two decimal places are 4.56 and 1,789.45.

**Floating point:** A computing method in which numbers are expressed in the floating-point representation system — that is, one in which the position of the decimal point does not remain fixed with respect to one end of numerical expressions but is regularly recalculated. A floating-point number has four parts: sign, mantissa, radix, and exponent. The sign indicates polarity, so it is always 1 or -1. The mantissa is a positive number representing the significant digits. The exponent indicates the power of the radix (usually binary 2 but sometimes hexadecimal 16). A common example is the scientific notation used in all science and mathematics fields. Scientific notation is a floating-point system with radix 10 (that is, decimal).

**IEEE 754-1985:** Standard for binary floating-point arithmetic, often referred to as IEEE 32-bit floating point. This is a standard that specifies data format for floating-point arithmetic on binary computers. It divides a 32-bit data word into a sign bit, an 8-bit exponent, and a 23-bit stored fraction; with its implied leading bit, the fraction forms the 24-bit mantissa.

**Mantissa:** The fractional part of a real number — for example, in the number 1.83885, the mantissa is 0.83885. (The integer part of a number is called the characteristic. In the example, the characteristic is 1.) Floating-point arithmetic also calls this the significand.

**Radix:** The number base, such as 2 in the binary system and 10 in the decimal system.

**Radix point:** The binary equivalent of the decimal point — think of it as a binary point.
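As an illustration of the IEEE 754 entry, the following Python sketch unpacks a 32-bit single-precision word into its bit fields. The helper name `float32_fields` is ours; the field widths (1 sign bit, 8 exponent bits, 23 stored fraction bits) come from the standard.

```python
import struct
from typing import Tuple

def float32_fields(value: float) -> Tuple[int, int, int]:
    """Return (sign, biased exponent, stored fraction) of value as a float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF          # 23 bits; the leading 1 is implied
    return sign, exponent, fraction

# -6.5 = -1.625 x 2^2, so the biased exponent is 2 + 127 = 129
sign, exponent, fraction = float32_fields(-6.5)
print(sign, exponent, hex(fraction))
```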

## The Big and the Small

In audio DSP processing, you run into the same simple arithmetic repeated again and again: multiply one number by another number and add the result to a third number. Often the result of this multiplication and addition is the starting point for the next calculation, so it forms a running total or an accumulation of all the results over time.

Naturally enough, adding the next sample to the previous result is called an accumulate, and it follows that a multiplication followed by an accumulate is called a MAC. MACs are the most common of all operations performed in audio DSP, and DSP processors typically have special hardware that performs a MAC very quickly.
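The multiply-accumulate pattern described above can be sketched in a few lines of Python. The function names and the sample values are illustrative assumptions; a real DSP performs each `mac` step in dedicated hardware.

```python
def mac(acc, coeff, sample):
    """One multiply-accumulate: add coeff * sample to the running total."""
    return acc + coeff * sample

def sum_of_products(coeffs, samples):
    """Running total built from one MAC per coefficient/sample pair."""
    acc = 0
    for coeff, sample in zip(coeffs, samples):
        acc = mac(acc, coeff, sample)
    return acc

print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 4 + 10 + 18 = 32
```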

As results accumulate, errors also accumulate. The total can get large compared with the next sample. To show that in action, return to the mythical three-digit processors. Say you have the series of numbers shown in the row labeled Samples in the table; a strange-looking set of numbers, perhaps, but it represents the first part of a simple sine wave. Multiply the first number by a small constant (say, 0.9) and add the result to the second number: 0*0.9 + 799 = 799. Multiply that result by 0.9 and add it to the third number: 799*0.9 + 1,589 = 2,308. Repeat that (2,308*0.9 + 2,364 = 4,441). Continue the pattern, and it forms a simple digital filter. The results using double-precision fixed point are shown in the row labeled Fixed-Point Results in the table.

What about the floating-point processor? Start with exponent = 0. The results are: 0, 799, and a following number that is too big, so increase the exponent to 1 … 2,290, 4,420, and so on. Notice that the floating-point values are smaller than they should be because the limited precision forces the last one or two digits to be zero. It’s easy to see that each result has an error, and the errors are carried forward and accumulate in the results. Algorithms with long time constants, such as low-frequency filters, are especially prone to these errors.
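The comparison above can be reproduced with a short Python model of the two mythical processors. This is a sketch under stated assumptions: it uses only the four sample values quoted in the text, represents the constant 0.9 as the integer ratio 9/10, and truncates (rather than rounds) to three significant digits, which is the behavior that matches the quoted floating-point results. The helper names are ours.

```python
def trunc3(n: int) -> int:
    """Truncate a non-negative integer to three significant decimal digits."""
    scale = 1
    while n // scale > 999:
        scale *= 10          # raise the exponent by one
    return (n // scale) * scale

def fixed_filter(samples):
    """Double-precision fixed point: fractions dropped, all whole digits kept."""
    acc, out = 0, []
    for s in samples:
        acc = acc * 9 // 10 + s
        out.append(acc)
    return out

def float_filter(samples):
    """Three-digit mantissa: every operand and every result is truncated."""
    acc, out = 0, []
    for s in samples:
        acc = trunc3(trunc3(acc * 9 // 10) + trunc3(s))
        out.append(acc)
    return out

samples = [0, 799, 1589, 2364]
print(fixed_filter(samples))   # [0, 799, 2308, 4441]
print(float_filter(samples))   # [0, 799, 2290, 4420]
```

After only four samples the floating-point total is already 21 counts low, and that error feeds into every later result.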

You’ll also notice that the accumulated values are getting larger than the input samples. The long time constant in low-frequency filters means that the accumulation happens over a longer time, and the accumulated value stays large for a longer time. Whenever the input signal is near zero (at least once every cycle in a typical audio signal), the samples can be small enough that they are lost. Because the accumulated value is large, the samples fall entirely outside the precision range of the floating-point processor and are interpreted as zero. The double precision available in the fixed-point processor helps the programmer avoid those problems.
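The loss of small samples against a large accumulated value is easy to demonstrate. The Python sketch below emulates IEEE 32-bit floating point by round-tripping values through `struct`, and models a 48-bit fixed-point accumulator as an integer count of steps of 2^-47. The particular values (an accumulator at 0.5 and a sample of 2^-26) are assumptions picked to make the effect visible.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_q47(x: float) -> int:
    """Convert a value in [-1, 1) to 48-bit fixed point (47 fractional bits)."""
    return int(x * (1 << 47))

accumulated = 0.5          # large running total
sample = 2.0 ** -26        # small input sample, about 150 dB below it

float_sum = to_float32(to_float32(accumulated) + to_float32(sample))
fixed_sum = to_q47(accumulated) + to_q47(sample)

print(float_sum == accumulated)           # True: the sample vanished
print(fixed_sum - to_q47(accumulated))    # 2097152 steps: the sample survived
```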

## Let’s Be Precise About This …

An example is the best way to explain how you lose precision when floating-point processors scale data. Assume you have two mythical three-digit radix-10 (that is, decimal) processors. One is fixed point, and the other is floating point. For simplicity, this example uses only positive whole numbers. (On real fixed- or floating-point processors, the numbers are usually scaled to be between 0 and 1.)

The largest number represented in single precision on the fixed-point processor is 999. Calculations that produce numbers larger than 999 require double precision. This allows numbers as large as 999,999.

Let the floating-point processor use two digits for the exponent, making it a five-digit processor. That means it has a dynamic range of 0 to 999 × 10^99, which is a huge number. To see how that sometimes is a problem, begin with the exponent = 0. That allows the floating-point processor only to represent numbers as high as 999, the same as the fixed-point, single-precision design. Calculations that produce numbers larger than 999 require increasing the exponent from 0 to 1, which allows numbers as high as 9,990.

However, notice that the smallest number (greater than zero) that can be represented is 1 × 10^1 = 10, meaning numbers between 1 and 9 cannot be represented (nor can 11 through 19, 21 through 29, 31 through 39, and so on). Increasing the exponent to 3 only makes matters worse, but you can cover (almost) the same range as the fixed-point processor (as high as 999,000). However, the smallest number now represented is 1 × 10^3 = 1,000, meaning numbers between 1 and 999 cannot be represented. The next increment is 2 × 10^3 = 2,000, meaning the represented number jumps from 1,000 to 2,000. Now numbers between 1,001 and 1,999 cannot be represented. With exponent = 3, each increment in the mantissa of 1 results in an increase in the number of 1,000 and another 999 values that cannot be represented.
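The gaps described above can be checked with a small Python helper that stores a whole number in the mythical floating-point format at a given exponent. The function name `represent` and the choice to truncate toward zero are assumptions for illustration.

```python
def represent(value: int, exponent: int) -> int:
    """Largest value of the form mantissa * 10**exponent (mantissa <= 999)
    that does not exceed the non-negative input."""
    step = 10 ** exponent
    mantissa = min(value // step, 999)
    return mantissa * step

for exponent in (0, 1, 3):
    largest = represent(10 ** 9, exponent)
    print(f"exponent {exponent}: step {10 ** exponent}, largest {largest}, "
          f"1999 stored as {represent(1999, exponent)}")
```

With exponent = 3, the value 1,999 is stored as 1,000, and anything below 1,000 is stored as zero.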

Is that as big a problem as it first appears? Yes and no. At first it looks like the floating-point processor has lost the ability to represent small numbers for the entire calculation’s time, but the scaling happens on a per-sample basis. The loss of precision occurs only for the individual samples with magnitude greater than 999. You might think that everything is fine, because the number is big and it does not need the values around zero. But a few wrinkles cause trouble. When calculations involve large and small numbers at the same time, the loss of precision affects the small number and the result. That is especially important in low-frequency filters or other calculations with long time constants.

Another wrinkle is that this happens automatically and beyond the control of the programmer. If the programmer does not employ the right amount of foresight, it could happen at a bad time with audible results. In the fixed-point case, the programmer must explicitly change to double precision — there is nothing automatic about it. The programmer changes to double precision at the start of the program section requiring it and stays there until the work is done.