Digital dharma
Feb 1, 1999 12:00 PM, Dennis Bohn
Among the many definitions for dharma is the essential function or nature of a specific thing. Along those lines, the audio industry has been radically and irrevocably changed by the digital revolution. Arguments will ensue forever about whether the true nature of the real world is analog or digital, whether the fundamental essence (or dharma) of life is continuous (analog) or exists in tiny little chunks (digital). Here we shall but resolve to understand the dharma of A/D converters.
Data conversion
Once a waveform has been converted to digital format, nothing can occur to change its sonic properties. While it remains in the digital domain, it is a series of digital words, representing numbers. Aside from the gross example of having the digital processing actually fail and cause a word to be lost or corrupted, nothing can change the sound of the word. It is just a bunch of ones and zeroes. There are no fractions. The point being that sonically, it begins and ends with the conversion process. Nothing is more important to digital audio than data conversion. Everything in between is arithmetic and waiting. That is why data conversion is so critical. We could go so far as to say that data conversion is the art of digital audio while everything else is the science; it is data conversion that ultimately determines whether or not the original sound is preserved.
Because analog signals continuously vary between an infinite number of states and computers can only handle two, the signals must be converted into binary digital words before the computer can function. Each digital word represents the value of the signal at one precise point in time. Today's common word length is 16 bits or 32 bits. Once converted into digital words, the information may be stored, transmitted or operated upon within the computer. In order to properly explore the critical interface between the analog and digital worlds, it is necessary to review a few fundamentals and a little history.
Binary numbers
Whenever we speak of digital, by inference, we speak of computers (computer is used here to represent any digital-based piece of audio equipment), and computers are really quite simple. They can understand only the most basic form of communication or information (yes or no, on or off), all of which can be symbolically represented by two things, anything from two letters to two numbers or two charges. To keep it simple, we choose two numbers: one and zero. Officially, this is known as binary representation, from Latin bini, meaning two by two. In mathematics, this is a base-2 number system as opposed to our decimal number system, which is called base-10 because we use the 10 numbers, zero through nine.
In binary, zero is a good symbol for no, off, closed and gone, and one is easy to understand as meaning yes, on, open and here. In electronics, it is easy to determine whether a circuit is open or closed, conducting or not conducting, has voltage or lacks voltage. Thus, the binary number system found use in the first computer, and nothing has changed. Computers just got faster, smaller and cheaper, and memory size increased in a decreasing space.
One problem with using binary numbers is that they become big and unwieldy in a hurry. For instance, it takes six digits to express my age in binary but only two in decimal. In binary, however, we had better not call them digits because that implies a human finger or toe, of which there are 10. To get around that problem, John Tukey of Bell Laboratories dubbed the basic unit of information (as defined by Shannon) a binary unit or binary digit, which became abbreviated to bit. A bit is the simplest possible message representing one of two states.
So, I am 6 bits old. Well, not quite. Nevertheless, it takes 6 bits to express my age as 110111. I am 55 years old in base-10 symbols, which stands for five ones plus five tens. Each digit in our everyday numbers represents an additional power of 10 beginning with zero. That is, the first digit represents the number of ones (10^0); the second digit represents the number of tens (10^1). We can represent any size number by using this shorthand notation.
Binary number representation is just the same except that the powers of two are substituted for the powers of 10. Therefore, moving from right to left, each succeeding bit represents 2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^5 = 32 and so on. My age is thus represented as 110111, which is 32+16+0+4+2+1.
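The place-value arithmetic above can be checked in a few lines of Python. This is only an illustrative sketch; the function names to_binary and from_binary are made up for this example and are not part of any audio toolkit.

```python
def to_binary(value: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if value == 0:
        return "0"
    bits = []
    while value > 0:
        bits.append(str(value % 2))  # the remainder is the next bit, least significant first
        value //= 2
    return "".join(reversed(bits))   # write the most significant bit on the left


def from_binary(bits: str) -> int:
    """Add up the power of two for every position that holds a one."""
    total = 0
    for power, bit in enumerate(reversed(bits)):  # the rightmost bit is 2^0
        if bit == "1":
            total += 2 ** power
    return total


print(to_binary(55))           # 110111
print(from_binary("110111"))   # 55
```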
Harry and Claude
The French mathematician Fourier unknowingly laid the groundwork for A/D conversion in the early 19th century. All data conversion techniques rely on looking at or sampling the input signal at regular intervals and creating a digital word that represents the value of the analog signal at that precise moment. The fact that we know this works lies with Harry Nyquist. While working at Bell Laboratories in the late 1920s, Nyquist wrote the paper "Certain Topics in Telegraph Transmission Theory," in which he described the criteria for sampled data systems. He taught that for periodic functions, if you sampled at a rate that was at least twice as fast as the signal of interest, then no information (data) would be lost upon reconstruction. Because Fourier had already shown that all alternating signals are made up of a sum of harmonically related sine and cosine waves, audio signals are periodic functions and can be sampled without loss of information. This gave rise to the term Nyquist frequency, which is the highest frequency that may be accurately sampled, and is half the sampling frequency. The theoretical Nyquist frequency for the audio CD system is 22.05 kHz, equaling half of the standardized sampling frequency of 44.1 kHz.
As powerful as Nyquist's discoveries were, they were not without their dark side, the biggest being aliasing frequencies. Following the Nyquist criteria guarantees that no information will be lost. It does not, however, guarantee that no information will be gained. Sampling an analog signal at precise time intervals is an act of multiplying the input signal by the sampling pulses. This introduces the possibility of generating false signals indistinguishable from the original. In other words, given a set of sampled values, we cannot relate them specifically to one unique signal. As Figure 1 shows, the same set of samples could have resulted from any of the three waveforms shown and from all possible sum and difference frequencies between the sampling frequency and the one being sampled. All such false waveforms that fit the sample data are called aliases. In audio, these frequencies show up mostly as intermodulation distortion products, and they come from the random-like white noise or any ultrasonic signal present in every electronic system. Solving the problem of aliasing frequencies is what improved audio conversion systems to today's level of sophistication.
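To make the idea of aliases concrete, here is a minimal Python sketch. It samples a 1 kHz cosine at the CD rate and compares the result with samples of tones at the difference and sum frequencies (sampling frequency minus 1 kHz, and sampling frequency plus 1 kHz). The 1 kHz test tone, the 32-sample window and the helper name sample_cosine are assumptions chosen for illustration only.

```python
import math


def sample_cosine(freq_hz: float, sample_rate_hz: float, count: int) -> list:
    """Return `count` samples of a unit-amplitude cosine taken at the given sample rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


FS = 44_100.0      # CD sampling frequency in Hz
TONE = 1_000.0     # assumed test tone, well below the 22.05 kHz Nyquist frequency

tone = sample_cosine(TONE, FS, 32)
alias_difference = sample_cosine(FS - TONE, FS, 32)   # 43.1 kHz
alias_sum = sample_cosine(FS + TONE, FS, 32)          # 45.1 kHz

# Largest disagreement between the three sets of samples (only rounding error remains).
max_diff = max(
    max(abs(a - b), abs(a - c))
    for a, b, c in zip(tone, alias_difference, alias_sum)
)
print(max_diff < 1e-9)   # True: the samples cannot tell the three tones apart
```

Once the samples are taken, nothing in the data says which of the three tones produced them, which is why any ultrasonic content has to be dealt with before or during sampling.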
Claude Shannon is recognized as the father of information theory. While a young engineer at Bell Labs in 1948, he defined an entirely new field of science. Earlier, while a 22-year-old student at MIT, he had shown in his master's thesis how the algebra invented by the British mathematician George Boole in the mid-1800s could be applied to electronic circuits. Since that time, Boolean algebra has been the rock of digital logic and computer design. Shannon studied Nyquist's work closely and came up with a deceptively simple addition. He observed and proved that if you restrict the input signal's bandwidth to less than half the sampling frequency, then no aliasing errors are possible. Bandlimiting your input to no more than half the sampling frequency guarantees no aliasing, but it is not possible in practice.
To satisfy the Shannon limit, you must have the proverbial brick-wall, infinite-slope filter, which cannot happen in our universe. You cannot guarantee that there is absolutely no signal or noise greater than the Nyquist frequency. Fortunately, there is a way around this problem. If you cannot restrict the input bandwidth to prevent aliasing, then solve the problem by increasing the sampling frequency until the aliasing products that do occur do so at ultrasonic frequencies and are effectively dealt with by a simple single-pole filter. This is where the term oversampling comes in. For full-spectrum audio, the minimum sampling frequency must be 40 kHz, giving a useable theoretical bandwidth of 20 kHz, the limit of normal human hearing. Sampling at anything significantly higher than 40 kHz is oversampling. In just a few years, we have seen the audio industry go from the CD system standard of 44.1 kHz and the pro audio quasi-standard of 48 kHz to 8x and 16x oversampling frequencies of around 350 kHz and 700 kHz respectively. With sampling frequencies this high, aliasing is no longer an issue.
Quantization
Quantizing is the process of determining which of the possible values (determined by the number of bits or voltage reference parts) is the closest value to the current sample; you assign a quantity to that sample. Quantizing involves deciding between two values and thus, always introduces error. How big the error or how accurate the answer depends on the number of bits. The more bits, the better the answer. The converter has a reference voltage which is divided up into 2^n parts, where n is the number of bits. Each part represents one bit. Because you cannot resolve anything smaller than one bit, there is always error in the conversion process. This is the accuracy issue.
The number of bits determines the converter accuracy. For 8 bits, there are 2^8 (256) possible levels as shown in Figure 2. Because the signal swings positive and negative, there are 128 levels for each direction. Assuming a +/-5 V reference, this makes each division or bit equal to 39 mV (5/128 = 0.039). Hence, an 8 bit system cannot resolve anything smaller than 39 mV. This means a worst-case accuracy error of 0.78%. Table 1 compares the accuracy improvement gained by 16 bit, 20 bit and 24 bit systems along with the reduction in error. This is not the only way to use the reference voltage. Many schemes exist for coding, but this one nicely illustrates the principles involved. Each step size, resulting from dividing the reference into the number of equal parts dictated by the number of bits, is equal and is called a quantizing step or interval. Originally, this step was termed the least significant bit (LSB) because it equals the value of the smallest coded bit, but it is an illogical choice for mathematical treatments.
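The 8 bit arithmetic above extends directly to longer words. The following Python sketch applies the same convention (a +/-5 V reference with half of the 2^n levels in each direction) to 8, 16, 20 and 24 bits. The figures it prints are computed from that formula, not copied from Table 1, and the function names are illustrative only.

```python
def step_size_volts(bits: int, reference_volts: float = 5.0) -> float:
    """Size of one quantizing step when half of the 2^n levels cover each polarity."""
    levels_per_direction = 2 ** (bits - 1)
    return reference_volts / levels_per_direction


def worst_case_error_percent(bits: int) -> float:
    """One quantizing step expressed as a percentage of the reference voltage."""
    return 100.0 / 2 ** (bits - 1)


for bits in (8, 16, 20, 24):
    step_uv = step_size_volts(bits) * 1e6   # convert volts to microvolts for readability
    print(f"{bits:2d} bits: step = {step_uv:12.3f} uV, "
          f"error = {worst_case_error_percent(bits):.6f} %")
```

Each extra bit halves the step, so going from 8 bits to 16 bits shrinks the step by a factor of 256.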
The error due to the quantizing process, quantizing error, can be thought of as an unwanted signal that the quantizing process adds to the perfect original. An example best illustrates this principle. Let the sampled input value be some arbitrarily chosen value, say, 2 V, and let this be a 3 bit system with a 5 V reference. The 3 bits divide the reference into 8 equal parts (2^3 = 8) of 0.625 V each. For the 2 V input example, the converter must choose between either 1.875 V or 2.5 V, and because 2 V is closer to 1.875 than 2.5, then it is the best fit. This results in a quantizing error of -0.125 V; the quantized answer is too small by 0.125 V.
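The worked example above can be reproduced with a small quantizer sketch in Python. It assumes a unipolar 0 V to 5 V range divided into 2^n equal steps and simple round-to-nearest; the function name quantize and its return layout are choices made for this illustration.

```python
def quantize(sample_volts: float, bits: int, reference_volts: float):
    """Return (code, quantized level in volts, quantizing error in volts)."""
    step = reference_volts / (2 ** bits)          # 5 V / 8 = 0.625 V for the example
    code = round(sample_volts / step)             # nearest step (Python rounds exact ties to even)
    code = max(0, min(2 ** bits - 1, code))       # clamp to the codes a converter can output
    level = code * step
    return code, level, level - sample_volts


code, level, error = quantize(2.0, bits=3, reference_volts=5.0)
print(code, level, error)   # 3 1.875 -0.125
```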
These alternating unwanted signals added by quantizing form a quantized error waveform that is a kind of additive broadband noise that is generally uncorrelated with the signal and is called quantizing noise. Because the quantizing error is essentially random (uncorrelated with the input), it can be thought of like white noise. This is not quite the same thing as thermal noise, but it is similar. The energy of this added noise is equally spread over the band from DC to half the sampling rate. This is a most important point, and I will return to it when I discuss delta-sigma converters and their use of extreme oversampling.
Successive approximation
Successive approximation is one of the earliest and most successful A/D conversion techniques. Therefore, it is no surprise it became the initial A/D workhorse of the digital audio revolution. Successive approximation paved the way for the delta-sigma techniques to follow.
The heart of an A/D circuit is a comparator, an electronic block whose output is determined by comparing the values of its two inputs. If the positive input is larger than the negative input, then the output swings positive, and if the negative input exceeds the positive input, the output swings negative. Therefore, if a reference voltage is connected to one input and an unknown input signal is applied to the other input, you now have a device that can compare and tell you which is larger. Thus, a comparator gives you a high output (a one) when the input signal exceeds the reference or a low output (a zero) when it does not. A comparator is the key ingredient in the successive approximation technique as shown in Figure 3.
In successive approximation, the circuit evaluates each sample and creates a digital word representing the closest binary value. The process takes the same number of steps as bits available; a 16 bit system requires 16 steps for each sample. The analog sample is successively compared to determine the digital code beginning with the determination of the biggest (most significant) bit of the code.
The description given in Daniel Sheingold's Analog-Digital Conversion Handbook offers the best analogy as to how successive approximation works. The process is analogous to a gold miner's assay scale or a chemical balance, which uses a set of graduated weights, each one half the value of the preceding one: 1 g, 0.5 g, 0.25 g and so on. You compare the unknown sample against these known values by first placing the heaviest weight on the scale. If it tips the scales, you remove it; if it does not, you leave it and go to the next smaller value. If that value tips the scale, you remove it; if it does not, you leave it and go to the next lower value, and so on until you reach the smallest weight available. The sum of all the weights on the scale represents the closest value you can resolve. In digital terms, we can analyze this example by saying that a zero was assigned to each weight removed and a one to each weight remaining, in essence creating a digital word equivalent to the unknown sample with the number of bits equaling the number of weights. The quantizing error will be no more than half the quantizing step. Again, the successive approximation technique must repeat this cycle for each sample. This remains a time-consuming process and is still limited to relatively slow sampling rates, but it did get us into the 16 bit, 44.1 kHz digital audio world.
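The balance-scale procedure maps almost line for line onto code. The Python sketch below is a simplified model, not a description of any particular converter: it assumes a unipolar reference, an ideal comparator and an ideal set of binary weights, and the function name is invented for this example. Note that this plain version always settles on the level at or just below the input; a half-step offset, which the sketch omits, is one way to center the error so that it stays within half a quantizing step.

```python
def successive_approximation(sample_volts: float, bits: int, reference_volts: float) -> int:
    """Build the output code one bit at a time, heaviest weight (MSB) first."""
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)                           # place the next weight on the scale
        trial_volts = trial * reference_volts / (2 ** bits)
        if trial_volts <= sample_volts:                     # it does not tip the scale: leave it (bit = 1)
            code = trial
        # otherwise the weight is removed and the bit stays 0
    return code


code = successive_approximation(2.0, bits=3, reference_volts=5.0)
print(format(code, "03b"))   # 011, i.e. 1.875 V with 0.625 V steps
```

The loop runs once per bit, which is why a 16 bit conversion needs 16 comparison steps for every sample.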
PCM and PWM
The successive approximation method of data conversion is an example of pulse code modulation (PCM). Three elements are required: sampling, quantizing and encoding into a fixed-length digital word. The reverse process reconstructs the analog signal from the PCM code. The output of a PCM system is a series of digital words where the word size is determined by the available bits. For example, the output can be a series of 8 bit, 16 bit or 20 bit words with each word representing the value of one sample.
Pulse width modulation (PWM) is simpler and quite different from PCM. (See Figure 4.) In a typical PWM system, the analog input signal is applied to a comparator whose reference voltage is a triangle-shaped waveform whose repetition rate is the sampling frequency. This simple block forms what is called an analog modulator. A simple way to understand the modulation process is to view the output with the input held steady at 0 V. The output forms a 50% duty cycle (50% high, 50% low) square wave. As long as there is no input, the output is a steady square wave. As soon as the input is non-zero, the output becomes a PWM waveform. That is, when the non-zero input is compared against the triangular reference voltage, it varies the length of time the output is either high or low.
For example, say that a steady DC value is applied to the input. For all samples when the value of the triangle is less than the input value, the output stays high, and for all samples when it is greater than the input value, it changes state and remains low. Therefore, if the triangle starts lower than the input value, the output goes high; at the next sample period, the triangle has increased in value but is still less than the input, so the output remains high; this continues until the triangle rises above the input, at which point the output drops low and stays there while the triangle reaches its apex and starts down again. Eventually, the triangle voltage drops below the input value, and the output goes high again and stays there until the reference exceeds the input once more. The resulting PWM output, when averaged over time, gives the exact input voltage. If the output spends exactly 50% of the time with an output of 5 V and 50% of the time at 0 V, then the average output would be exactly 2.5 V.
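The claim that the time average of the PWM output equals the input can be tried numerically. The Python sketch below assumes a triangle reference that swings between 0 V and 5 V, a comparator whose output is 5 V when the input exceeds the reference and 0 V otherwise, and 10,000 evaluation points per triangle period. All of these values, and the function names, are assumptions made for the sake of the example.

```python
def triangle(phase: float, low: float, high: float) -> float:
    """Triangle wave: rises from low to high over the first half period, then falls back."""
    ramp = 2 * phase if phase < 0.5 else 2 * (1 - phase)
    return low + (high - low) * ramp


def pwm_average(input_volts: float, high_volts: float = 5.0, steps: int = 10_000) -> float:
    """Average the comparator output over one full period of the triangle reference."""
    total = 0.0
    for i in range(steps):
        reference = triangle((i + 0.5) / steps, 0.0, high_volts)
        output = high_volts if input_volts > reference else 0.0   # the comparator decision
        total += output
    return total / steps


for volts in (1.0, 2.5, 4.0):
    print(volts, round(pwm_average(volts), 3))
```

With a 2.5 V input the output is high for half of the period, giving the 50% duty cycle and 2.5 V average described above.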
This is also an FM, or frequency-modulated, system (the varying pulse width translates into a varying frequency), and it is the core principle of most Class-D switching power amps. The analog input is converted into a variable pulse-width stream used to turn the output switching transistors on. The analog output voltage is simply the average of the on-times of the positive and negative outputs. Another way to look at this is that this simple device codes a single bit of information; a comparator is a 1 bit A/D converter. PWM is an example of a 1 bit A/D encoding system, and a 1 bit A/D encoder forms the heart of delta-sigma modulation.
Delta-sigma modulation
After nearly 30 years, delta-sigma modulation (sometimes sigma-delta) has only recently emerged as the most successful audio A/D converter technology. It waited patiently for the semiconductor industry to develop the technologies necessary to integrate analog and digital circuitry on the same chip. Today's high-speed mixed-signal IC processing allows the total integration of the circuit elements necessary to create delta-sigma data converters of awesome magnitude.
How the name came about is interesting. Another way to look at the action of the comparator is that the 1 bit information tells the output voltage which direction to go based upon what the input signal is doing. It looks at the input and compares it against its last sample to see if this new sample is bigger or smaller than the last one; that is the information transferred: bigger or smaller, increasing or decreasing. If it is bigger, then it tells the output to keep increasing, and if it is smaller, it tells the output to stop increasing and start decreasing. It reacts to the change. Mathematicians use the Greek letter delta (Δ) to stand for deviation or small incremental change, which is how this process came to be known as delta modulation. The sigma came about by the significant improvements made from summing or integrating the signal with the digital output before performing the delta modulation. Mathematicians use the Greek letter sigma (Σ) to stand for summing. Essentially a delta-sigma converter digitizes the audio signal with a very low resolution (1 bit) A/D converter at a high sampling rate. It is the oversampling rate and subsequent digital processing that separates this from plain delta modulation.
Considering quantizing noise, it is possible to calculate the theoretical sine wave signal-to-noise (S/N) ratio (actually the signal-to-error ratio, but for our purposes it is close enough to combine) of an A/D converter system knowing only n, the number of bits. Some math will show that the value of the added quantizing noise relative to a maximum (full-scale) input equals 6.02n + 1.76 dB for a sine wave. A perfect 16 bit system will have a S/N ratio of 98.1 dB, while a 1 bit delta-modulator A/D converter, on the other hand, will have only 7.78 dB.
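The formula above is easy to tabulate. The short Python sketch below evaluates 6.02n + 1.76 dB for a few word lengths; the function name ideal_snr_db is illustrative, and the results are the theoretical figures for an ideal converter driven by a full-scale sine wave, not measurements of real hardware.

```python
def ideal_snr_db(bits: int) -> float:
    """Theoretical full-scale sine wave signal-to-quantizing-noise ratio in dB."""
    return 6.02 * bits + 1.76


for bits in (1, 16, 20, 24):
    print(f"{bits:2d} bits: {ideal_snr_db(bits):.2f} dB")
```

Each added bit is worth about 6 dB, so the 4 bit jump from 16 to 20 bits accounts for roughly 24 dB.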
To get an intuitive feel for this, consider that because there is only 1 bit, the amount of quantization error possible is as much as half a bit. Because the converter must choose between the only two possibilities of maximum or minimum values, then the error can be as much as half of that. Further, because this quantization error shows up as added noise, then this reduces the S/N to something on the order of around 2:1 or 6 dB.
One attribute shines above all others for delta-sigma converters and makes them a superior audio converter: simplicity. The simplicity of 1 bit technology makes the conversion process fast, and a fast conversion allows use of extreme oversampling. Extreme oversampling pushes the quantizing noise and aliasing artifacts way out to mega-wiggle land, where they are easily dealt with by digital filters (typically 64x oversampling is used, resulting in a sampling frequency on the order of 3 MHz).
To understand how oversampling reduces audible quantization noise, we need to think in terms of noise power. From physics, you may remember that power is conserved: changed but never destroyed. Quantization noise power is similar. With oversampling, the quantization noise power is spread over a band that is as many times larger as is the rate of oversampling. For 64x oversampling, the noise power is spread over a band that is 64x larger, reducing its power density in the audio band to 1/64th. (See Figure 5.)
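Expressed in decibels, spreading a fixed noise power over a band that is wider by the oversampling ratio lowers the in-band noise by 10 log10 of that ratio. The Python sketch below works this out for a few ratios and also shows the resulting sampling frequency, assuming the 44.1 kHz CD rate as the base; it models oversampling alone, before any noise shaping, and the function name is made up for this example.

```python
import math


def inband_noise_reduction_db(oversampling_ratio: int) -> float:
    """Drop in audio-band noise power when flat noise is spread over a wider band."""
    return 10 * math.log10(oversampling_ratio)


BASE_RATE_HZ = 44_100   # assumed base rate (CD)

for ratio in (8, 16, 64):
    print(f"{ratio:2d}x: fs = {ratio * BASE_RATE_HZ / 1e6:.4f} MHz, "
          f"in-band noise down {inband_noise_reduction_db(ratio):.2f} dB")
```

For 64x oversampling this comes to about 18 dB, the decibel equivalent of the 1/64th power density mentioned above.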
Noise shaping further reduces in-band noise. Oversampling pushes out the noise, but it does so uniformly; that is, the spectrum is still flat. Noise shaping changes that. Using clever complex algorithms and circuit tricks, noise shaping contours the noise so that it is reduced in the audible regions and increased in the inaudible regions. Conservation still holds; the total noise is the same, but the amount of noise present in the audio band is decreased while simultaneously increasing the noise out-of-band. Then, the digital filter eliminates it.
As shown in Figure 6, a delta-sigma converter consists of three parts: an analog modulator, a digital filter and a decimation circuit. The analog modulator is the 1 bit converter discussed previously with the change of integrating the analog signal before performing the delta modulation. The integral of the analog signal is encoded rather than the change in the analog signal as is the case with traditional delta modulation. Oversampling and noise shaping push and contour all the bad stuff (aliasing and quantizing noise) so that the digital filter suppresses it. The decimation circuit (decimator) is the digital circuitry that generates the correct output word length of 16 bits, 20 bits or 24 bits and restores the desired output sample frequency. It is a digital sample-rate reduction filter and is sometimes termed downsampling because it returns the sample rate from its 64x rate to the normal CD rate of 44.1 kHz (or 48 kHz or even 96 kHz). The net result is greater resolution and dynamic range with increased S/N ratio and far less distortion compared to successive approximation techniques, all at lower costs.
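A toy model can show how a 1 bit stream carries a multi-level value. The Python sketch below is a first-order delta-sigma modulator followed by a deliberately crude decimator that simply averages each block of 64 bits. It is a simplification made for illustration: real converters use higher-order modulators and proper digital low-pass filters, and the function names, the steady 0.5 input and the 64x ratio are all assumptions.

```python
def delta_sigma_modulate(samples):
    """First-order modulator: inputs in the range -1..+1 become a stream of +1/-1 values."""
    integrator = 0.0
    feedback = 0.0
    bitstream = []
    for x in samples:
        integrator += x - feedback                      # sum (sigma) the difference (delta)
        feedback = 1.0 if integrator >= 0 else -1.0     # the comparator is the 1 bit quantizer
        bitstream.append(feedback)
    return bitstream


def decimate(bitstream, ratio):
    """Stand-in for the digital filter and decimator: average each block of `ratio` bits."""
    return [
        sum(bitstream[i:i + ratio]) / ratio
        for i in range(0, len(bitstream) - ratio + 1, ratio)
    ]


oversampled_input = [0.5] * (64 * 100)       # a steady input held for 100 output periods
bits = delta_sigma_modulate(oversampled_input)
output = decimate(bits, 64)
print(bits[:8])       # only +1.0 and -1.0 ever appear
print(output[:4])     # each averaged block lands on or near 0.5
```

The density of +1 values in the bitstream follows the input level, so averaging over many bits trades sample rate for resolution.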
One more note. Due to the oversampling and noise-shaping characteristics of delta-sigma A/D converters, certain measurements must use the appropriate bandwidth or inaccurate answers result. Specifications such as signal-to-noise, dynamic range and distortion are subject to misleading results if the wrong bandwidth is used. Because noise shaping purposely reduces audible noise by shifting the noise to inaudible higher frequencies, taking measurements over a bandwidth wider than 20 kHz results in answers that do not correlate with the listening experience. Therefore, it is important to set the correct measurement bandwidth to obtain meaningful data.
Dither
Now that oversampling helped get rid of the bad noise, let us add dither. Dither (from a 12th century English term meaning to tremble) means to be in a state of indecisive agitation, or to be nervously undecided in acting or doing. Dither is one of life's many tradeoffs. Here the tradeoff is between noise and resolution. We can introduce dither (a form of noise) and increase the ability to resolve small values, values, in fact, smaller than our smallest bit. Perhaps you can begin to grasp the concept by making an analogy between dither and anti-lock brakes. With regular brakes, if you just stomp on them, you probably create an unsafe skid situation for the car. Instead, if you rapidly tap the brakes, you control the stopping without skidding. We shall call this dithering the brakes. What you have done is introduce noise (tapping) to an otherwise rigidly binary (on or off) function. Therefore, by tapping on our analog signal, we can improve our ability to resolve it. By introducing noise, the converter rapidly switches between two quantization levels rather than picking one or the other when neither is correct. Sonically, this comes out as noise rather than a discrete level with error. Subjectively, what would have been perceived as distortion is now heard as noise.
The problem dither helps to solve is that of quantization error caused by the data converter being forced to choose one of two exact levels for each bit it resolves. It cannot choose between levels; it must pick one or the other. With 16 bit systems, the digitized waveform for high-frequency, low-signal levels looks very much like a steep staircase with few steps. An examination of the spectral analysis of this waveform reveals many nasty sounding distortion products. We can improve this result either by adding more bits or by adding dither. Prior to 1997, adding more bits for better resolution was straightforward but expensive, thereby making dither an inexpensive compromise.
The dither noise is added to the low-level signal before conversion. The mixed noise causes the small signal to jump around, which causes the converter to switch rapidly between levels rather than being forced to choose between two fixed values. Now, the digitized waveform still looks like a steep staircase, but each step, instead of being smooth, has many narrow strips, like vertical Venetian blinds. The spectral analysis of this waveform shows almost no distortion products at all, albeit with an increase in the noise content. The dither has caused the distortion products to be pushed out beyond audibility, and replaced with an increase in wideband noise. (See Figure 7.)
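The resolving power of dither can be demonstrated with a value smaller than one quantizing step. In the Python sketch below a steady level of 0.3 of a step is quantized many times, first without dither and then with uniformly distributed noise one step wide added before quantizing. The 0.3 level, the uniform noise shape, the sample count and the fixed random seed are all assumptions chosen to keep the example small and repeatable.

```python
import math
import random


def quantize_to_step(value: float) -> int:
    """Round to the nearest whole quantizing step (one step = 1.0)."""
    return math.floor(value + 0.5)


def average_quantized(value: float, samples: int, dither: bool, rng: random.Random) -> float:
    """Quantize the same value repeatedly and return the average of the results."""
    total = 0
    for _ in range(samples):
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0   # dither is added before conversion
        total += quantize_to_step(value + noise)
    return total / samples


rng = random.Random(0)     # fixed seed so that the run is repeatable
without_dither = average_quantized(0.3, 20_000, dither=False, rng=rng)
with_dither = average_quantized(0.3, 20_000, dither=True, rng=rng)
print(without_dither)      # 0.0: the small signal is lost entirely
print(with_dither)         # close to 0.3: the signal survives, at the price of added noise
```

Without dither every sample rounds to the same level and the small signal vanishes; with dither the converter flips between two levels in proportion to the true value.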
Life after 16
Current digital recording standards allow for only 16 bits, yet it is safe to say that for all practical purposes, 16 bit technology is history. Everyone who can afford the upgrade is using 20 bit and 24 bit data converters and (temporarily, until DVD-Audio becomes the new standard) dithering (as opposed to truncating) down to 16 bits. In going to 20 bits, you gain 24 dB more dynamic range, 24 dB less residual noise, 16:1 reduction in quantization distortion, and improved jitter (timing stability) performance. If it is 24 bits, add another 24 dB to each of the above and make it a 256:1 reduction in quantizing error with essentially zero jitter.
With today's technology, analog-to-digital-to-analog conversion is the element defining the sound of a piece of equipment, and if it is not done perfectly, then everything that follows is compromised. With 20 bit, high-resolution conversion, low signal-level detail is preserved. The improvement in fine detail shows up most noticeably by reducing the quantization errors of low-level signals. Under certain conditions, these coarse data steps can create audio passband harmonics not related to the input signal. Audibility of this quantizing noise is much higher than that of normal analog distortion, and it is known as granulation noise; 20 bits virtually eliminates granulation noise. Commonly heard examples are musical fades, like reverb tails and cymbal decay. With only 16 bits to work with, they do not so much fade as collapse in noisy chunks.
Where it matters most is in measuring small things. It does not make much difference when measuring big things. If your ruler measures in whole inch increments and you are measuring something 10 feet (3 m) long, the most you can be off is by half an inch. Not a big deal, but if what you are measuring is less than an inch, you have an accuracy problem. This is exactly the problem in digitizing small audio signals. Graduating our audio digital ruler finer and finer means we can accurately resolve smaller and smaller signal levels, allowing us to capture the musical details. Getting the exact right answer does result in better reproduction of music.
Candy, James C. and Gabor C. Temes, eds. Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation (IEEE Press ISBN 0-87942-285-8, NY, 1992).
"Delta Sigma A/D Conversion Technique Overview," Application Note AN 10 (Crystal Semiconductor Corporation, TX, 1989).
Pohlmann, Ken C. Advanced Digital Audio (Sams ISBN 0-672-22768-1, IN, 1991).
Pohlmann, Ken C. Principles of Digital Audio, 3rd ed. (McGraw Hill ISBN 0-07-050469-5, NY, 1995).
Sheingold, Daniel H., ed. Analog-Digital Conversion Handbook, 3rd ed. (Prentice-Hall ISBN 0-13-032848-0, NJ, 1986).
"Sigma-Delta ADCs and DACs," pp. 20-1 to 20-18, 1993 Applications Reference Manual (Analog Devices, MA, 1993).
The American Heritage Dictionary of the English Language, 3rd ed. (Houghton Mifflin ISBN 0-395-44895-6, Boston, 1992).
Watkinson, John. The Art of Digital Audio, 2nd ed. (Focal Press, ISBN 0-240-51320-7, Oxford, England, 1994).