The Modern Audio Analyzer
Understanding dual-channel FFT audio analyzers
Figure 1: The limitation of a single time window on the FFT analyzer. Frequency resolution is linear no matter what the time record length is, or how it is displayed.
It wasn’t long ago that the tuning of sound systems was seen almost exclusively in artistic terms. Engineers would play a tape or talk into a mic and walk around and make the important decisions about how the system was aimed, relative levels, crossover points, timing, equalization, and more. Systems were admittedly not as complex as now. Or were they?
It’s true that most systems had fewer channels, fewer subsystems, and a lot less signal processing than we see now, but there were other ways in which systems were much more complex. The signal path nowadays is often a one-stop flight from mix console to full-range powered speakers with a layover at the signal processor. In the old days, it was like going coast to coast on Southwest. Console to equalizer to delay line to crossover to limiter to amplifier, and finally to the speaker. And that’s just one path to one high driver! It is the simplification and standardization of modern signal processing and speakers that make it easy for artistic sound designers to immerse themselves in soundscapes of ever-greater complexity. If the components of these super complex systems could not be optimized to work together in a predictable way, few designers would risk their shows on wayward subsystems that have creative ideas of their own. The modern audio analyzer is the tool of choice for reversing entropy in the sound system.
Analyzers have been out there for a long time, of course, but in the past they were crude tools, capable of only limited duties. The sound systems themselves were crude as well, and in many ways they have grown up together. The sound system moved out of the garage and the analyzer out of the laboratory in the 1980s and they have been on the road together ever since. An analyzer used to weigh 100lbs. and cost more than a mix console. Now they weigh nothing and cost less than a microphone. Everybody has one, but there are lots of us that don’t really know what’s going on inside of these tools and what the data really means. Our chances of making good tuning decisions increase greatly if we understand the perspective of the modern audio analyzer. Here is a look under the hood. Though we won’t dig deep enough to ready you to design your own analyzer, the goal is to make you a better driver.
Meet the Fourier Family
The modern analyzer is a digital audio device commonly known in our trade as the “Fast-Fourier-Transform” (FFT) analyzer. First let’s reverse engineer the analyzer’s name. “Transform” refers to the conversion of a sampled waveform from a timing sequence of amplitude values to a series of amplitude and phase values over frequency. A continuous stream of music goes in and is separated into slices by frequency. The simple expression for this is a transform from the time domain to the frequency domain.
“Fourier” refers to the Fourier Theorem. It is the math behind the transform, named after its originator, Jean Baptiste Joseph Fourier, in the 1800s. The Fourier Theorem goes both directions: time domain to frequency domain and back again. No waveforms were injured in the making of this transform.
The “Fast” part is where things get trickier. The Fourier transform in its pure form is an integral over infinite time. We have a show at 8 p.m. so we can’t wait that long. The version we can actually run works on a finite block of samples, and “fast” refers to the efficient algorithm that crunches that block with far fewer calculations than the brute-force approach. While the finite block means there are some errors creeping in, they will be small enough to suit our application. It also means we can program the algorithm into our computer. Computers hate infinity!
Figure 2: The multi-time windowed FFT. Resolution and display stay constant over the full range of frequencies.
The FFT analyzer can give us amplitude and phase over frequency in more detail than we can ever use. Great. Unfortunately, in its raw form the data is marginally usable in the practical world of sound system optimization. We had to build in a lot of special features to adapt this tool for our trade. Here are a few examples of the limitations of the basic single-channel FFT analyzer: It creates a linear frequency response, a mismatch to our log hearing system. It requires a known (and precisely constructed) input signal; explain to the musicians what notes they can play, and when. The phase response is like a clock with just a second hand. Am I late or early? This is just a start.
How do we get around this? We throw money at it. We stack up lots of analyzers and run them in series and parallel. A typical modern system runs about 16 FFT analyzers together to make the composite frequency response display you see.
The basic transform flow goes from the sampled waveform to time record to real and imaginary numbers to magnitude (the more mathematical word for “amplitude”) and phase over frequency. First item of business: Forget about real and imaginary numbers. That is for math geeks only. That leaves us with two transitions: input to time record and then on to the frequency domain. The sampled input waveform runs by the same rules as the digital audio we deal with every day (i.e., we sample at more than twice the highest frequency we want to use). We will use 50kHz as an example to get a usable 22kHz bandwidth.
The 50kHz sample frequency yields time increments of 0.02 milliseconds. Let’s start the counting game. If we take 250 samples, our time buffer will contain 5 milliseconds (250 x 0.02 milliseconds) of data. Notice that 5 milliseconds is the period for the frequency 200Hz, so exactly one cycle of 200Hz will fit into our time buffer. The time/frequency domain transform begins an interrogation of the data in the buffer. Let’s listen in to the conversation. “Did anyone here complete exactly one cycle? If so, (a) how big are you? and (b) what part of the phase cycle were you in when you entered (and left) the time buffer?” The former tells us the magnitude at 200Hz, and the latter tells us the phase. The next step is to ask if anybody completed two cycles, which gives us the status report on 400Hz. The process continues for as long as we like (until we reach the highest frequency allowed by the sample rate). In short, a 5-millisecond sample has us slicing the spectrum into 200Hz increments. These are evenly spaced linear separations but variable separations on the log scale (1 octave between 200Hz and 400Hz, about 0.6 octave between 400Hz and 600Hz, about 0.4 octave between 600Hz and 800Hz, and so on).
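For readers who like to see the counting game written down, here is a minimal sketch of that interrogation in Python. The function name `interrogate` and the test signal are made up for illustration, and phase is reported against a cosine reference, which is an assumption of this sketch and not something the analyzer vendors have confirmed.

```python
import cmath
import math

SAMPLE_RATE = 50_000   # Hz, the article's example sample rate
N = 250                # samples in the time buffer

record_ms = 1000 * N / SAMPLE_RATE   # length of the buffer in milliseconds
bin_spacing_hz = SAMPLE_RATE / N     # frequency that completes exactly one cycle


def interrogate(record, cycles):
    """Ask the buffer: did anyone complete exactly `cycles` cycles?

    Returns (amplitude, phase in degrees, cosine reference).
    Valid for 0 < cycles < len(record) / 2.
    """
    n = len(record)
    acc = sum(x * cmath.exp(-2j * math.pi * cycles * i / n)
              for i, x in enumerate(record))
    return 2 * abs(acc) / n, math.degrees(cmath.phase(acc))


# Hypothetical test signal: 200 Hz cosine, amplitude 0.5, entering at 30 degrees.
record = [0.5 * math.cos(2 * math.pi * 200 * i / SAMPLE_RATE + math.radians(30))
          for i in range(N)]

amp_200, phase_200 = interrogate(record, 1)   # one cycle  -> 200 Hz
amp_400, _ = interrogate(record, 2)           # two cycles -> 400 Hz

print(f"{record_ms} ms buffer, bins every {bin_spacing_hz} Hz")
print(f"200 Hz: amplitude {amp_200:.3f}, phase {phase_200:.1f} deg")
print(f"400 Hz: amplitude {amp_400:.3f}")
```

The 200Hz question comes back with the amplitude and entry phase we put in, and the 400Hz question comes back essentially empty. A real FFT gets the same answers for every bin at once with far less arithmetic.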
Figure 3: Comparison of analyzer platform features and capabilities.
Infinity and Beyond
The first challenge we see is linear data in a log world (the one in our heads). The second challenge is more subtle. What if there were data in the waveform from frequencies that are not integer multiples of 200Hz? How do we count 300Hz? Do we spread its amplitude across the 200Hz and 400Hz bins? The phase values at the beginning and end of the time record don’t match. Which is right? What can we do? If we capture a longer waveform, say 10 milliseconds, we will be counting in 100Hz increments. Now we can get 300Hz, but what about 350Hz? The problem divides down but never goes away. Remember how I told you that the full Fourier transform needed to be computed for infinity? You are seeing it right here.
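The 300Hz question can be put to the test with a short sketch. It assumes the same 50kHz sample rate and a full-scale 300Hz sine; the helper name `bin_amplitudes` is hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 50_000  # Hz, the article's example sample rate


def bin_amplitudes(freq_hz, n_samples, n_bins):
    """Return [(bin frequency in Hz, amplitude)] for the first n_bins bins above 0 Hz."""
    record = [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
              for i in range(n_samples)]
    out = []
    for k in range(1, n_bins + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n_samples)
                  for i, x in enumerate(record))
        out.append((k * SAMPLE_RATE / n_samples, 2 * abs(acc) / n_samples))
    return out


short_record = bin_amplitudes(300, 250, 4)   # 5 ms:  bins at 200, 400, 600, 800 Hz
long_record = bin_amplitudes(300, 500, 4)    # 10 ms: bins at 100, 200, 300, 400 Hz

for label, bins in (("5 ms", short_record), ("10 ms", long_record)):
    print(label, ", ".join(f"{f:.0f} Hz: {a:.3f}" for f, a in bins))
```

In the 5-millisecond record the 300Hz tone has no bin of its own, so it shows up as a sizable reading in both the 200Hz and 400Hz bins, with neither one reporting full scale. In the 10-millisecond record it lands squarely in the 300Hz bin and its neighbors go quiet. Swap in 350Hz and the smearing returns.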
The price for not measuring to infinity is making an assumption that the finite sample we have is representative of everything between the Big Bang and the end of time. Let’s look at this challenge with an analogy. If our time sample were the complete song “Stairway to Heaven,” we would be operating under the assumption that this has been played continuously back to back forever. Admittedly it feels that way. But the loop must have the whole song, so that the quiet whiny ending meets the quiet whiny beginning. If our time record cut off while Jimmy Page was wailing with his amp set to 11, the restart would be abrupt and we would know the song does not remain the same. Yes, that is a long analogy, but I am saving you from integral mathematics.
This is where the middle section, the time record, comes into play. The time record is a modified version of the raw sampled waveform, making it ready for the transform. We cannot place limits on the program material coming through our system. We have to be able to measure any signal and we don’t have all day. We need our time record to give us a reasonably valid representation for all frequencies within the bandwidth. We will give up perfection for the few frequencies that are perfect multiples to gain equality for all.
The remedy is a time “window,” a mathematical simulation of a gain control between the buffered waveform and our second stage: the time record. Audio engineers can visualize the window function as a time-triggered gate. Let’s resume with our example 5-millisecond time record. At the start (0 milliseconds) the window is closed, and it begins to open at some point after. Eventually, but no later than the mid-point in time (2.5 milliseconds), the window is fully open. The second half mirrors the first and the window closes at the 5-millisecond end point. The final product is a modified version of the original waveform with amplitude weighting that favors the middle over the beginning and ending.
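Here is a rough sketch of that gain control, assuming the Hann shape (one of the windows named in this article) and a 300Hz cosine as the awkward guest. The function name `hann_window` is illustrative, and real analyzers may shape their windows differently.

```python
import math

SAMPLE_RATE = 50_000   # Hz
N = 250                # 5 ms buffer


def hann_window(n):
    """Gain for each sample: closed (0) at the start, fully open (1) at the mid-point."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


window = hann_window(N)

# 300 Hz completes 1.5 cycles in 5 ms, so the raw buffer ends where it did not begin.
raw = [math.cos(2 * math.pi * 300 * i / SAMPLE_RATE) for i in range(N)]

# The time record is the buffer with the window applied as a gain.
time_record = [g * x for g, x in zip(window, raw)]

raw_seam = abs(raw[-1] - raw[0])                        # jump when looped end to end
windowed_seam = abs(time_record[-1] - time_record[0])   # jump after windowing

print(f"gain at 0 ms: {window[0]:.3f}, gain at 2.5 ms: {window[N // 2]:.3f}")
print(f"seam jump raw: {raw_seam:.3f}, windowed: {windowed_seam:.5f}")
```

Looped end to end, the raw buffer jumps by nearly the full peak-to-peak swing at the seam. The windowed time record starts and ends at essentially zero, so the loop is seamless, at the cost of turning down everything that happened near the edges.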
How does this help? First, we are assured that our Fourier transform assumption (that we can place the time records end to end) has been satisfied. Second, we can see there are certain to be costs to modifying the waveform. The costs vary by program material. Among other things, sine waves have distortion added to them, and transients may be ignored if they arrive at the beginning or end of the record.
There are different versions of the window functions and each has its own favorite source material. (None are fans of Zeppelin though.) They are all the same at the beginning (closed) and middle (fully open) but differ in the shape of the rise and fall between. The Hann and Blackman-Harris windows are often used for random sources (noise or music), while the Flattop window is favored for sine wave testing. How big are these errors? Most of them are 40dB-80dB down from a full-scale signal. Perfect? No. End of the world? No. (Our time record closed first.)
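To get a feel for the size of these errors, the sketch below measures how much of a 300Hz tone leaks into a distant bin (2kHz) with and without a Hann window. The choice of bin, the test tone, and the factor of 2 used to make up for the window's average gain of 0.5 are all assumptions of this example, not a description of any particular analyzer.

```python
import cmath
import math

SAMPLE_RATE = 50_000   # Hz
N = 250                # 5 ms record, bins every 200 Hz


def level_db(record, k):
    """Level in bin k, in dB relative to a full-scale sine."""
    n = len(record)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(record))
    return 20 * math.log10(2 * abs(acc) / n)


tone = [math.cos(2 * math.pi * 300 * i / SAMPLE_RATE) for i in range(N)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

# Multiply by 2 to compensate for the Hann window's average gain of 0.5.
windowed = [2 * g * x for g, x in zip(hann, tone)]

far_bin = 10  # 2 kHz, well away from the 300 Hz tone
unwindowed_db = level_db(tone, far_bin)
hann_db = level_db(windowed, far_bin)

print(f"leakage at 2 kHz, no window: {unwindowed_db:.1f} dB")
print(f"leakage at 2 kHz, Hann:      {hann_db:.1f} dB")
```

Without the window, the stray reading at 2kHz sits only a couple of dozen dB below the tone. With the Hann window it drops tens of dB further, which is the neighborhood the 40dB-80dB figure describes.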
The Frequency Response
Now we have a full set of linear frequency response data, with amplitude and phase. The amplitude part is straightforward: bigger is bigger. The phase part is not as straightforward because the phase is just a position on a circle relative to … something. A steady sine wave will have a single phase value. But if the input signal is random noise it will have random phase. Simply put, we can’t do anything with a single channel of phase data unless we have a time reference to compare it to, which is to say relative phase. As we will see, that is one of the key benefits to the dual-channel version of the FFT analyzer.
The next step is the move from linear to log. One option is to just stretch and squeeze the frequency response display to make the linear data fit in the log display. Visualize a Slinky. This is a video solution for an audio problem. The problem is not display; it’s resolution. We have very low resolution in the low-frequency range and very high resolution in the high end. Our 5-millisecond example has frequency data points every 200Hz. That’s 0Hz, 200Hz, 400Hz … 19,600, 19,800, 20,000. Notice that we skipped over the subwoofers? On the other hand we have 50 slices in the octave between 10kHz and 20kHz.
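The lopsided resolution is easy to tally. This small sketch counts how many 200Hz-spaced data points fall in a given octave; the function name `points_in_octave` is made up for the example, and it counts bins above the lower edge up to and including the upper edge.

```python
def points_in_octave(low_hz, spacing_hz):
    """Count bin frequencies f with low_hz < f <= 2 * low_hz on a linear grid."""
    return (2 * low_hz) // spacing_hz - low_hz // spacing_hz


SPACING_HZ = 200  # from the 5 ms time record

octave_counts = {low: points_in_octave(low, SPACING_HZ)
                 for low in (50, 100, 200, 400, 800, 1600, 3200, 10_000)}

for low, count in octave_counts.items():
    print(f"{low:>6} Hz to {2 * low:>6} Hz: {count} points")
```

The subwoofer octave from 50Hz to 100Hz gets nothing at all, the next few octaves get one or two points each, and the top octave gets 50.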
If we take a longer time record, we get more resolution in the low end (a good thing) and high end (too much of a good thing). This is yet another battle with infinity. Instead, let’s try a compromise approach: We can take both short and long time windows. Take the best and leave the rest.
The scheme is a sequential doubling of time records. If we start with 5 milliseconds, then we also take 10 milliseconds, 20 milliseconds, 40 milliseconds, and onward. Each time we double the time span we halve the sample frequency. This results in the same number of data points, shifted an octave lower. The key is that we use only the top octave in each one (which in our example is regarded as a 48 points/octave format). The chosen octave from each time record is spliced together with the others to make a composite full-range response with the same resolution per octave. Inside each octave the data is linear data stretched to a log display. But since each linear span is just one octave, the stretching is barely detectable.
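The bookkeeping for this scheme can be sketched in a few lines. The example assumes eight stages and treats 10kHz to 20kHz as the top octave of the first stage, which gives the 50 points counted earlier in this article; how a real analyzer trims that to its published points-per-octave figure is not covered here. The function name `octave_plan` is hypothetical.

```python
def octave_plan(first_record_ms=5.0, first_rate_hz=50_000.0,
                top_hz=20_000.0, stages=8):
    """One entry per stage: the record doubles, the sample rate halves, one octave is kept."""
    plan = []
    for s in range(stages):
        record_ms = first_record_ms * 2 ** s
        rate_hz = first_rate_hz / 2 ** s
        spacing_hz = 1000.0 / record_ms          # bin spacing = 1 / record length
        high_hz = top_hz / 2 ** s                # top of the octave kept from this stage
        low_hz = high_hz / 2
        plan.append({
            "record_ms": record_ms,
            "samples": round(rate_hz * record_ms / 1000.0),
            "spacing_hz": spacing_hz,
            "octave_hz": (low_hz, high_hz),
            "points": round((high_hz - low_hz) / spacing_hz),
        })
    return plan


plan = octave_plan()
for stage in plan:
    low, high = stage["octave_hz"]
    print(f"{stage['record_ms']:>6.0f} ms, {stage['samples']} samples, "
          f"{stage['spacing_hz']:>8.4f} Hz bins, "
          f"keep {low:>9.3f} to {high:>9.3f} Hz: {stage['points']} points")
```

Every stage uses the same number of samples and contributes the same number of points, so the spliced result has constant resolution per octave from the top of the range down to the subwoofers. The price is paid in time: under these assumptions the lowest octave needs a 640-millisecond record.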
The Next Step: Dual Channel
We have gone to a lot of trouble to create a high-resolution frequency response. How is this anything more than a high-definition version of the old 1/3-octave realtime analyzer? It’s not significantly different until you add the second channel, and then it is a whole new ball game. Any single-channel analyzer can give you the basic amplitude response, but it takes two channels to tell time (phase). With two channels we turn our analyzer into a differential input: a device that sees the difference between its two channels. When placed across the input and output of any single device (or series of devices) the analyzer will give us the transfer function (i.e., the difference between the two points).
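A bare-bones sketch of the differential idea: divide the output's answer by the input's answer at the same frequency. The device under test here is imaginary (half the level, 1 millisecond late), the function names are made up, and a real analyzer averages many records instead of dividing a single pair.

```python
import cmath
import math

SAMPLE_RATE = 50_000   # Hz
N = 250                # 5 ms record, bins every 200 Hz


def bin_value(record, k):
    """Complex value (magnitude and phase together) of bin k."""
    n = len(record)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(record))


def transfer(reference, measured, k):
    """Level difference in dB and phase difference in degrees at bin k.

    Assumes the reference channel actually has signal in that bin.
    """
    h = bin_value(measured, k) / bin_value(reference, k)
    return 20 * math.log10(abs(h)), math.degrees(cmath.phase(h))


DELAY_SAMPLES = 50  # 1 ms at 50 kHz

# Reference channel: a 200 Hz tone that enters at an arbitrary phase (1 radian).
reference = [math.cos(2 * math.pi * 200 * i / SAMPLE_RATE + 1.0)
             for i in range(N)]
# Measured channel: the same tone at half the level, arriving 1 ms later.
measured = [0.5 * math.cos(2 * math.pi * 200 * (i - DELAY_SAMPLES) / SAMPLE_RATE + 1.0)
            for i in range(N)]

gain_db, phase_deg = transfer(reference, measured, 1)
print(f"200 Hz transfer function: {gain_db:.2f} dB, {phase_deg:.1f} deg")
```

The arbitrary starting phase of the source cancels out. What remains is the device: about 6dB down and 72 degrees behind, which is 1 millisecond out of a 5-millisecond cycle.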
The RTA does not know processor from speaker from room, our show from a forklift, or whether the sound is arriving directly or has made six trips to Timbuktu and back. The dual-channel FFT analyzer even has a function (coherence) that tells you whether or not the analyzer knows the answer or is guessing. Knowing all of these things will not assure that you will make good decisions, but it surely ups the probability. The RTA has a simplistic worldview. It only reads amplitude over frequency, so an RTA user is inclined to believe that the solutions must be in amplitude adjustment (i.e., equalization). The modern dual-channel FFT informs us of so much more (phase, signal to noise, echo structure, and more) that we are guided to see the sound system challenges in their full complexity. This makes for better diagnosis, which makes for better treatment. Solution options such as speaker aiming, splay angle, delay, crossover alignment, and level tapering are all aided by the information the FFT provides.
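Coherence can be sketched as well. The version below is the textbook magnitude-squared coherence estimate, averaged over 32 records of seeded random noise and evaluated at a single bin; the record count, the seed, the noise levels, and the function names are assumptions of this example, and commercial analyzers may compute and weight it differently.

```python
import cmath
import math
import random

N = 250          # samples per record
AVERAGES = 32    # number of records averaged
BIN = 5          # 1 kHz with 200 Hz bin spacing


def bin_value(record, k):
    """Complex value of bin k."""
    n = len(record)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(record))


def coherence(ref_records, meas_records, k):
    """Magnitude-squared coherence at bin k: 1.0 = fully related, near 0 = unrelated."""
    sxx = syy = 0.0
    sxy = 0j
    for ref, meas in zip(ref_records, meas_records):
        x, y = bin_value(ref, k), bin_value(meas, k)
        sxx += abs(x) ** 2
        syy += abs(y) ** 2
        sxy += x.conjugate() * y
    return abs(sxy) ** 2 / (sxx * syy)


rng = random.Random(1)  # seeded so the run is repeatable
refs = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(AVERAGES)]

# Clean path: the output is simply the input at half level.
clean = [[0.5 * x for x in r] for r in refs]
# Noisy path: the output is the input plus unrelated noise of equal strength.
noisy = [[x + rng.gauss(0, 1) for x in r] for r in refs]

clean_coh = coherence(refs, clean, BIN)
noisy_coh = coherence(refs, noisy, BIN)
print(f"coherence, clean path: {clean_coh:.3f}")
print(f"coherence, noisy path: {noisy_coh:.3f}")
```

The clean path scores a perfect 1. The noisy path scores well below that, which is the analyzer's way of admitting that part of what it hears at the output did not come from the input.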
We can tell time, detect distortion, compression, and even changes in sound speed all while the band is playing and the audience is dancing. Even better, the multiple time windowed frequency response closely resembles how our ears detect the tonal shape of transmitted sound waves. This makes for a superior tool for guiding the equalization process.
In my next column we will engage the second channel and show how all of this is put to use for system optimization.