Sound Image
Feb 11, 2013 8:34 AM, By Bob McCarthy
The costs and benefits
In an ideal world, sound systems would be invisible and inaudible. It is easy for us to comprehend the invisible part since we are constantly told by architects and set designers that they can’t stand the sight of a stationary rectangular box. Amazingly, lights moving around changing color and brightness, spilling light out of their sides and backs do not bother these folks. But a black box, God forbid one with a tiny LED on it, is an abomination. Why the prejudice against seeing speakers? This ties in to the secondary desire for the speakers to be inaudible, and by that I mean that we strive to create the illusion that the sound is magically coming directly from the stage performers rather than the rectangular boxes. Visible boxes break the magician’s illusion, resulting in a strong desire to hide them. The troubling part of all of this is that hiding the speakers visually can actually make hiding them audibly much more challenging.
This is one of the tradeoffs in the game of sound image control. There are more, most notably in the categories of intelligibility, tonal modification, and uniformity. Maintaining a realistic sound image is a balancing act between relative level, time, distance, and angle. The first installment of this two-part article (part two will appear in a later issue) will explore how we perceive sound image and how we can control its placement with multiple speakers. The second part will cover examples of image placement and control in typical sound systems.
Sound Image Perception
Our sound image experience is composed of two primary aspects and a variety of secondary ones. The dominant features are source direction and range, which give us the source location relative to our ears. The source angular relationship (its bearing) is subdivided into vertical and horizontal planes, which are decoded separately by the ear-brain system. These will be described momentarily. Our range perception is more complex, relying on a memory map that compares what we are hearing to our expectations regarding the particular sound source material. Our expectations are strongly influenced by a secondary sense: sight, which gives us a framework with which to normalize what we are hearing. If we see a violin across the room, we compare what we hear to our memory of a distant violin sound, rather than what we experience when we are playing it ourselves (which, in my case, would sound like a nearby cat being tortured).
This is not at all to say that a blind person lacks range-finder capability. They will have mapped the range clues much more finely than sighted folks since they lack the secondary sense backup that the eyes provide. The secondary range clues include the sound level, frequency response, and direct/reverberant ratio. Seeing the source distance and the shape and materials of a room gives context to the range expectations. We adjust our sonic expectations when we see that we are in a large reflective environment. In such a context, it can be difficult to carry on a conversation even at a fairly close distance. “Objects may be closer than they sonically appear” would be a fair warning in a highly reverberant environment. By contrast, if we are blindfolded in an anechoic chamber, we will find it much harder to determine the range. Adjustments of level and frequency response can, in fact, alter our range perception without moving the speaker.
Let’s set up an experiment to illustrate your sound image detection system in action. You are blindfolded in a room with a continuously moving sound source. How accurately will you be able to track the moving source’s bearing and range? The easiest aspect to localize is the horizontal position. This is because we have a two-channel detection system: our binaural hearing. The source location is double-checked by a pair of two-channel comparisons between the arrivals at our ears: relative time and relative level. As the sound source moves off of the horizontal center, it arrives first and louder at one ear. These two findings confirm each other to provide the localization clue.
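To put a rough number on the relative-time clue, here is a minimal sketch in Python. It uses the Woodworth spherical-head approximation, which is not part of this article; the head radius and speed of sound are assumed typical values, and the function name is made up for illustration. It ignores the relative-level clue, which varies with frequency.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875      # assumed: a commonly quoted average head radius


def interaural_time_difference(azimuth_deg):
    """Approximate arrival-time difference between the two ears, in seconds.

    Woodworth spherical-head approximation for a distant source located
    0 to 90 degrees away from the horizontal center.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 15, 45, 90):
        itd_ms = interaural_time_difference(angle) * 1000.0
        print(f"{angle:2d} degrees off center: {itd_ms:.3f} ms")
```

Under these assumptions the time offset grows from zero at center to well under a millisecond at the extreme side, which gives a sense of how small the differences are that the binaural system resolves.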
The vertical location is found by each ear individually using a memory mapped signature. This is unique for each ear and for each person (and animal) because it is derived from memorizing the comb-filtered frequency response created by the reflections of our outer ear as the sound enters the ear canal. We have never heard sound that was not reflected off of this structure and therefore we have normalized our hearing to this response. Each vertical orientation of the sound source creates a slightly different set of reflections into the canal. These microscopic differences are recognized by the ear and linked to the memory of the vertical position of sound sources previously localized in our life.
As I said before, vertical localization relies on two individual analyzers rather than a dual-channel comparator (as in the horizontal plane). The distinction is important because it means that relative time is not the driving force. As a result, we will have a harder time distinguishing direct sound (which arrives first) from reflected sound in the vertical plane than we do in the horizontal plane. The comb-filter signature detection of the vertical plane becomes more challenging when the additional combing created by the summation of direct sound and reflections is added in. This is further complicated by the fact that the reflections are arriving from different vertical angles as well, so their vertical plane identifying signatures conflict with that of the direct sound. In short, horizontal localization holds up better in a reflective environment, while in the vertical plane it becomes harder for us to pinpoint a source direction.
So far we have discerned the bearing of the sound source: its horizontal and vertical angle relative to us. Next is range. Is the source close or far? What are the clues we use to discern sound source range? We are not bats, so we can’t ping a sonar pulse and time the return.
Two of the factors in ranging are direct/reverberant ratio and frequency response. Level plays a part if the range is changing (and the source level is constant). But if the source is not moving, we cannot make conclusive range estimates based on level alone. One can easily see that if we brought a source closer by half the distance and reduced its level by half (-6 dB), the level of the direct sound would stay the same, negating level alone as a conclusive range finder.
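The half-distance example can be checked with a few lines of Python. This is a sketch that assumes a point source in a free field obeying the inverse square law; the function name is made up for illustration.

```python
import math


def direct_level_change_db(old_distance, new_distance, source_gain_db=0.0):
    """Change in direct-sound level at the listener, in dB.

    Assumes a point source in a free field (inverse square law), so the
    level rises by about 6 dB each time the distance is halved.
    """
    if old_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(old_distance / new_distance) + source_gain_db


if __name__ == "__main__":
    # Move the source from 10 m to 5 m while turning it down by 6 dB.
    change = direct_level_change_db(10.0, 5.0, source_gain_db=-6.0)
    print(f"Net change at the listener: {change:+.2f} dB")
```

The two changes cancel to within a small fraction of a decibel (halving the distance is closer to 6.02 dB than to 6 dB), so the listener gets no usable range clue from level.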
The presence of reflections, however, provides a strong set of clues. For a given room, the proximity to the source will have a strong effect on the direct/reverberant ratio. If the source moves closer to us, we will detect an increase in the D/R ratio. This clues us in to the decreasing distance, even if the source has been adjusted to maintain a constant level at the listener. A stationary source in a given room will be more challenging to precisely range. A more reverberant room will lead to higher range estimates than a dry room because we associate the higher reverberation levels with larger spaces and longer distances.
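A simplified way to see how the D/R ratio tracks distance is sketched below. It leans on a textbook statistical model that the article does not spell out: the direct sound falls by 6 dB per doubling of distance while the reverberant level stays roughly uniform throughout the room. The `critical_distance_m` parameter (the distance at which the two are equal) is a hypothetical input, and real rooms will deviate from this.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Estimated direct/reverberant ratio in dB at a given distance.

    Assumes inverse-square direct sound and a uniform reverberant field,
    so the ratio is 0 dB at the critical distance.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    critical = 8.0  # hypothetical room value, in meters
    for d in (2.0, 4.0, 8.0, 16.0):
        print(f"{d:5.1f} m: D/R = {direct_to_reverberant_db(d, critical):+.1f} dB")
```

In this model the D/R ratio depends only on distance and the room, not on the source's drive level, which is why it survives a level adjustment at the source.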
The frequency response of the source also plays a part. There are two areas for this: the high end and the low end. The high frequency response is affected by the imperfect frequency response of our transmission medium: air. The extreme high frequency range is the most lossy over distance, to a degree that depends on the weather, mostly the humidity. In any case, the longer the transmission distance, the more lossy our top end gets. Note that this affects the direct sound the same whether you are in an anechoic chamber or an echo chamber. The losses, however, continue even after the sound has been reflected; therefore the later arrivals will have progressively more HF roll-off.
We will also lose HF response if our location is off axis of the source (assuming a directional source). Our blindfolded listener in an anechoic chamber would likely enlarge their range estimate if the high frequency was filtered down, since this mimics the air loss effects in the HF range that are factored into our range memory map. If we rotated the speaker in the anechoic chamber so that the HF response began to roll off, the brain could be fooled into extending the range estimate. Think this through from your own experience of walking along a row of seats and moving from on axis of a speaker to the off-axis area along the aisle. You know you are at the same distance from the speaker and yet you feel farther away when you reach the off-axis area. If we added a side-fill speaker that restored the high frequency to the outermost seats, we would effectively be restoring those seats to the same sonic range as the middle ones.
Meanwhile, in the low end, we will see lots of constructive addition if the listening space has exotic architectural features such as a floor, walls, or ceiling. As distance between the source and listener rises, the frequency response tilts up in the low end (reflections) and down in the high end (air loss), resulting in a discernible range clue for our hearing system.
Imaging with Speakers
Now that we have established how sound image is perceived in our heads, it should be obvious how to get perfect imaging in your concert hall, theater, or house of worship: turn off the sound system. This would be fine if Canon’s old slogan, “Image is everything,” were true in our field of sound. Sound image is ever-present in the thoughts of sound engineers but way down the list for the folks who hear our work. For them, intelligibility, appropriate level, and natural tone weigh much more heavily on their experience. If folks don’t understand the words, you won’t be hearing about how great the imaging was.
So this puts things in perspective. Once we have satisfied the primary needs of intelligibility and expected level we can seek to enhance the experience with realistic imaging. Let’s clarify the sonic image goals. First, we want to create a sound image that is closely correlated in angular bearing to the live sound source. Second, (and here is the twist) we wish to create a sound image depth that is significantly closer than the live sound source, but not so much closer that it breaks the limits of plausibility and becomes a distraction. Therefore, we can measure our sonic image success (or failure) by the amount of angular and range offset between the perceived sound image and the intended source location. We seek to match the bearing and purposefully distort the range perception so that we bring the audience members sonically closer to the stage. We don’t tend to think of our sound system this way, but sonic image range reduction is the most indispensable part of the sonic image equation. If we are not going to decrease the sonic range to the listeners, then we should pack it up.
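The angular and range offsets can be estimated from simple geometry. The sketch below compares a reinforcement speaker with the live source as seen from one seat. The coordinate convention (x across the room, y toward the stage, z up, all in meters) and the function names are assumptions made for illustration. It also treats the speaker position as the image location, which is a worst case; the image a listener actually perceives depends on the relative level and timing of the arrivals.

```python
import math


def bearing_and_range(listener, target):
    """Return (azimuth_deg, elevation_deg, range_m) from listener to target.

    Points are (x, y, z) tuples: x across the room, y toward the stage,
    z up. Azimuth is measured from the y axis, positive toward +x.
    """
    dx = target[0] - listener[0]
    dy = target[1] - listener[1]
    dz = target[2] - listener[2]
    horizontal = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dx, dy))
    elevation = math.degrees(math.atan2(dz, horizontal))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return azimuth, elevation, distance


def image_offset(listener, source, speaker):
    """Speaker-minus-source offsets: (azimuth_deg, elevation_deg, range_m)."""
    src = bearing_and_range(listener, source)
    spk = bearing_and_range(listener, speaker)
    return tuple(s - r for s, r in zip(spk, src))


if __name__ == "__main__":
    seat = (-6.0, 0.0, 1.2)      # hypothetical seat, ear height 1.2 m
    performer = (0.0, 8.0, 1.7)  # hypothetical center-stage source
    left_main = (-7.0, 8.0, 5.0) # hypothetical flown left speaker
    az, el, rng = image_offset(seat, performer, left_main)
    print(f"azimuth {az:+.1f} deg, elevation {el:+.1f} deg, range {rng:+.1f} m")
```

Running this for several seats gives a quick map of where the angular offsets are largest. The simple subtraction assumes both the source and the speaker are in front of the listener.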
The first key factor will be the proportion of natural/amplified sound needed to get us up to the required level and intelligibility (the less amplified sound we need to add, the easier it will be to preserve imaging). The second will be our speaker locations. The closer they are to the natural sound source, the easier it will be to preserve the image. A speaker helmet would be the best location for imaging, but betrays the fact that image is not everything, and neck injuries and gain before feedback must be factored into the equation. In practical terms, we strive to get speakers located near the planes where the natural sound originates. For a typical stage source or podium, we will try to get sources fairly low in the vertical plane. In the horizontal plane, we will be as central as possible. Excellent. We now know the best place for the speaker is center stage, an impossibly impractical location.
In practical terms, we have to look at each seat in the room as having a unique relationship to the natural sound source and the speakers that are reinforcing that source with added signal. Closer seats will tend to have large angular differences and small range differences, while distant seats will have the opposite. Fill systems and delay systems add another layer, since they have relationships with both the original source and the main speakers. While one speaker source (combined with the natural source) can minimize angular image distortion in one plane, there are relatively few practical main speaker locations that can help us in both planes for any single location, and even fewer that can do the job for a large part of any hall. Therefore, most sound reinforcement applications that place image preservation as a high priority will need to consist of multiple main systems and a variety of fill and delayed systems. Image control in the closer seats will be a mix of the original source and the speakers. As we get deeper into the room, the stage source will fade away and the game is played out between different speakers.
Every seat in the room has a bearing and range to the original sound source and the reinforcement speakers covering that area. If the horizontal bearings differ greatly, we will have to take steps to prevent a large-scale angular image distortion. As an example, we will consider a center source heard from a seat on the left that is covered by a reinforcement speaker on the left.
Option one is to delay the reinforcement speaker and reduce its level so that it arrives at around the same time as the source and at the same (or better yet, lower) level. This option can provide some inward image movement but may not be workable if high acoustic gain is required (thereby requiring the reinforcement speaker to be louder). Option two is to delay the reinforcement speaker even more so it arrives after the source. This can help “for a limited time only” (up to 7 milliseconds) but has the downside of creating comb filtering and reduced intelligibility when combined with the source. Option three would be to add another reinforcement speaker that is centrally located. This added arrival reinforces the central energy of the source and brings the horizontal image toward center. Bear in mind, however, that the center speaker is high above the stage. Therefore, the gains in horizontal image come with a cost of vertical image distortion. If we are close to the stage, we might get some help from the front fills, which are low and help ground the vertical.
So now we have established an approach to image control, but it comes with some substantial costs. We need multiple speakers and signal processing channels, and the acoustical costs include the potential for intelligibility loss and tonal distortion that result from multiple speakers covering the same area. These are the tradeoffs we face for image control. In Part II, we will detail the means available to move the sound image and put this all to practical use.
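The timing side of options one and two, and the comb filtering that comes with a late arrival, can be sketched numerically. The Python below is illustrative only: the speed of sound is an assumed round value, the function names are made up, and the null frequencies assume two equal-level copies of the same signal, which is the worst case for combing.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def alignment_delay_ms(source_distance_m, speaker_distance_m, extra_ms=0.0):
    """Delay to apply to the speaker so it arrives extra_ms after the source.

    Distances are measured from the seat. If the speaker is already later
    than requested, no delay can fix that, so the result is clamped to 0.
    """
    path_difference_m = source_distance_m - speaker_distance_m
    delay = path_difference_m / SPEED_OF_SOUND_M_S * 1000.0 + extra_ms
    return max(delay, 0.0)


def comb_null_frequencies_hz(offset_ms, max_hz=2000.0):
    """Null frequencies produced by two equal-level arrivals offset in time.

    Nulls fall at odd multiples of 1 / (2 * offset).
    """
    if offset_ms <= 0:
        return []
    tau_s = offset_ms / 1000.0
    nulls = []
    k = 0
    while (2 * k + 1) / (2.0 * tau_s) <= max_hz:
        nulls.append((2 * k + 1) / (2.0 * tau_s))
        k += 1
    return nulls


if __name__ == "__main__":
    # Hypothetical seat: 12 m from the performer, 7 m from the left speaker.
    print(f"Sync delay: {alignment_delay_ms(12.0, 7.0):.1f} ms")
    print(f"With 5 ms extra: {alignment_delay_ms(12.0, 7.0, extra_ms=5.0):.1f} ms")
    print("First nulls at a 5 ms offset:",
          [round(f) for f in comb_null_frequencies_hz(5.0, max_hz=1000.0)])
```

The longer the offset, the lower and more closely spaced the nulls become, which is one way to see why the extra-delay option only helps over a limited range.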