Reaching the Audience
Oct 1, 1999 12:00 PM, Peter Mapp
Proven techniques for evaluating, maintaining and optimizing systems design for speech intelligibility.
Although it might be the fundamental aim of every sound system to be intelligible, in practice this desirable, albeit obvious, goal is not always achieved. There can be many reasons for this. Sometimes the conditions or circumstances are such as to render adequate intelligibility impossible. Mostly, however, the reasons have more to do with poor system design, installation or use. The aim of this article is to provide an introduction to optimizing the intelligibility and clarity of a system, either at the design stage or when it is up and running.
Over the years, I have encountered many causes of poor intelligibility. An inadequate budget is always a good one to hide behind, often belatedly brought sharply into focus when the building is on fire and you need to be told which way to go. Poor design, often caused by ignorance of acoustics or necessary operational requirements, can also include inadequacies in loudspeaker coverage, S/N ratio, direct-to-reverberant ratio, control of late reflections and echoes, zone separation (interfering crosstalk), amp power, linear frequency response, and the appropriate use of loudspeakers.
Poor operation and set up are also often to blame. It must not be forgotten that no matter how good the system, the human either operating it or making the announcements can all too easily bring about a startling degradation in performance. Common faults are being too loud or not loud enough, creating distortion (overloading of amps or input stages), using poor mic location or techniques, enduring poor announcer or user articulation, delivering speech too quickly, and running messages with insufficient recording quality or bandwidth.
Poor maintenance, a prominent culprit, leads to intermittent faults and drop outs, crackling connections, hum, and inoperative circuits and equipment because of a failure to repair faults or equipment damage. Bungled repairs, such as using Scotch tape in lieu of solder to make a connection, may also be to blame. In this category we can also include the resident so-called audio expert who mistakenly reckons that he can make it sound so much better even without professional training.
Whereas the ways of overcoming some of the above problems and shortcomings are obvious, others are not. A good understanding of the underlying principles behind speech intelligibility should therefore assist with determining the cause of poor speech clarity and, hence, a possible cure or course of action to improve the situation. An understanding of sound system equipment and installation is assumed.
Clarity and audibility
One of the most common mistakes people make when talking about intelligibility and describing problems (or specifying requirements) is to confuse audibility with clarity. Just because a sound is audible does not mean that it is or will be intelligible. Audibility has to do with the ability to hear the sound - either from a human physiological point of view or from the aspect of S/N ratio. Clarity, on the other hand, describes the ability to distinguish the structure of the sound (speech) itself and, for example, to be able to hear the consonants or vowels of a word without masking or other impediment.
Starting from this point, we can begin to see how and where to optimize performance. Primary factors that affect sound system intelligibility are bandwidth and frequency response, loudness and S/N ratio, direct-to-reverberant ratio and reverberation time, talker articulation and rate of delivery, and listener acuity. Secondary factors influencing sound system intelligibility include distortion (THD, or total harmonic distortion), system non-linearities and compression, system equalization, uniformity of coverage, echoes, reflections and reflection direction, direction of source, direction of interfering noise, and vocabulary and context. Taking each of these factors in turn and briefly describing their importance and effect should provide a good background from which many of the optimization techniques become self evident.
Frequency response and bandwidth
Speech covers the frequency range from approximately 100 Hz to 8 kHz, although there are important harmonics affecting the overall sound quality and timbre above this. Figure 1 shows the relative contribution to normal speech frequencies in terms of their octave band levels. As the figure shows, the main speech energy is around 250 Hz to 500 Hz and falls off fairly rapidly at the higher frequencies. The lower frequencies correspond to the vowel sounds, and the weaker upper frequencies to the consonants. Perhaps unfortunately (for sound system engineering, that is), the contributions to intelligibility do not follow the same pattern; indeed, they are almost the reverse. Figure 2 presents this information. Here we can immediately see that it is the upper frequencies that contribute most to intelligibility: the octave band centered on 2 kHz provides approximately 30%, and the 4 kHz and 1 kHz bands 25% and 20%, respectively. The importance of achieving a well-extended high-frequency response can therefore immediately be seen. This is particularly the case in noisy environments, where local noise can mask speech announcements. It is particularly important under these conditions to ensure that an adequate S/N ratio is achieved within the important intelligibility bands of 2 kHz and 4 kHz.
Poor bandwidth these days is generally not a problem. Most sound system equipment can cover the frequency ranges important to speech intelligibility. There are, however, exceptions - some cheap mics, some reentrant horn loudspeakers (again generally the cheaper ones), and some digital message stores (again the cheap ones).
By far, the most common problems with regard to frequency response are caused either by loudspeaker-boundary and loudspeaker-room interactions or loudspeaker-loudspeaker interactions and interference. Quite remarkable response aberrations can occur. An example of this is shown in Figure 3. The on-axis (anechoic) response of this loudspeaker is surprisingly reasonable. Once installed and allowed to interact with its surroundings and monitored off axis, however, a different story appears. Although some improvement can be brought about by appropriate equalization, the underlying problem is rather more fundamental than this, and a different loudspeaker and mounting arrangement should be sought. The off-axis response of a loudspeaker is often a forgotten parameter. Loudspeakers exhibiting a well-controlled, smooth response without excessive attenuation within the nominal coverage angle should be used.
Electronic equalization is a powerful tool that can make a remarkable difference to the clarity of a system, but it needs to be carried out carefully and with a full realization of what is happening acoustically. Remember, response peaks can almost always be attenuated (assuming that a filter of appropriate bandwidth and center frequency is available), but sharp response notches are generally caused by acoustic interaction between sources or between a source and a boundary (or boundaries), and they cannot be fixed by simple frequency domain EQs.
Adding bass to the sound system might make it sound impressive, but it will do nothing for its clarity and intelligibility. Indeed, in a reverberant or even semi-reverberant space, too much bass will adversely affect clarity. Even so, many operators, DJs and announcers actually think it makes their voice sound better. A useful tip is to go easy on the bass for speech (apply a roll off in reverberant spaces), and design the system so that music signals are routed through a different equalization path. A flat high-frequency response, perhaps surprisingly, may also not be optimal. Such a response can sound far too bright and harsh. This heavily depends on the acoustic environment, type of loudspeaker and the ratio of direct sound to reverberant sound created within the space. So again, sometimes a gradual high-frequency roll off is useful. In many distributed systems, however, this roll off occurs naturally, and a gentle boost is often required instead.
Loudness and S/N ratio
Fairly obviously, the sound level must be adequate for the listeners to be able to hear it. If the signal level is too quiet, many people (particularly the elderly or those suffering even a relatively mild hearing loss) will either have to strain to listen or miss certain words, even under quiet conditions. I am always surprised at the levels required by listeners. Although face-to-face speech is often around 65 dBA, at conferences 70 dBA to 75 dBA is often demanded even under quiet ambient conditions.
In noisy situations, it is imperative that a good S/N ratio is achieved. To that end, various rules of thumb have been developed. As a minimum, 6 dBA is required, and at least 10 dBA should be the goal. Above 15 dBA S/N, although improvement still occurs, the law of diminishing returns sets in. Furthermore, under high-noise conditions, such S/N ratios might require excessive sound levels to be produced. In these circumstances, a full spectrum analysis of the noise should be carried out and compared to the speech spectrum. This can be highly informative and can show where most benefit can be obtained. From these measurements, the articulation index can be computed, and an objective measure of the situation obtained. Figure 4 shows a simple diagrammatic explanation of S/N ratio.
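The rules of thumb above can be sketched as a small check. This is only an illustration: the function names and the wording of the returned labels are assumptions made for the example, while the 6, 10 and 15 dBA thresholds are the ones quoted in the text.

```python
def snr_db(speech_level_dba: float, noise_level_dba: float) -> float:
    """Speech-to-noise ratio as the simple difference of two A-weighted levels."""
    return speech_level_dba - noise_level_dba


def rate_snr(snr: float) -> str:
    """Classify an S/N ratio against the rules of thumb quoted in the text.

    The label strings are hypothetical; only the thresholds come from the article.
    """
    if snr < 6.0:
        return "below minimum"        # less than the 6 dBA minimum
    if snr < 10.0:
        return "minimum met"          # 6 dBA reached, 10 dBA goal not yet met
    if snr <= 15.0:
        return "goal met"             # at or above the 10 dBA goal
    return "diminishing returns"      # above 15 dBA, further gains are small


if __name__ == "__main__":
    # Example: 75 dBA announcement level over a 63 dBA ambient noise level.
    ratio = snr_db(75.0, 63.0)
    print(ratio, rate_snr(ratio))
```

A single broadband figure like this says nothing about which frequency bands are masked, which is why the text recommends a full spectrum comparison when the noise is high.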
All too often, only the on-axis situation is designed for and the off-axis requirements forgotten. A good off-axis frequency response and adequate coverage are essential. Remember, the usual coverage angle for a loudspeaker is defined at the 6 dB down point, which automatically reduces the S/N ratio at the edge of the coverage zone by up to 6 dB as compared to the on-axis response. To this must also be added any additional distance losses, but overlap coverage from adjacent devices can be used in a positive way to help counter this.
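As a rough worked example of how the coverage-edge drop and the distance loss add up, the sketch below estimates the direct sound level at the edge of a loudspeaker's nominal coverage angle. It assumes simple free-field inverse-square behavior (about 6 dB per doubling of distance), which the article does not state and which will not hold exactly in a reverberant room; the function names and the 1 m reference distance are likewise assumptions for illustration.

```python
import math


def distance_loss_db(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Free-field (inverse-square) level drop relative to the reference distance.

    Assumption: point-source behavior, roughly 6 dB per doubling of distance.
    """
    return 20.0 * math.log10(distance_m / ref_distance_m)


def edge_level_db(on_axis_level_at_ref_db: float,
                  distance_m: float,
                  ref_distance_m: float = 1.0,
                  edge_drop_db: float = 6.0) -> float:
    """Direct level at the coverage-angle edge: on-axis level minus
    distance loss minus the 6 dB down point that defines the coverage angle."""
    return (on_axis_level_at_ref_db
            - distance_loss_db(distance_m, ref_distance_m)
            - edge_drop_db)


if __name__ == "__main__":
    # Hypothetical loudspeaker: 100 dB at 1 m on axis, listener 10 m away at the edge.
    print(round(edge_level_db(100.0, 10.0), 1))
```

Comparing the result with the ambient noise level at the same seat shows whether the S/N targets are still met off axis, not just on the loudspeaker's centerline.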
An aspect of S/N ratio that is often forgotten is the noise climate at the mic itself. In many cases, paging mics are located in noisy areas, and the speech S/N ratio is degraded even before the announcement is broadcast. Directional mics can sometimes provide appropriate attenuation of interfering sounds, but this gain is often lost in reverberant spaces or by local reflections from the desk, ceiling or local surroundings. When the mic has to be located in a particularly noisy environment, use a good-quality noise-canceling mic, and possibly provide a local noise refuge in the form of an acoustic hood or enclosure to produce a quieter local zone. At least 20 dBA (preferably 25 dBA) S/N should be aimed for at the mic end of things. Remember that it can only get worse (rubbish in = rubbish out). Nowadays, adaptive speech filtering techniques can also be employed that help to separate the speech from the noise, but such techniques should only be adopted after good old-fashioned acoustical engineering and simple filtering have been tried.
It is worth remembering that sound quality and intelligibility are not the same thing. Often, a deliberately shaped system response with little bass and perhaps an accentuated high-frequency range can be clearer than a ruler flat distortionless system that would do credit to your home high-fidelity stereo system or home cinema system.
Reverberation time and direct-to-reverberant ratio
Just as noise can mask speech signals, so too can excessive reverberation. This is defined by the direct-to-reverberant ratio or, more accurately, by the ratio of the direct + early reflected sound to the late reflected + reverberant sound. Unlike the simpler S/N ratio, the way in which the D/R ratio affects speech intelligibility is not constant. It depends upon the room reverberation time. Whereas a positive value is desirable, it can be allowed to become quite negative under appropriate conditions. Calculation is fairly complex and outside the scope of this article. Some rules of thumb, however, may be given in terms of the reverberation time itself, as shown in Table 1.
When designing or setting up systems for reverberant and reflective environments, the main rule to follow is, "Aim the loudspeakers at the listeners and keep as much sound as possible off the walls and ceiling." This automatically goes some way toward maximizing the D/R ratio, although in practice, it may not be quite so simple. Figure 5 shows the effect of directivity on D/R ratio and potential intelligibility. The measurements were made in a reverberant space (2.6 seconds RT). An omnidirectional and a highly directional loudspeaker were set up at the same position and measured at a fixed location. Clearly, the directional loudspeaker has a higher D/R ratio (+8.7 dB) and produces a lower reverberant field as compared to the omnidirectional source (D/R = -4 dB). The corresponding %ALcons measurements were 13% for the omni and 4% for the directional loudspeaker (RASTI = 0.48 and 0.70, respectively).
Talker articulation and rate of delivery
Many systems operating under difficult conditions - medium and high reverberation times - could offer a better perceived performance if the announcer or mic user were to speak with better diction and at a slower rate. Announcer training is either soon forgotten or not given at all. Often, totally unsuitable announcers or untrained personnel are employed, and message clarity suffers accordingly. Prerecorded messages loaded into good quality (minimum 8 kHz and preferably 12 kHz bandwidth) digital stores may overcome certain aspects of this problem. For highly reverberant spaces, the speech rate needs to be slowed down as compared to normal speech. This can often be difficult to accomplish during normal use, but carefully rehearsed, slower recordings can be effective. It is important to realize that sound systems do have limitations no matter how carefully designed. Ultimately, they need to operate in an acoustically acceptable environment. Feeding back either an electronically delayed version of the announcement or an acoustically picked up or reverberated one to the announcer's headphones can be an effective way of slowing down his or her rate of speech. Technology may soon be able to do this automatically, although not when the talker is in the same acoustic space.
Uniformity of coverage
When working in especially difficult spaces - noisy, reverberant or, if you are unlucky, both - it is essential to provide uniform direct sound coverage. Whereas under less onerous conditions, a 6 dB variation (i.e. +/-3 dB) may well be acceptable, such a variation in a reverberant space can lead to intelligibility variations of 20% to 40%. A 40% degradation of clarity under such conditions is usually unacceptable. Again, the off-axis performance of the selected loudspeakers becomes of critical importance. Where the listeners are free to move around, as in a shopping mall, it may be possible to relax the variation in intelligibility within the space. With a seated audience in an enclosed stadium, arena or cathedral, however, no such luxury can exist if all are to adequately hear and understand. A permissible variation in direct sound coverage of 3 dB or less should therefore be the target, particularly over the range 1 kHz to 5 kHz. Admittedly, this is a tight requirement, and it costs.
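A coverage survey can be checked against this target with a few lines of code. The sketch below is a minimal illustration, assuming the direct sound level has already been measured (or predicted) at a set of listener positions; the function names and the sample readings are hypothetical, and only the 3 dB target comes from the text.

```python
from typing import Sequence


def coverage_spread_db(levels_db: Sequence[float]) -> float:
    """Total variation in direct sound level across the listening area (max minus min)."""
    return max(levels_db) - min(levels_db)


def meets_coverage_target(levels_db: Sequence[float],
                          max_spread_db: float = 3.0) -> bool:
    """True if the total variation is within the target (3 dB by default)."""
    return coverage_spread_db(levels_db) <= max_spread_db


if __name__ == "__main__":
    # Hypothetical direct-field readings (dB) at five seats.
    seats = [82.0, 84.5, 83.0, 82.5, 84.0]
    print(coverage_spread_db(seats), meets_coverage_target(seats))
```

Since the requirement matters most over 1 kHz to 5 kHz, the same check would ideally be run per octave band in that range rather than on a single broadband figure.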
Echoes and reflections
In large reflective or reverberant (or even semi-reverberant) spaces, late reflections can significantly degrade intelligibility, even though they are not necessarily perceived as discrete events or echoes themselves. By late reflections we mean those arriving after approximately 50 ms to 60 ms. Again, the simple rule to adopt is, "Aim the loudspeakers at the listeners and keep as much sound as possible off the walls and ceiling or roof." In particular, focused reflections from concave surfaces or corners must be avoided. Simple straight-line ray drawing analysis or in-room experiments with a laser pointer or narrow beam, high-intensity torch can often pinpoint the problem. Alternatively, a TEF or MLS analyzer and a directional mic and loudspeaker can both accurately and rapidly track down the cause.
Another cause of late arriving reflections is the use of too widely spaced loudspeakers. This occurs, for example, in outdoor systems where the spacing exceeds approximately 50 feet (15 m) or in systems where a local in-fill loudspeaker is used to provide coverage remote from the main cluster or primary source. Examples would be under or over a balcony in a theater or concert hall or an outdoor remote fill. In the first (outdoor) case, assuming a distributed system, the spacing between loudspeakers should be reduced or a back-to-back technique employed. In the remote in-fill situation, the arrival times should be brought back into synchronization by means of a delay line, an example of which would be delaying the signal to the local in-fill loudspeaker until the sound from the primary source arrives. See Figure 6.
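The arithmetic behind both cases is just path length divided by the speed of sound. The sketch below is an illustration only: the speed of sound of 343 m/s (air at roughly 20 degrees C) is an assumption not given in the article, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 degrees C


def arrival_gap_ms(path_difference_m: float) -> float:
    """Time by which sound over the longer path lags, for a given extra path length."""
    return 1000.0 * path_difference_m / SPEED_OF_SOUND_M_S


def fill_delay_ms(main_to_listener_m: float, fill_to_listener_m: float) -> float:
    """Delay to apply to an in-fill loudspeaker so that its sound arrives
    in step with the sound from the primary source."""
    path_difference_m = main_to_listener_m - fill_to_listener_m
    if path_difference_m < 0:
        raise ValueError("in-fill loudspeaker is farther away than the primary source")
    return arrival_gap_ms(path_difference_m)


if __name__ == "__main__":
    # A 15 m (about 50 ft) loudspeaker spacing corresponds to roughly 44 ms,
    # which is already close to the 50 ms limit for late arrivals.
    print(round(arrival_gap_ms(15.0), 1))
    # Hypothetical under-balcony fill: main cluster 30 m away, fill 3 m away.
    print(round(fill_delay_ms(30.0, 3.0), 1))
```

In practice the calculated value is a starting point; the final setting is normally trimmed by measurement or by ear at the listening position.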
Solving the problem
Effectively optimizing speech intelligibility means using a variety of techniques. Aim the loudspeakers at the listeners, and keep as much sound as possible off the walls and ceiling or roof. Provide a direct line of sight between loudspeaker and listener. Ensure an adequate bandwidth extending from at least 250 Hz to 6 kHz, preferably to 8 kHz or 12 kHz. Avoid frequency response anomalies; roll off the bass and ensure adequate but not excessive high-frequency transmission. Try to avoid mounting loudspeakers in corners unless local boundary interactions can be effectively overcome. Minimize the distance between the loudspeaker and listener. Ensure a speech S/N ratio of at least 6 dBA, preferably more than 10 dBA. Ensure the mic user is adequately trained and understands the need to speak clearly and slowly in reverberant environments. Provide a quiet area or refuge for the announcement mic or use an effective close talking, noise-canceling mic that maintains a good frequency response. Avoid long path delays (>50 ms); use electronic delays and inter-loudspeaker spacing of less than 45 feet (13.7 m). Use automatic noise level sensing and gain adjustment to optimize S/N ratios under varying noise level conditions. Use directional loudspeakers in reverberant spaces to optimize D/R ratios (models exhibiting a flat or smoothly controlled sound power response). Minimize direct field coverage variations; variations of as little as 3 dB can be detrimental. Consider improvements to the acoustic environment, and do not consider the sound system in isolation. Lastly, under difficult conditions, use simple vocabulary and message formats. Keeping these techniques in mind will go a long way toward ensuring that the listener in a given venue fully comprehends the talker's words.