Measuring Intelligibility
Apr 1, 2002
By Peter Mapp
ON THE SURFACE, IT MIGHT seem that a sound system is either intelligible or it is not. In fact, intelligibility is a continuous quantity that can be measured and graded. There are distinct grades of intelligibility and several ways of measuring this acoustic parameter. Furthermore, some types of speech can undergo more degradation than others and still remain intelligible. Indeed, it is by studying how speech is degraded by noise and reverberation that we can derive a physical acoustic parameter that relates to intelligibility.
GAUGING INTELLIGIBILITY
There are two forms of speech intelligibility measurement: human-based, or direct, testing and machine-based, or indirect, testing. With direct testing, expert listeners monitor specially constructed speech samples broadcast over the sound system, marking the words or sentences they hear on a prepared test sheet. In indirect testing, either speech or a special test signal is broadcast over the system; the received signal is picked up by a microphone and analyzed to separate the useful signal from the degradation components, and a ratio of useful to detrimental signal is computed.
Direct tests seem more useful because they use real listeners in real situations. Take emergency announcements, for example: we require verification that safety announcements will be clearly heard, so the systems must be accurately tested. Unfortunately, it is generally neither practical nor economical to undertake extensive direct testing with a panel of expert listeners. Clearly, there is a need for a simple, machine-based test. But what, exactly, is that test?
ADVERSE FACTORS TO CLARITY
Before discussing strategies for developing a machine-based test, it is useful to summarize the factors that can affect intelligibility. Any worthwhile measurement system must take the following primary factors into account: sound-system bandwidth, sound-system frequency response, loudness, signal-to-noise ratio, talker enunciation and rate of speech, listener acuity, and direct-to-reverberant ratio.
The direct-to-reverberant ratio is based on five factors: room reverberation time (RT60); volume, size and shape of the space; number of loudspeakers operating; distance from the listener to the loudspeaker(s); and directivity of the loudspeaker(s). Strictly speaking, a more complex characteristic than the simple D/R ratio should be used. Better correlation with perceived intelligibility is obtained by using the ratio of the direct sound plus early-reflected energy to late reflected sound energy and reverberation. That may be termed C50 or C35 depending on the split time used to delineate between the useful and deleterious sound arrivals. (C50, widely used in auditorium acoustics, has yet to gain wide acceptance for P.A. systems, though it could prove quite useful.)
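To make the idea concrete, here is a minimal sketch of how C50 (or C35) might be computed from a measured room impulse response. It assumes NumPy and a mono impulse-response array `ir` sampled at `fs`; both names are illustrative, and real measurement systems apply octave-band filtering and noise compensation that are omitted here.

import numpy as np

# Sketch: early-to-late energy ratio from a room impulse response.
# split_ms=50 gives C50; split_ms=35 gives C35.
def clarity(ir, fs, split_ms=50.0):
    onset = int(np.argmax(np.abs(ir)))         # treat the direct arrival as t = 0
    split = onset + int(fs * split_ms / 1000.0)
    early = np.sum(ir[onset:split] ** 2)       # direct sound + early reflections
    late = np.sum(ir[split:] ** 2)             # late reflections + reverberation
    return 10.0 * np.log10(early / late)       # ratio in dB

The higher the returned value, the more the useful early energy dominates the deleterious late energy.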
The following factors are of secondary importance in affecting clarity: system distortion (harmonic or intermodulation), system equalization, uniformity of coverage, presence of very early reflections (less than 1-2 ms), sound focusing or presence of late or isolated higher-level reflections (greater than 70 ms), direction of sound arriving at the listener, direction of any interfering noise, gender of talker, talker vocabulary and context of speech and talker microphone technique.
Even this long list is not exhaustive. For example, the emotional state of the listener will have an impact on his or her perceptions. Figure 1 summarizes the factors that affect speech transmission through a sound system.
DEVELOPING AN INDIRECT TEST
How can we build a machine-based system of intelligibility testing that factors in all of these aspects of sound transmission and reception? Is there a way to mathematically calculate the real-world intelligibility of a space?
▪ Percent Alcons. In 1971, Peutz in the Netherlands published his early findings on speech intelligibility. One of his main discoveries was that the loss of intelligibility grows with the square of both the room's reverberation time and the talker-to-listener distance, and shrinks as the room volume increases. He also found that there was a limiting distance beyond which effectively no further loss of intelligibility occurs. Peutz noted that it was the loss of consonants, not vowels, that most reduced speech intelligibility. After modification by Klein, the familiar form of the %Alcons (articulation loss of consonants) equation was established. It's now called the architectural form to distinguish it from later developments; a more complex equation, published later, factored in background noise, but the two equations did not converge and therefore gave different answers.
To calculate %Alcons from the direct-to-reverberant conditions at a given position in a given room, see the “Calculating %Alcons” sidebar.
In 1986, Syn-Aud-Con founders Don and Carolyn Davis held a workshop to investigate the effects of reverberation and loudspeaker directivity on speech intelligibility. The workshop involved more than 100 attendees and produced statistically valid word-score data from a large number of listeners. One result of the workshop was the establishment of an algorithm for the TEF analyzer to measure %Alcons indirectly. The method is based on the measurement of an energy/time curve (ETC) produced by the sound system at a designated location within the listening area. From the curve, the early decay time is established, together with the ratio of the direct and early (LD) to reverberant (LR) sound energies. From these measurements, the equivalent %Alcons can be computed.
In the initial version of the program, the operator set the cursors to obtain the appropriate values for the early RT, LD, and LR. That often led to significant variations in the measured %Alcons value from one operator to the next. Later work on the TEF 20, however, enabled the algorithm to be implemented automatically, with the TEF placing the cursors itself to establish the early decay time (though they remain open to manual adjustment). A time window of approximately 20 ms delineates the direct and early sound from the late arrivals and reverberation. Figure 2 shows a typical ETC measurement with the cursors set to read out %Alcons.
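The cursor logic reduces, in essence, to the same energy split used for C50, but at roughly 20 ms. A hedged sketch of the idea follows; the actual TEF algorithm is more involved, and `etc` here is an assumed array of energy/time values.

import numpy as np

# Sketch: approximating the LD/LR readout by splitting an energy/time
# curve about 20 ms after the direct arrival. This shows only the idea
# behind the automatic cursor placement, not the TEF's actual algorithm.
def direct_to_reverberant(etc, fs, window_ms=20.0):
    onset = int(np.argmax(etc))                  # direct arrival
    split = onset + int(fs * window_ms / 1000.0)
    l_d = np.sum(etc[onset:split])               # direct + early energy
    l_r = np.sum(etc[split:])                    # late + reverberant energy
    return 10.0 * np.log10(l_d / l_r)            # LD/LR ratio in dB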
Many assumptions are made when using the %Alcons measurement, and the method is open to question. In practice, significant errors have been noted using the technique, a result highlighted at the TEF intelligibility workshop last October (see “Loud and Clear,” page 44). Although the method can give remarkably good agreement, the inconsistency of results suggests that it is neither accurate nor robust enough for verification purposes.
The main problem lies with the restricted measurement frequency and bandwidth. The TEF %Alcons measurement is made only in the ⅓-octave band centered at 2 kHz. That's just 10 percent of the available intelligibility information! The naïveté of this approach can be seen in Figure 3, which plots the directivity factor (Q) of a range of typical loudspeakers. As you can see, the Q value varies significantly with frequency, and the D/R ratio varies with it in direct proportion (the higher the Q, the higher the D/R). With most loudspeakers, the Q at 2 kHz is higher than the values at lower frequencies, so the D/R ratio at 2 kHz can be considerably higher than at 500 Hz. A single measurement at 2 kHz rarely reflects the average Q value over the speech range; sampling only within that band therefore inevitably leads to inconsistencies and inaccuracies.
A classic example of the errors potentially caused by this narrow outlook was seen at the TEF intelligibility workshop. In Figure 2, the measured %Alcons is 11.7 percent (equivalent to a Speech Transmission Index of 0.50). That would suggest reasonable intelligibility; subjectively, however, the location was rated as having poor intelligibility. Measuring the STI over its seven octave bands gave a rating of 0.41, equivalent to 18.5 %Alcons, a far cry from 11.7 percent! The STI value better predicted the poor intelligibility that was subjectively noted.
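The two scales map onto each other through a simple exponential curve. The conversion sketched below is the one commonly quoted in the TEF literature (often attributed to Farrel Becker); it reproduces the 0.41/18.5 percent pairing above almost exactly.

import math

# Sketch: the commonly quoted STI <-> %Alcons conversion curve.
def alcons_from_sti(sti):
    return 170.5405 * math.exp(-5.419 * sti)

def sti_from_alcons(alcons):
    return -math.log(alcons / 170.5405) / 5.419

print(round(alcons_from_sti(0.41), 1))   # -> 18.5 (%Alcons)
print(round(alcons_from_sti(0.50), 1))   # -> 11.4 (%Alcons)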
▪ Speech Transmission Index. At about the time that Peutz was carrying out his research, another team in the Netherlands at the government’s TNO research laboratories was also investigating speech intelligibility with the goal of developing a rather different measurement technique. In 1971 Steeneken and Houtgast published their first journal paper on the Speech Transmission Index.
Their method is based on modulation transfer function (MTF) measurements and assesses a sound transmission system in each of the seven octave bands covering the range from 125 Hz to 8 kHz. Although many people understand that speech signals produce energy in those bands, they don't always appreciate the manner in which speech signals vary over time. These temporal variations effectively modulate the energy in each octave band (125 Hz to 8 kHz) at a range of very low frequencies (0.63 to 12.5 Hz). Houtgast and Steeneken found a direct correlation between the reduction in a transmitted signal's modulation depth and the resultant intelligibility. By measuring the modulation transfer functions for each of the octave bands, a 7-by-14 matrix is created, with a total of 98 MTF values. The Speech Transmission Index value is computed by weighting the average MTF value for each band in accordance with its general contribution to intelligibility.
The STI concept is illustrated in Figures 4a and 4b. Noise and reverberation reduce the modulation depth of speech signals and are both accounted for. When speech is used as the test signal, it adds significant signal processing requirements and results in reduced measurement repeatability; therefore, a specially modulated test signal is preferred. Because the STI measurement operates over the complete speech range, frequency response anomalies or frequency-dependent acoustic effects, such as RT, should automatically be taken into account.
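Here is a minimal sketch of both halves of the idea, under stated assumptions: `mtf_model` is the classic diffuse-field formula in which reverberation low-pass filters the speech envelope and steady noise scales it down, and `sti_from_mtf` turns a measured 7-by-14 matrix of m-values into an index. The band weights are the commonly published ones; exact values differ between revisions of the method.

import numpy as np

# Classic diffuse-field MTF model: reverberation low-pass filters the
# envelope; steady noise reduces modulation depth uniformly.
def mtf_model(f_mod, rt60, snr_db):
    reverb = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

# Octave-band weights, 125 Hz..8 kHz (commonly published values).
BAND_WEIGHTS = np.array([0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14])

def sti_from_mtf(m):
    """m: 7 x 14 matrix of modulation-transfer values (bands x mod freqs)."""
    m = np.clip(m, 1e-6, 1.0 - 1e-6)            # keep the log finite
    snr = 10.0 * np.log10(m / (1.0 - m))        # apparent S/N per cell, in dB
    snr = np.clip(snr, -15.0, 15.0)             # limit to a 30 dB range
    ti = (snr + 15.0) / 30.0                    # transmission index, 0..1
    mti = ti.mean(axis=1)                       # average over 14 mod freqs
    return float(BAND_WEIGHTS @ mti)            # weighted sum over 7 bands

Note how a perfect channel (m near 1 everywhere) returns an STI of 1, while a channel that destroys all modulation returns 0.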
▪ STI and RaSTI. The STI scale and its relationship to other measures are shown in Figure 5. Considerable processing power is required to measure and compute the full STI. Twenty-five years ago, the need for a portable measurement instrument was well recognized, but the processing power of the day was extremely limited, and STI measurements were restricted to cutting-edge computers. RaSTI was conceived to overcome that problem. A scaled-down version of STI designed for portable measuring systems, RaSTI operates in just the 500 Hz and 2 kHz bands over nine modulation frequencies (nine points of the full 98-point matrix), reducing the computation requirements by 91 percent. Bruel & Kjaer introduced the first RaSTI meter in 1985. RaSTI was primarily intended for natural room acoustic transmission (auditoriums or classrooms), but it proved capable of measuring simple sound-reinforcement systems, too, and the method was soon applied to many system types. It was also adopted by several European standards in the late 1980s and early 1990s, as well as by the Civil Aviation Authority for all aircraft P.A. system certifications.
SOME RECENT REALIZATIONS
The introduction of any standard takes a while to have an effect, so it was not until the early 1990s that the implications of specifying a given intelligibility value started to be realized.
The first shock was that it was not always possible to achieve good intelligibility. Acoustics are the final arbiter no matter how complex the system becomes.
Second, under certain conditions, RaSTI appeared to give too high or too low a reading. Most of the problems were traced to measuring in only two frequency bands. Whereas high-quality sound-reinforcement systems using well-controlled directivity devices usually measured satisfactorily, less expensive devices regularly gave rise to anomalies.
For that reason, the Combined Intelligibility Scale (CIS) was developed in the United Kingdom in the late '90s and adopted throughout Europe. It enables other measures (for example, word scores) to be used instead of the specified RaSTI method and allows comparisons to be made between the various scales. However, RaSTI was well entrenched within the standards, in spite of its problems, and engineers and licensing authorities continued to require RaSTI measurements.
As portable computers became more powerful, Schroeder realized that the modulation transfer function could be extracted from the system impulse response, enabling other measurement systems to be implemented. Most notable of these were TEF and MLSSA, both of which enabled measurement of full STI and RaSTI. That, in turn, enabled STI verification measurements to be made of RaSTI readings, and the shortfalls of the simpler system soon became apparent. But the early STI implementations were slow, taking about 10 to 12 minutes for a single measurement, as opposed to 8 seconds for RaSTI. The need for a portable full-STI measurement system was obvious. Although the processing power was certainly there to meet the need, developing such an instrument would be a major undertaking, and there was a large commercial barrier to cross before it could happen.
▪ STIPA or PASTI? In late 2001, Gold Line introduced a portable STI measurement option for the DSP30 Analyzer, based on a close collaboration with Bose and TNO. The meter displays both CIS and STI readings (and other versions may eventually measure RaSTI). See Figure 6. The Gold Line meter measures sound in each of the seven octave bands from 125 Hz to 8 kHz. A special optimized test signal is required and is provided on a standard CD. Because it is a modulated pseudo-random noise signal, there will be slight variation between measurements; in critical or borderline intelligibility areas, therefore, several measurements may need to be taken and averaged.
As the DSP30's OPT STICIS has only just gone into production, it is too early to draw any firm conclusions about its accuracy, but extensive testing by TNO in the Netherlands showed it to be within 0.02 STI of their reference STI measurement system over a wide range of acoustic conditions. The only problem on the horizon appears to be naming the new measurement. The Gold Line/Bose/TNO version is STIPA (STI for P.A. systems), but in the United Kingdom PASTI seems to be gathering popularity and is set to oust RaSTI as the arbiter of sound-system intelligibility.
Peter Mapp is senior partner of AMS-Peter Mapp Acoustics, an acoustics consultancy based in the United Kingdom. Mapp is S&VC’s sound reinforcement technical consultant. He can be reached at petermapp@btinternet.com.
Calculating %Alcons
▪ Direct-to-reverberant ratio used to determine articulation loss in a room (familiar/architectural version):
%Alcons = 200 RT² D² (n + 1) / (V Q)
Where RT is the room reverberation time, V is the volume of the room, D is the distance between the loudspeaker and the farthest listener, Q is the directivity factor of the loudspeaker, and (n + 1) compensates for the effects of multiple loudspeakers adding to the reverberant field.
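A quick worked sketch of the equation, assuming metric units (seconds, meters, cubic meters); the numbers are illustrative only, and the equation holds only out to Peutz's limiting distance.

# Worked example of the architectural %Alcons equation above.
# Metric units assumed; values are illustrative, not from the article.
def alcons(rt60, distance, volume, q, n=0):
    """%Alcons = 200 * RT^2 * D^2 * (n + 1) / (V * Q)."""
    return 200.0 * rt60**2 * distance**2 * (n + 1) / (volume * q)

# A 2.0 s room of 5,000 m^3, one loudspeaker of Q = 10, listener at 15 m:
print(round(alcons(2.0, 15.0, 5000.0, 10.0), 1))   # -> 3.6 (%Alcons)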
Origins of Intelligibility Testing
The pioneering work on speech intelligibility was carried out by Harvey Fletcher at Bell Labs in the early 1940s, but little of it was published until some 30 years later. Fletcher and his team established not only the effects of bandwidth on intelligibility but also the degree to which each octave and ⅓-octave band contributes. The Articulation Index (AI) was based on this work and was one of the first standardized methods.
The AI is an excellent method for assessing the effects of noise on intelligibility. However, it was developed for communication systems rather than P.A. systems and cannot appropriately account for reverberation and temporal distortions. The method measures the signal-to-noise ratio in each octave or ⅓-octave band. These values are weighted according to their contribution to intelligibility, then combined and normalized to fit a scale extending from 0 (no intelligibility) to 1 (perfect transmission and reception).
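A minimal sketch of that weighting-and-normalizing step follows. The band weights here are illustrative placeholders, since the AI standard tabulates the exact octave- and ⅓-octave-band values.

import numpy as np

# Sketch of an AI-style calculation: per-band S/N clipped to a 30 dB
# range, weighted by each band's contribution, normalized to 0..1.
# The weights below are illustrative, not the standard's exact values.
def articulation_index(snr_db, weights):
    snr = np.clip(np.asarray(snr_db, dtype=float), 0.0, 30.0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize the weighting
    return float(np.sum(w * snr / 30.0))     # 0 = no intelligibility, 1 = perfect

# Example: octave bands 250 Hz..4 kHz, mid bands weighted most heavily.
print(round(articulation_index([20, 25, 30, 15, 10],
                               [0.07, 0.14, 0.22, 0.33, 0.24]), 2))  # -> 0.63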
For many years, AI and human-based word scores were the only means of intelligibility testing. The relationship between AI and word scores is seen in the figure above. The type and complexity of the test signal (isolated nonsense words vs. connected words in a sentence) has a major impact on the results. Sentences are easier to understand than isolated words, as listeners use context to fill in sounds that are masked by noise. However, acoustically there is little or no difference between the two signals. So a different scale curve has to be applied depending on the complexity of the test speech. You can imagine how difficult it is to conceive and construct a machine-based intelligibility test that measures not only interacting acoustic parameters but also the nature of the speech itself.
Although the AI has been an effective means of evaluating intelligibility for communication channels, and for sound systems in rooms with reverberation times under 0.5 seconds, it becomes less reliable where reverberation is a potential hindrance to intelligibility. Attempts were made to correct for reverberation, but these were unsuccessful. Clearly, another assessment method was required to account for room reverberation.