Sep 1, 1998 12:00 PM,
The ear is a highly non-linear device with amazing acoustical resolving powers. By understanding a little about the way the ear/brain mechanism hears and interprets sound, audio and sound engineers are in a better position to optimize sound system performance and perceived sound clarity and fidelity. This article presents some of the main characteristics and resolving powers of the ear in relation to sound systems engineering.
Dynamic and frequency ranges
Figure 1 shows the dynamic range of the ear from the threshold of hearing to the threshold of pain, together with the normal frequency and sound pressure level ranges for speech and music. In the latter case, the musical range is characteristic of conventional acoustic instruments and not electronic or synthesized sound, which can have extended frequency ranges and higher levels. Equally, the ranges cover only the fundamental notes or sounds; extending the reproduction bandwidth to encompass the complex higher harmonic sound structures is, of course, well known to improve clarity and fidelity. Approximating the hearing range as 20 Hz to 20 kHz yields a scale of 10 octaves bandwidth. Speech may be taken to cover just the primary range from approximately 100 Hz to 5 kHz, and acoustic music from around 40 Hz to 12 kHz. The dynamic range of normal speech varies typically from around 30 dB to 70 dB, although non-sustained shouts of 90 dB to 95 dB are possible. (The author has also measured 114 dBA screams from his children, but these are thankfully not sustainable, carry no intelligible information content and cannot therefore be regarded as speech.) The dynamic range of music extends from around 20 dB to 100 dB. The normal sounds we hear generally only exercise a limited part of our hearing ability.
Frequency response and loudness
The ear (by this we mean the ear-brain mechanism) does not hear all sounds equally. At low sound levels, the low- and high-frequency ranges are suppressed or attenuated so that they do not appear as loud as the mid-frequency sounds (for example, around 1 kHz to 2 kHz). As the sound level increases, so too does the linearity of the ear, and the response becomes flatter. This effect is illustrated by the equal loudness contours shown in Figure 2. The contours show how loud a tone at one frequency needs to be in order to sound as loud as a 1 kHz reference signal. The curves clearly show why, at low sound levels, music appears to lack bass and high-end definition (leading to the typical double-hump disco curve, where the high- and low-frequency ranges are emphasized by means of graphic or parametric equalization). The loudness compensation switch on certain hi-fi stereo amps is also based on these curves.
Another way of looking at the ear's sensitivity to frequency can be obtained by inverting the curves. This then shows how the ear would respond to a frequency at a given level. Figure 3 shows this and is the inverted 90 phon curve. When the first sound level meters were made, it was soon realized that a linear scale did not give good correlation with the subjective impression of a given sound's apparent loudness. Weighting curves based on the inverted equal loudness contours gave much better results. Initially, three curves were adopted, forming the A, B and C scales. The A curve, based on the inverted 40 phon curve, was intended for reasonably quiet sounds, with B and C catering for progressively louder sounds. Over time, the A scale has been universally adopted as the measure of sound level most closely correlated with, and easiest to use for assessing, a noise's loudness. It is often forgotten, however, that its origins lie in the assessment of sound at low levels. The C scale is often still found on sound level meters today because it enables a better assessment of lower frequency sound levels (at 125 Hz, the A weighting will attenuate a sound by 16 dB). The C scale, by comparison, has a flat response down to 50 Hz. In evaluating the subjective response (and particularly the annoyance) caused by loud music, it has long been recognized that the A weighting scale underestimates the problem. A spectral analysis (even if only in 1/1 octave bands) is a far more useful approach and is often used to determine the offending level over the normal ambient background in noise nuisance assessments.
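The A-weighting curve has a standard analytic definition (the filter constants below come from IEC 61672, not from this article), and a short sketch reproduces the 16 dB attenuation at 125 Hz quoted above:

```python
import math

def a_weighting_db(f):
    """Approximate A-weighting response in dB at frequency f (Hz),
    using the analog filter defined in IEC 61672."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    # +2.00 dB normalizes the response to 0 dB at the 1 kHz reference
    return 20.0 * math.log10(ra) + 2.00

print(round(a_weighting_db(125), 1))  # about -16 dB, as quoted above
print(a_weighting_db(1000))           # essentially 0 dB at 1 kHz
```

The same formula also shows how steeply the weighting rolls off further down: at 50 Hz the attenuation is already about 30 dB, which is why the flatter C scale is preferred for low-frequency assessment.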
The perceived loudness of a sound depends not only upon its level, but also upon the bandwidth of the signal's energy. The greater the bandwidth, or the wider the spectral energy, the louder the sound seems. To assess this more accurately, the sound needs to be analyzed in terms of the ear's critical bandwidths. Analyzers working at 1/3 octave intervals approximate the ear's resolving power, but they are too selective at low frequencies. A 1/3 octave filter has a constant percentage bandwidth equivalent to 23% of its center frequency, which reasonably matches the 16% to 24% critical bandwidths of the ear from 450 Hz upwards. Any frequency components of a sound falling within a critical bandwidth of the ear are perceived to be amalgamated and create a louder overall sound. The trend nowadays towards either computer- or microprocessor-based acoustic instrumentation is allowing true loudness assessments to be made more widely. Automatic real-time spectral analysis and loudness computation can now be undertaken in a handheld instrument. It will be interesting to see how long it will take for the measurement and assessment standards and codes to adopt these new procedures. A potential problem, however, is that once in the true critical bandwidth and loudness domain, the familiar dB and Hz scales have to be abandoned and replaced with sones and Barks. The advantage of the sone scale is that it is expressed in terms of linear loudness (a 2 sone sound is twice as loud as a 1 sone sound).
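The sone scale relates to the loudness level (phon) scale by a simple doubling rule: 40 phons is defined as 1 sone, and each 10 phon increase doubles the loudness. A minimal sketch of that standard conversion:

```python
def phons_to_sones(phons):
    """Convert loudness level (phons) to loudness (sones).
    By definition 40 phons = 1 sone, and each +10 phons doubles the
    loudness. (The rule holds above roughly 40 phons; quieter sounds
    deviate from it.)"""
    return 2.0 ** ((phons - 40.0) / 10.0)

print(phons_to_sones(40))  # 1.0 sone (reference)
print(phons_to_sones(50))  # 2.0 sones: 10 phons louder, twice as loud
print(phons_to_sones(60))  # 4.0 sones
```

This linearity is the scale's selling point: a mix element that measures 4 sones really does sound about twice as loud as one at 2 sones, something no dB figure conveys directly.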
Another issue concerning loudness is that of temporal integration (as opposed to spectral integration). Figure 4 shows how the loudness of a sound increases with duration up to a maximum of around 200 ms. For impulses or tonebursts longer than this, the loudness is independent of pulse width: pulses greater than 200 ms are perceived to be as loud as continuous tones or noise of the same level. Another way of looking at this is to understand that a 1 ms pulse needs to be approximately 25 dB higher in level than a continuous tone of the same frequency to sound equally loud. Figure 4 also shows why an integration time of 125 ms was adopted for the Fast scale on a sound level meter.
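A simple energy-integration model (a rough sketch only; the article's Figure 4 is the actual reference) comes close to these numbers: if the ear integrates energy over roughly 200 ms, a pulse of duration t needs a level boost of about 10·log10(200/t) dB to match a continuous tone.

```python
import math

T_INTEGRATION_MS = 200.0  # assumed integration time of the ear

def pulse_level_boost_db(pulse_ms):
    """Extra level (dB) a short pulse needs to sound as loud as a
    continuous tone, under a simple energy-integration model."""
    if pulse_ms >= T_INTEGRATION_MS:
        return 0.0  # pulses >= 200 ms sound as loud as continuous tones
    return 10.0 * math.log10(T_INTEGRATION_MS / pulse_ms)

print(round(pulse_level_boost_db(1)))  # ~23 dB, close to the 25 dB quoted
print(pulse_level_boost_db(250))       # 0.0: beyond the integration time
```

The model is deliberately crude, but it shows why very short transients register far lower on a meter (and on the ear) than their peak levels suggest.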
Measuring the true loudness of a sound therefore requires the sound not only to be analyzed and integrated spectrally in terms of critical bandwidths, but also to be temporally integrated and assessed. Again, modern analyzers are beginning to have the computation power to undertake this complex process.
Pitch and frequency
The pitch of a sound is a human subjective quantity, whereas the frequency of the sound is an objective physical measure. It would have been convenient if pitch and frequency coincided, but this is not the case. The unit of subjective pitch is the mel. The relationship between the mel and frequency is shown in Figure 5. The barely distinguishable difference in pitch is 1/20 mel, regardless of frequency. Critical bands turn out to be almost a constant number of mels wide. Also, rather interestingly, the frequency ranges having equal contributions to the intelligibility of speech are also nearly a constant number of mels wide. The mel scale was derived by asking test subjects whether the pitch of a given test tone appeared to be half or twice that of a reference tone, or whether a tone was midway between two reference tones. Additionally, the interval between two pairs of tones was also investigated, with the subjects judging whether two intervals between pairs of tones were equal. One surprising result is that equal intervals of subjective pitch seem not to agree with musical intervals, as shown by the dotted curve in Figure 5.
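A commonly used analytic fit to the mel curve (the O'Shaughnessy formula, which may not be the exact curve underlying the article's Figure 5) illustrates the divergence between pitch and frequency:

```python
import math

def hz_to_mel(f):
    """Convert frequency (Hz) to subjective pitch (mels) using the
    common O'Shaughnessy fit; 1000 Hz maps to 1000 mels by convention."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

print(round(hz_to_mel(1000)))  # 1000 mels at the 1 kHz anchor point
print(round(hz_to_mel(4000)))  # ~2146 mels: 4x the frequency,
                               # but only about twice the pitch
```

A two-octave jump in frequency above 1 kHz thus yields only about a doubling of subjective pitch, which is exactly why equal pitch intervals fail to line up with musical intervals.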
Directional effects
The ear-brain mechanism uses two different physical effects to determine a sound's origin. One makes use of the fact that we have two ears spaced some 8 inches to 9 inches (203 mm to 229 mm) apart and that for most directions of incidence, there will be a difference in arrival times at the two ears. The obvious exceptions to this are for sounds arriving directly in front of, behind or over the listener, where the path lengths will be identical. In these instances, the second method of directional discrimination takes over, that of the received frequency or spectral response. This will be different for each of the three cases listed above due to the shape and complex folds of the outer ear (pinna) and the position of the head. Sound arriving from any given direction will have a unique set of spectral transfer functions. These can then be used to identify the direction of the sound arrival. An automatic reaction, however, is also for a listener to turn his head, usually towards the sound, to confirm the exact direction by allowing changes to occur in both the temporal and spectral reception domains. Figure 6 presents a series of HRTFs (head-related transfer functions), which clearly show how the received frequency response changes with the angle of incidence. The differences are due mainly to the comb filtering effects caused by reflections from the pinna interfering with the direct sound path. For a sound arriving at the side of the head (90 degrees), at 15 degrees above the ear, an interference notch at around 11 kHz would typically occur, while at ear level, the notch would be at 8 kHz, and at 15 degrees below, it would be around 6 kHz to 7 kHz. These interference effects are caused by short delay times of 45 µs to 80 µs. These spectral effects are strong enough to allow one-eared listeners to localize sound direction accurately.
HRTFs are becoming an important aspect of sound reproduction because, by careful filtering and signal manipulation, it is possible to fool the ear into perceiving almost any required sound direction from just two loudspeakers, a psychoacoustic feature currently starting to be exploited in teleconferencing and virtual reality.
Temporal and delay effects
We have already seen that the duration of a sound affects its perceived loudness and that short delays enable localization. In the real world, we also hear early room reflections that give further psychoacoustic cues as to the distance and direction of a sound. Arrivals after approximately 60 ms to 70 ms (path differences of 66 feet to 75 feet, or 20 m to 23 m) can give rise to distinct echoes being heard. Such echoes can significantly reduce perceived speech intelligibility. Investigations into the effects of local reflections have produced some interesting results and phenomena of great use to the sound system engineer if understood and appropriately applied. The effects are based on temporal masking, or the ear's inability to distinguish closely spaced time arrivals of a similar nature (speech and a local reflection). Haas was among the first researchers to investigate the effect, and his results gave rise to the famous Haas effect, whereby the localization of a sound may be maintained provided that a secondary sound or reflection arrives within an appropriate time frame, even though it may be louder than the first. This means that the direct sound masks the reflection or secondary sound. The reflection or secondary sound is generally regarded as fusing or integrating with the direct sound, thereby increasing its perceived loudness. The integration period is generally taken as being around 30 ms to 35 ms, although some research suggests that speech intelligibility may begin to degrade after 20 ms.
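The delay-to-distance conversions used throughout this discussion follow directly from the speed of sound. A minimal sketch, assuming c ≈ 343 m/s (air at about 20 °C; the article's round figures imply a slightly lower value):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at 20 degrees C
M_TO_FT = 3.28084

def delay_to_path_m(delay_ms):
    """Extra path length (m) a reflection travels for a given delay (ms)."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0

# The key perceptual boundaries discussed above:
for ms in (20, 35, 60, 70):
    m = delay_to_path_m(ms)
    print(f"{ms} ms -> {m:.1f} m ({m * M_TO_FT:.0f} ft)")
```

The same arithmetic is what sets delay-line times for distributed loudspeakers: a reinforcement loudspeaker 10 m further from the listener than the source needs roughly 29 ms of inherent acoustic delay accounted for.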
It is often quoted that Haas found that the secondary sound could be up to 10 dB higher than the direct sound before correct localization was lost. The effect is therefore invaluable in sound system design, whereby a local, delayed reinforcement loudspeaker can be used to improve sound quality and intelligibility without upsetting the correct source localization. In practice, I have never been able to achieve such a high amount of gain; 6 dB would seem to be around the maximum practical limit. Further investigation shows that in actual systems, the effect is very much dependent upon the direction and spectral content of the secondary source (hardly surprising, given the previous discussion of sound localization) and upon local natural reflection patterns and sequences. The most common form of the Haas curve is shown in Figure 7, but it must not be forgotten that the curve represents the 50% disturbance level (50% of those listening could detect the secondary source). A 10% disturbance level is probably rather more appropriate for today's more sophisticated and critical audiences. Figure 8 shows a slightly different aspect of reflection and echo perception. In this case, two curves are shown, corresponding to the threshold of detection of a reflection or secondary source (broken line) and the threshold curve for when a secondary source is just perceived as a second identifiable source. Note the relative levels and the crossover point at around 30 ms. At longer delays, reflections need to be well down in level to be undetectable (14 dB down at 60 ms). The thresholds are for single reflections (anechoic sound with a single reflection), but in practice, other local reflections and reverberation reduce the sensitivity and raise the thresholds a little. In particular, by the creation of a series of early reflections, the Haas fusion zone, for example, can be significantly extended. Figure 9 summarizes some of the Haas and reflection perception effects with regard to speech.
I have long held the view that the direction of a speech signal and its reflections can affect the perceived intelligibility. This is certainly the case in noisy environments. Studies by NASA have shown that broadcasting voice messages from directions different to a source of noise can dramatically improve intelligibility, and in reverberant environments, binaural listening gives a distinct advantage over monaural. Figure 10 shows the results of some recent research by Yando in Japan. Here, the direction of the reflection is clearly shown to affect the speech intelligibility, particularly with delay times greater than 50 ms.
Spectral masking and distortions
Another important psychoacoustic aspect of direct relevance to sound engineering is that of spectral masking. By this process, a sound at one frequency can mask or render a second signal completely inaudible, or reduce its apparent loudness. Figure 11 shows the effect. Upward masking (a lower frequency masking a higher frequency) is the most significant, although downward masking also occurs. Masking is of particular relevance when discussing distortion. For example, the masking curves of Figure 11 show the considerable masking of distortion products that can occur. Typically, it is the higher harmonic components that remain audible. Current research suggests that non-linear distortions of around 0.05% are audible if the harmonics are of high enough order. Lower order harmonics are detectable at around 0.25% (-52 dB). Linear (amplitude) distortions of just 1 dB are detectable, while low Q resonances (Q=1) are detectable in loudspeakers at just 0.25 dB using pink noise as a stimulus (resonance at 5 kHz). With normal program material, however, these detection thresholds rise significantly. Phase distortion is least audible at the frequency extremes of the audio band, the detectable group delay rising from around 1 ms at 4 kHz to 6 ms at 15 kHz.
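The distortion percentages and dB figures above are two expressions of the same amplitude ratio, and converting between them is a one-liner:

```python
import math

def percent_to_db(pct):
    """Express a distortion component, given as a percentage of the
    fundamental's amplitude, as a level relative to the fundamental in dB."""
    return 20.0 * math.log10(pct / 100.0)

print(round(percent_to_db(0.25)))  # -52 dB, matching the figure quoted
print(round(percent_to_db(0.05)))  # -66 dB for the high-order threshold
```

Seeing 0.05% rendered as -66 dB makes the point starkly: audible high-order distortion products can sit more than 60 dB below the fundamental.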
Spectral masking is also of great importance when it comes to designing alarm or warning signals that may be used in the presence of noise. From a knowledge of the noise spectrum and characteristics, it is possible to determine if a given frequency will be masked and audibility lost. Conversely, the same is true of speech in noisy environments, as made evident by the Articulation Index, which allows objective prediction of the resultant intelligibility.
There are many other aspects of psychoacoustics that affect the way we perceive a transmitted sound, particularly with regard to spatial effects, perceived reverberance and music perception, all of which fall outside the scope of this article. When you consider the complexity and non-linearity of the hearing process, it may make you wonder how we manage to hear anything at all.