THE PSYCHOLOGY OF HEARING
Sep 1, 1998
Diana Deutsch
In everyday life, we are continuously bombarded with mixtures of sounds that arise in parallel from many different sources. A major task for our auditory system is to sort out the components of such mixtures so as to reconstruct the originating sound events. As I write this article, I can hear people talking outside my window, a dog barking and the rumbling of construction machinery in the distance. All these sounds are blended together as they reach my ears, and yet I hear each sound as a unified whole, distinct from the others. Somehow my auditory system groups those components of the sound spectrum that have emanated from the same source and separates out those that have emanated from different sources. What are the principles by which such grouping decisions are made?
Perceptual grouping principles
Early in the century, the Gestalt psychologist Max Wertheimer proposed that we link elements of perceptual arrays in accordance with a number of simple principles [ref. 1]. One, which he termed proximity, states that we form connections between elements that are closer together in preference to those that are spaced further apart. Figure 1a shows an array of dots that are closer together along the vertical axis than the horizontal one. We group these dots on the basis of proximity, and as a result, we perceive a set of columns rather than rows. A second principle, which he termed similarity, states that we form connections among elements that are similar to each other in some way. In Figure 1b, we group the green dots and separate them from the purple ones, with the result that we perceive a green triangle against a purple and white background.
A third principle, termed good continuation, asserts that we form connections among elements that continue smoothly in the same direction. In Figure 1c, we perceive the lines AB and CD, rather than AC and DB. Yet another principle, termed closure, states that we tend to perceive elements of an array as organized into complete units. For example, we interpret the pattern in Figure 1d as a circle that is partially occluded by a rectangle. A further principle, termed common fate, claims that elements that move in the same direction are perceptually linked together. Envisage, for example, four rows of dots, with two of the rows traveling from left to right and the other two traveling from right to left. We link these dots into two different groupings based on their direction of motion.
The perceptual system has presumably evolved to form groupings in accordance with such principles because they enable us to interpret our environment most effectively. Consider, for example, the principle of proximity. In the case of vision, elements that are close together in space are more likely to have emanated from the same object than elements that are further apart. In the case of hearing, sounds that are proximal in pitch or in time are more likely to have arisen from the same source than are sounds that are distant from each other along these dimensions. Analogous arguments can be made for similarity. Regions of the visual field that are similar in color, brightness or texture have probably emanated from the same object, and sounds that are similar in character (a series of thuds or chirps, for example) are likely to have arisen from the same source. Good continuation can be justified along the same lines: a line that follows a smooth pattern has probably emanated from a single object, and a sound that changes smoothly in pitch has probably come from a single source. A similar argument holds for common fate. A moving object gives rise to elements that travel across the visual field coherently with each other, and many musical instrument tones are composed of partials that rise and fall in synchrony.
Fusion and separation of a sound's spectral components
We now consider the relationships between the components of a sound spectrum that cause us to fuse them into a single perceptual image and those that cause us to separate them into different images. Two types of relationships have been shown to be important here. One is harmonicity. Sounds such as those produced by musical instruments and the human voice are composed of partials that stand in harmonic (or near-harmonic) relationship; that is, their frequencies are integer (or near-integer) multiples of the fundamental. For example, a harmonic series whose fundamental is 100 Hz contains higher partials at 200 Hz, 300 Hz, 400 Hz, 500 Hz, 600 Hz, and so on. It makes sense, therefore, that the auditory system would make use of this feature so as to combine sound components that stand in harmonic relationship to form a single perceptual image. As an example, when presented with two instrument tones playing together, we perceive two distinct pitches, each resulting from one of the harmonic series that is present in the complex.
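To make the arithmetic concrete, the short Python sketch below (my own illustration, not part of the original article) builds a harmonic complex by summing sine-wave partials at integer multiples of a 100 Hz fundamental; the sample rate, amplitudes and number of partials are arbitrary choices.

import numpy as np

SAMPLE_RATE = 44100  # samples per second; an arbitrary but common choice

def harmonic_complex(f0, n_partials, duration, sample_rate=SAMPLE_RATE):
    """Sum of equal-amplitude sine partials at integer multiples of f0."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    tone = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        tone += np.sin(2 * np.pi * k * f0 * t)
    return tone / n_partials  # scale down to avoid clipping

# A 100 Hz fundamental with partials at 100, 200, ..., 600 Hz; heard as a
# single tone whose pitch corresponds to the 100 Hz fundamental.
tone = harmonic_complex(100.0, 6, duration=1.0)

The resulting array can be written to a sound file or played directly; the point is simply that all the partials stand in integer ratios to the fundamental and therefore fuse into one perceptual image.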
Another relationship that has been shown to be important here is synchronicity of onset. When components of a sound begin to sound simultaneously, it is a good bet that they have come from the same source; if they begin abruptly at different times, it is more likely that they came from different sources. A related issue concerns temporal regularities in the way the components of an ongoing sound fluctuate in frequency or amplitude.
Harmonicity
Listening to sounds produced by different musical instruments provides us with many informal examples of grouping by harmonicity. For example, stringed instruments (such as the violin) and blown instruments (such as the flute) produce tones whose partials stand in harmonic (or near-harmonic) relationship. The pitches produced by such instruments are clearly defined. On the other hand, bells and gongs produce tones with nonharmonic partials, and in listening to such instruments, we easily discern multiple pitches. Experiments using synthesized tones have confirmed this conclusion [ref. 2].
We can then ask: To what extent can a single component of a harmonic complex deviate from harmonicity and still contribute to the perceived pitch of the complex? It has been shown that when a harmonic of a complex tone is mistuned by less than about 3%, it still contributes fully to the tone's perceived pitch. As the degree of mistuning increases, however, its contribution to perceived pitch decreases, and at a mistuning of about 8%, the component no longer contributes to the pitch of the complex [ref. 3]. Related findings have been obtained with respect to the contribution of a mistuned component to perceived vowel quality [ref. 4].
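As an illustration of the kind of stimulus used in such experiments (a sketch of my own, with arbitrarily chosen fundamental, partial count and mistuned harmonic rather than the parameters of the cited studies), the Python fragment below mistunes a single harmonic of a complex tone by a given percentage.

import numpy as np

SAMPLE_RATE = 44100

def complex_with_mistuned_partial(f0, n_partials, mistuned_k, mistune_pct,
                                  duration, sample_rate=SAMPLE_RATE):
    """Harmonic complex in which partial number `mistuned_k` is shifted
    by `mistune_pct` percent away from its harmonic frequency."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    tone = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        f = k * f0
        if k == mistuned_k:
            f *= 1.0 + mistune_pct / 100.0
        tone += np.sin(2 * np.pi * f * t)
    return tone / n_partials

# 3% mistuning of the 4th harmonic: the partial still tends to be pulled
# into the pitch of the complex. 8% mistuning: it tends to pop out as a
# separate tone and no longer contributes to the complex's pitch.
slightly_mistuned = complex_with_mistuned_partial(200.0, 6, 4, 3.0, 1.0)
clearly_mistuned = complex_with_mistuned_partial(200.0, 6, 4, 8.0, 1.0)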
Another line of research has examined how well we can separate two complex sounds as a function of the relationship between their fundamental frequencies. For example, as the fundamentals of two complex tones depart from simple harmonic relationship, the tones are heard more clearly as distinct entities [ref. 5]. As a related effect, simultaneous speech patterns can be more easily separated perceptually when they are built on different fundamentals; the amount of perceptual separation has been found to reach its maximum when the fundamentals differ by one to three semitones [ref. 6].
Onset synchronicity
The temporal properties of sounds provide us with other cues for grouping. Components that arise from the same source are likely to begin to sound at the same time, and those arising from different sources are less likely to do so. The brain makes use of such onset relationships in making grouping decisions. This can be demonstrated by presenting a harmonic series in such a way that its components begin at different times. Consider a series that is built on a 300 Hz fundamental. We can start with the 300 Hz component sounding alone, then after one second add the 600 Hz component, then after one more second add the 900 Hz component, and continue in this way until all the components are sounding simultaneously. The perceptual effect is striking. When each component begins to sound, its pitch is first heard distinctly; then it gradually disappears from perception, so that finally only the pitch corresponding to the fundamental is perceived.
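A rough Python rendering of this demonstration is sketched below. The 300 Hz fundamental and the one-second stagger follow the description above; the number of partials and the final hold time are arbitrary choices of mine.

import numpy as np

SAMPLE_RATE = 44100

def staggered_harmonics(f0=300.0, n_partials=6, stagger=1.0, hold=2.0,
                        sample_rate=SAMPLE_RATE):
    """Harmonic series on f0 in which partial k enters (k - 1) * stagger
    seconds after the first, then all partials sound together for `hold`
    seconds."""
    total = (n_partials - 1) * stagger + hold
    n = int(total * sample_rate)
    t = np.arange(n) / sample_rate
    mix = np.zeros(n)
    for k in range(1, n_partials + 1):
        onset = (k - 1) * stagger
        active = t >= onset
        mix[active] += np.sin(2 * np.pi * k * f0 * t[active])
    return mix / n_partials

# Each partial is heard as a separate pitch when it enters, then melts
# into the complex until only the 300 Hz pitch remains.
demo = staggered_harmonics()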
When two complex tones are played together, they are heard as perceptually more distinct from each other when they begin to sound at different times. An onset difference as small as 10 ms has been found to increase the perceptual salience of the tones in the mixture, and an onset difference of 30 ms has a pronounced effect [ref. 5]. Using recordings of ensemble performances, it was found that onset asynchronies for tones that were nominally synchronous ranged from 30 ms to 50 ms; asynchronies of this size would be expected to be useful in enhancing the perceptual salience of individual voices in a mixture [ref. 7].
Frequency modulation
During vibrato, the partials of a complex tone move up and down in synchrony with each other in such a way as to preserve the ratios formed by the different frequencies. One might expect the perceptual system to exploit this feature in determining which components of a sound mixture to link.
The composer John Chowning experimented with this issue in the process of synthesizing a singing voice by computer [ref. 8]. He found that in order to produce the impression of a sung vowel, it was necessary to impose a coherent frequency fluctuation on all the components simultaneously. Chowning then synthesized three simultaneous sung vowels: the first singing "oh" with a fundamental of 400 Hz, the second singing "ah" at 500 Hz, and the third singing "eh" at 600 Hz. When there was no frequency fluctuation, the mixture was heard as a chord consisting of three pitches. However, when the three sets of partials were differentiated from each other by superimposing a different pattern of frequency fluctuation on each one, three sung vowels were clearly heard, each at a different pitch. Later experiments have shown, however, that the effects of coordinated frequency modulation on perceptual grouping are complex, and many issues remain unresolved [ref. 9].
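The sketch below is a much-simplified Python illustration of the idea, not a reconstruction of Chowning's synthesis: each "voice" is a plain harmonic complex (no vowel formants) whose partials all share a single frequency fluctuation, and the three voices are given different vibrato rates so that their partials can be grouped apart. The vibrato rates and depths are my own arbitrary choices.

import numpy as np

SAMPLE_RATE = 44100

def voice_with_vibrato(f0, n_partials, vib_rate, vib_depth, duration,
                       sample_rate=SAMPLE_RATE):
    """Complex tone whose partials all share one frequency fluctuation,
    so the ratios between partial frequencies are preserved."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    # Common fractional fluctuation, e.g. a 5 Hz vibrato of about 1% depth.
    fluctuation = 1.0 + vib_depth * np.sin(2 * np.pi * vib_rate * t)
    tone = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        inst_freq = k * f0 * fluctuation
        # Integrate instantaneous frequency to obtain the phase.
        phase = 2 * np.pi * np.cumsum(inst_freq) / sample_rate
        tone += np.sin(phase)
    return tone / n_partials

# Three "voices" at 400, 500 and 600 Hz; giving each its own vibrato
# pattern helps the ear pull three separate pitches out of the mixture.
mixture = (voice_with_vibrato(400.0, 6, 5.0, 0.01, 2.0)
           + voice_with_vibrato(500.0, 6, 5.6, 0.01, 2.0)
           + voice_with_vibrato(600.0, 6, 6.3, 0.01, 2.0)) / 3.0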
Many sounds are composed of partials whose amplitudes rise and fall in synchrony with each other, so one might conjecture that coherent amplitude modulation would also be used by the auditory system as a cue for perceptual fusion. However, clear evidence in support of this conjecture has been difficult to obtain [ref. 9].
Ear of input
When two different sound components are presented simultaneously, one to each ear, one might at first expect that this difference in ear of input would provide a strong cue for separating the sounds perceptually. Upon reflection, the situation is not that simple. In natural environments, sounds are subjected to numerous distortions as they travel from their sources to each of our ears. Given such distortions, if we were to place heavy reliance on ear differences as cues for perceptual separation, we would risk separating components when they should instead be grouped.
Indeed, the auditory system shows a striking tendency to disregard ear of input as a cue for separating out the components of a complex sound, at least when other supporting cues are absent. In one experiment, listeners identified the pitches of two complex tones when their partials were distributed across the ears in various ways. Pitch identification was only weakly affected by the ways in which the partials were distributed [ref. 10]. Another experiment examined the effect of ear of input on the perception of speech sounds. The first two formants of a phrase were presented, one to each ear. When the two formants were built on the same fundamental, listeners could identify the speech signal and also tended to hear a single voice; that is, they fused the input from the two ears into a single perceptual image [ref. 11].
The fact that spatial location may be disregarded in favor of other cues can be used to produce striking illusions [ref. 12]. These occur when two different streams of tones are presented, one to each ear (or one to each of two spatially separated loudspeakers). The scale illusion and its variants are produced by simultaneously presented ascending and descending scales. These are made to switch from ear to ear in such a way that when a tone from the ascending scale is in one ear, a tone from the descending scale is in the other. In consequence, each ear is presented with a set of tones that leap around in pitch. However, the pattern is not heard this way. Rather, two melodic lines are perceived, a higher one and a lower one, that move in contrary motion. Further, the higher tones are often heard as though coming from one spatial location and the lower tones from the other. Figure 2 shows an example of this illusion, in this case produced by a two-octave chromatic scale [ref. 13].
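A minimal Python sketch of this kind of stimulus follows. It uses a one-octave chromatic scale and plain sine tones for brevity, whereas the published demonstrations use longer (for example, two-octave) scales and carefully shaped tones, so it should be read as an illustration of the channel-switching construction rather than a re-creation of the recorded illusion; tone duration and pitch range are my own choices.

import numpy as np

SAMPLE_RATE = 44100
TONE_DUR = 0.25  # seconds per tone

def sine(freq, duration=TONE_DUR, sample_rate=SAMPLE_RATE):
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq * t)

def midi_to_hz(m):
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

# Ascending and descending chromatic scales over one octave (C4 to C5).
ascending = [midi_to_hz(m) for m in range(60, 73)]
descending = list(reversed(ascending))

left, right = [], []
for i, (up, down) in enumerate(zip(ascending, descending)):
    # Alternate which ear receives the tone from the ascending scale.
    if i % 2 == 0:
        left.append(sine(up))
        right.append(sine(down))
    else:
        left.append(sine(down))
        right.append(sine(up))

# Stereo signal: each channel leaps around in pitch, yet listeners tend to
# hear a smooth higher line and a smooth lower line in contrary motion.
stereo = np.stack([np.concatenate(left), np.concatenate(right)], axis=1)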
A different type of spatial reorganization occurs in the glissando illusion [ref. 13]. This is created by an oboe tone that is repeatedly presented together with a sine wave that glides slowly up and down in pitch. The oboe tone and the glissando are made to switch from ear to ear (or from loudspeaker to loudspeaker) in such a way that when the oboe tone is to the right, a portion of the glissando is to the left, and vice versa. Most people hear the oboe tone correctly as switching back and forth between locations, but the glissando appears to be joined together quite seamlessly. Sometimes the glissando appears to stay in one spatial location, and sometimes it appears to travel from one side of space to the other as its pitch moves from low to high, and then back again as its pitch moves from high to low.
Continuity illusions
Information often arrives at our sense organs in fragmented form, and our perceptual system has the task of inferring continuities between the fragments and filling in the gaps appropriately. For example, we generally see branches of trees when they are partly hidden by foliage, and we infer which of the visible segments were derived from the same branch. When we make such inferences, we are employing the principles of good continuation and of closure, because we mentally fill in the gaps between the segments of a branch so as to produce a smooth contour.
In the same way, our hearing mechanism constantly "fills in" lost fragments of sound so as to make sense of the world. For example, when two people are conversing near a busy street, they must perceptually restore fragments of speech that are drowned out by passing traffic. Such perceptual restorations can give rise to intriguing illusions. For example, if a softer sound is briefly replaced by a louder one, this sometimes produces the impression that the softer sound is present without interruption. In an early experiment, a sequence was constructed that consisted of a tone in alternation with a louder noise, with each sound lasting 50 ms. On listening to this sequence, subjects heard the tone as continuing right through the noise [ref. 14].
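The following Python sketch generates a stimulus of the general type described above: 50 ms segments of a quiet tone alternating with 50 ms bursts of much louder noise. The tone frequency and the two amplitudes are arbitrary choices of mine, not values from the cited study.

import numpy as np

SAMPLE_RATE = 44100
SEG_DUR = 0.05  # 50 ms segments, as in the experiment described above

rng = np.random.default_rng(0)

def tone_segment(freq=1000.0, amp=0.2, dur=SEG_DUR, sample_rate=SAMPLE_RATE):
    """A quiet sine-tone segment."""
    t = np.arange(int(dur * sample_rate)) / sample_rate
    return amp * np.sin(2 * np.pi * freq * t)

def noise_segment(amp=0.8, dur=SEG_DUR, sample_rate=SAMPLE_RATE):
    """A much louder white-noise segment."""
    return amp * rng.uniform(-1.0, 1.0, int(dur * sample_rate))

# Alternate 50 ms of tone with 50 ms of louder noise; listeners tend to
# hear the tone as continuing right through the noise bursts.
sequence = np.concatenate([seg for _ in range(20)
                           for seg in (tone_segment(), noise_segment())])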
Continuity effects have also been produced using more complex sounds. In one experiment, a gliding tone was presented that rose and fell repeatedly, and the glide was periodically interrupted and replaced by a loud noise. Listeners did not hear the glide as fragmented, but rather as continuous [ref. 15]. Similar effects have been produced with speech sounds. When sentences were presented with a portion of each sentence deleted and replaced by a louder noise such as a cough, subjects heard these sentences as though they were intact [ref. 16]. In an experiment using musical materials, recordings were made of well-known piano pieces, and some of the tones were omitted and replaced by noise bursts. Again, listeners heard the pieces as though they were intact [ref. 17]. Perceptual restorations of this type must occur frequently when we listen to music in concert halls, where coughs and other loud noises would otherwise cause the music to appear fragmented.
Grouping of rapid sequences of sounds
Grouping by pitch proximity emerges strongly in sequences of tones that are presented in rapid succession. Composers frequently exploit this phenomenon in the technique of pseudopolyphony, or compound melodic line. Here, a series of tones is played at a fast tempo, and the tones are drawn from different pitch ranges; as a result, listeners hear two or more melodic lines in parallel, each in a different pitch range. Such passages occur frequently in twentieth-century guitar music. In the example shown in Figure 3, taken from Tarrega's Recuerdos de la Alhambra, the lower tones form the melodic line while the repeating higher tones form a background, which is heard separately from the melody.
One consequence of such perceptual splitting of tones into different streams is that temporal relationships across streams become difficult to judge. In one experiment, a repeating series of six tones was presented, three from a high pitch range and three from a low one. When the rate of presentation was as fast as 10 tones per second, subjects were unable to judge the order in which tones in the different streams occurred [ref. 18]. Even at slower rates of presentation, there is a gradual breakdown of temporal resolution as the presentation rate increases, and also as the pitch distance between successive tones increases [ref. 19].
Another factor that influences perceptual grouping is sound quality, or musical timbre; this is an example of grouping by similarity. Composers often place different instrument tones in overlapping pitch ranges, recognizing that listeners will group these tones on the basis of instrument type provided that the difference in timbre is sufficiently large. Many examples can be found in Schubert's songs, in which the piano accompaniment often overlaps in pitch with the singer's voice, yet the pitch patterns that are heard correspond to those produced by each instrument separately.
In a demonstration of this effect, a three-tone ascending pitch line was repeatedly presented, with successive tones alternating between two timbres. When the difference in timbre was small, listeners heard the ascending pitch lines, as expected. However, when this difference was large, they instead grouped the tones on the basis of timbre and so perceived two interwoven descending pitch lines [ref. 20].
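A hedged Python sketch of this kind of demonstration appears below. The "timbre" control is a crude spectral-slope parameter of my own devising, and the pitches and tempo are arbitrary, so this illustrates the alternating-timbre construction rather than re-creating the original stimuli.

import numpy as np

SAMPLE_RATE = 44100
TONE_DUR = 0.1

def tone(freq, brightness, dur=TONE_DUR, sample_rate=SAMPLE_RATE):
    """Simple timbre control: harmonics with amplitudes 1 / k**brightness.
    A small `brightness` gives a bright tone; a large one gives a dull tone."""
    t = np.arange(int(dur * sample_rate)) / sample_rate
    out = np.zeros_like(t)
    for k in range(1, 9):
        out += (1.0 / k ** brightness) * np.sin(2 * np.pi * k * freq * t)
    return out / np.max(np.abs(out))

# Repeating ascending three-tone figure with the two timbres alternating
# from tone to tone; because the figure has an odd number of tones, each
# timbre traces its own line when grouping follows timbre.
pitches = [262.0, 330.0, 392.0]  # roughly C4, E4, G4
timbres = [0.5, 3.0]             # bright vs. dull
sequence = []
for rep in range(8):
    for i, f in enumerate(pitches):
        sequence.append(tone(f, timbres[(rep * len(pitches) + i) % 2]))
signal = np.concatenate(sequence)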
Just as with grouping by pitch range, so grouping by timbre can affect a listener's ability to make order judgments concerning sequentially presented sounds. In one experiment, listeners were presented with a repeating sequence consisting of four unrelated sounds (a high tone, a low tone, a hiss and a buzz). When this sequence was played at a sufficiently fast rate, listeners were unable to name the order in which the sounds appeared [ref. 21].
Conclusion
In the past, approaches to sound perception have tended to focus on low-level factors, such as thresholds for pitch and loudness and masking functions. Recently, however, the importance of higher-level, cognitive factors has become increasingly evident, and there is growing recognition that the auditory system of the brain contains some remarkably ingenious circuitry, perhaps the most ingenious of all the sensory modalities. The phenomena we have been examining illustrate the operation of some of this circuitry, which has evolved to enable us to interpret our sound environment most effectively [ref. 22].
References
1. Wertheimer, M. Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung, 1923, 4, 301-350.
2. De Boer, E. On the 'residue' and auditory pitch perception. In W. D. Keidel & W. D. Neff (Eds.), Handbook of Sensory Physiology, Vol. 5, Part 3, 479-583. New York: Springer-Verlag, 1976.
3. Moore, B. C. J., Glasberg, B. R., & Peters, R. W. Thresholds for hearing mistuned partials as separate tones in harmonic complexes. Journal of the Acoustical Society of America, 1986, 80, 479-483.
4. Darwin, C. J., & Gardner, R. B. Mistuning a harmonic of a vowel: Grouping and phase effects on vowel quality. Journal of the Acoustical Society of America, 1986, 79, 838-845.
5. Rasch, R. A. The perception of simultaneous notes such as in polyphonic music. Acustica, 1978, 40, 1-72.
6. Assmann, P. F., & Summerfield, Q. Modelling the perception of concurrent vowels: Vowels with different fundamental frequencies. Journal of the Acoustical Society of America, 1990, 88, 680-697; Brokx, J. P. L., & Nooteboom, S. G. Intonation and the perceptual separation of simultaneous voices. Journal of Phonetics, 1982, 10, 23-36; Scheffers, M. T. M. Sifting vowels: Auditory pitch analysis and sound segregation. Doctoral thesis, Groningen University, The Netherlands, 1983.
7. Rasch, R. A. Timing and synchronization in ensemble performance. In J. A. Sloboda (Ed.), Generative Processes in Music: The Psychology of Performance, Improvisation, and Composition. Oxford: Oxford University Press, 1988.
8. Chowning, J. M. Computer synthesis of the singing voice. In Sound Generation in Winds, Strings, Computers. Stockholm: Royal Swedish Academy of Music, Publ. No. 29, 1980, 4-13.
9. Darwin, C. J., & Carlyon, R. P. Auditory grouping. In B. C. J. Moore (Ed.), Hearing. San Diego: Academic Press, 1995, 387-424.
10. Beerends, J. G., & Houtsma, A. J. M. Pitch identification of simultaneous dichotic two-tone complexes. Journal of the Acoustical Society of America, 1989, 85, 813-819.
11. Broadbent, D. E., & Ladefoged, P. On the fusion of sounds reaching the different sense organs. Journal of the Acoustical Society of America, 1957, 29, 708-710.
12. Deutsch, D. Two-channel listening to musical scales. Journal of the Acoustical Society of America, 1975, 57, 1156-1160; Deutsch, D. Auditory illusions, handedness, and the spatial environment. Journal of the Audio Engineering Society, 1983, 31, 607-618.
13. Deutsch, D. Musical Illusions and Paradoxes. Philomel Records, P.O. Box 12189, La Jolla, CA, 1995 (CD).
14. Miller, G. A., & Licklider, J. C. R. The intelligibility of interrupted speech. Journal of the Acoustical Society of America, 1950, 22, 167-173.
15. Dannenbring, G. L. Perceived auditory continuity with alternately rising and falling frequency transitions. Canadian Journal of Psychology, 1976, 30, 99-114.
16. Warren, R. M. Auditory illusions and their relation to mechanisms normally enhancing accuracy of perception. Journal of the Audio Engineering Society, 1983, 31, 623-629.
17. Sasaki, T. Sound restoration and temporal localization of noise in speech and music sounds. Tohoku Psychologica Folia, 1980, 39, 79-88.
18. Bregman, A. S., & Campbell, J. Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 1971, 89, 244-249.
19. Van Noorden, L. P. A. S. Temporal Coherence in the Perception of Tone Sequences. Unpublished doctoral dissertation, Technische Hogeschool Eindhoven, The Netherlands, 1975.
20. Wessel, D. L. Timbre space as a musical control structure. Computer Music Journal, 1979, 3, 45-52.
21. Warren, R. M., Obusek, C. J., Farmer, R. M., & Warren, R. P. Auditory sequence: Confusions of patterns other than speech or music. Science, 1969, 164, 586-587.
22. For further reading, see Bregman, A. S. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge: MIT Press, 1990; Darwin, C. J., & Carlyon, R. P. Auditory grouping. In B. C. J. Moore (Ed.), Hearing. San Diego: Academic Press, 1995, 387-424; and Deutsch, D. Grouping mechanisms in music. In D. Deutsch (Ed.), The Psychology of Music (2nd edition). San Diego: Academic Press, 299-348, in press.