Listening Evaluation

Good sound is more than just a matter of taste.

Nov 1, 2005 12:00 PM
By Rick Kamlet

Harman’s Multichannel Listening Lab is designed to reduce subjectivity in listening evaluation as much as possible.

You walk into a semi-darkened room about the size of a small living room. The selective lighting reveals six comfortable chairs. You see a scrim at the front of the room, and a flat-panel computer screen and a keyboard at the center chair. Previously, you were given a hearing test and went through a couple of hours of computer-based training to teach you to describe differences you hear between sound qualities. As you sit down, you’re asked to switch between several loudspeakers and assess which ones sound best and why. You cannot see the speakers behind the scrim, and each speaker seems to move into the same physical location behind the scrim as you select it for listening. You are asked to repeat the listening, and you suspect that the speakers may be numbered differently and in a different order.

As you switch between the speakers, you realize there’s some science behind this! You’ve just experienced the Harman Listening Lab. Why is it set up the way it is? Why was your hearing tested, and why were you given the computer-based training?

All loudspeaker manufacturers strive to create products that sound great, but the qualities that make sound “good” seem at times to be as controversial as those that qualify a wine as good. Because of this, Harman International, the parent company of JBL Professional, has an entire department dedicated to studying listener evaluations. Led by Dr. Floyd Toole and Sean Olive, this group works toward defining the qualities that characterize good sound and then tying those qualities to objective, measurable factors. Through their research they have found improved methods of collecting listening data and interpreting that data in ways that relate more directly to what we hear. In the end, the goal is to more consistently develop speakers with better sound fidelity.

The listener ultimately determines how good a loudspeaker sounds. Just as beauty is in the eye of the beholder, sound quality is in the mind (or the ear-brain hearing system) of the listener. Some loudspeaker designers and manufacturers design to a set of criteria, the validity and completeness of which may or may not tell the whole story about sound fidelity. Some rely on a trusted “golden ear” listener or a set of listeners. Others hold listener evaluations with panels of listeners. But listeners can have many different and often conflicting opinions, even when listening to the same thing under the same conditions. Sometimes this is because there are too many variable factors — ones that may not have been taken into consideration. Some of the factors listeners are basing their judgments on may, in reality, have little to do with actual sound quality.

So, why research the listening process? Why develop standards for holding valid listening sessions? Research is performed in the listening lab to make listening evaluation reliable and valid. Once you have reliable methods of holding listener evaluations, then you can determine what objective, measurable factors in loudspeaker designs and performance relate most closely to listener preference.

The results of this research are regularly published in papers presented through audio research organizations in order to advance the science. However, these papers tend to be heavily laden with statistical details that make them somewhat difficult for the general public to get through. This article tries to summarize some of these studies in more accessible language.

Figure 1: Those listeners who had gone through listener training were found to be much more consistent than untrained listeners, regardless of their profession. Olive (JAES 2004)

Before reviewing the research about loudspeaker design, let me start with some basic principles listening research has uncovered:

Room position: The position of the speaker within the room can affect the sound character more than the actual differences between the speakers. This unwanted variable is unrelated to the actual sound quality and must be eliminated from listening sessions. You don’t want to rate a speaker as sounding better only because a particular location made it sound better. Therefore, a listening lab was developed where the speakers all shuffle into the same position within the room for listening, removing that false variable from listeners’ assessments.

Speaker appearance: In sighted listening tests, it has been found that the appearance of the speaker often falsely affects the listener’s evaluation about how it sounds. A big speaker with a beautiful, polished finish and impressive industrial design is often judged higher simply because of its appearance. Such biases do not give us reliable data about sound quality. Therefore, listening tests must be arranged so that the listener cannot see the speaker.

Consistency and training: Novice listeners tend not to have the vocabulary to describe what they’re hearing, and they find it difficult to be analytical in forming their judgments. As a result, they tend to focus on fairly narrow aspects of sound quality. That focus can vary between individuals, and even between sessions for the same individual, resulting in inconsistent judgments. As listeners gain listening experience, their listening palette broadens and their consistency quickly improves. Because of this, the Harman group has developed computer-based training that helps listeners articulate what they’re hearing.

Consistency and hearing loss: Some listeners will listen to exactly the same setup in two different listening sessions and come up with radically different ratings for the same loudspeaker. Experiments have shown that the most erratic listeners are those with hearing loss. It turns out that people with good hearing tend to offer more consistent assessments over multiple listening sessions and tend to agree closely with each other.
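One simple way to put a number on this kind of session-to-session consistency is to compare the ratings a listener gives the same speakers in two sessions. The sketch below is only an illustration: the ratings are invented, and the mean absolute difference is an assumed stand-in for whatever statistic the published studies actually use.

```python
# Hypothetical ratings (0-10 scale) for the same four speakers in two sessions.
# All values are invented for illustration; they are not data from the studies.
sessions = {
    "consistent listener": ([7.0, 5.0, 3.0, 6.0], [6.5, 5.5, 3.0, 6.0]),
    "erratic listener": ([7.0, 5.0, 3.0, 6.0], [4.0, 7.5, 5.0, 3.0]),
}

def retest_error(first, second):
    """Mean absolute rating change between two sessions (lower = more consistent)."""
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)

errors = {name: retest_error(s1, s2) for name, (s1, s2) in sessions.items()}
for name, err in errors.items():
    print(f"{name}: average change of {err:.2f} rating points")
```

A listener whose ratings barely move between sessions produces a small value; one who rates the same speaker very differently each time produces a large one.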

What does all this mean? For one thing, good sound quality is not so much a matter of individual taste as was previously thought. Unlike wine tasting, it is not totally subjective. There are objective criteria that people with good hearing and decent analytical capability can agree upon. Loudspeaker design is, to a large degree, a science, and as a result, scientific measurements can be a predictor of loudspeaker preferences.

Figure 2: The seven frequency curves typically used in the model include on-axis (ON in black), listening window (LW in green), early reflected curve (ER in red), predicted in-room response (PIR in black), sound power (SP in light blue), and directivity indices related to the sound power and early reflections (SPDI in red and ERDI in purple). Olive (JAES 2004)


Now that we have some principles for performing reliable, objective, and consistent listening evaluations, we can apply the data gathered from these listening sessions toward discovering design criteria for loudspeakers. Here are some of the technical principles that have emerged from these listening tests:

Price: One principle is that high cost doesn’t necessarily guarantee high performance. Many very expensive speakers rate poorly in unbiased listening tests.

Single specifications: The +/-3dB spec often used to describe the quality of a loudspeaker’s frequency response is, in many ways, meaningless. The important thing is what happens within that +/-3dB tolerance. A tolerance of +/-1dB starts becoming more meaningful but, ultimately, a single specification for the character of the frequency response does not tell you enough. You need a graph to tell you how the speaker performs and some principles to tell you what is acceptable on that graph and what is not.
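To see why a single tolerance figure says so little, consider two made-up responses that both satisfy a +/-3dB spec. The sketch below uses invented values and a hypothetical "roughness" measure (the average level change between adjacent points); it is not a metric taken from the research.

```python
# Hypothetical magnitude responses in dB at the same ten frequency points.
# Both are invented for illustration; neither is measured data.
smooth_db = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5]
ragged_db = [2.5, -2.5, 2.0, -2.0, 2.5, -2.5, 2.0, -2.0, 2.5, -2.5]

def within_tolerance(response_db, tol_db=3.0):
    """True if every point lies within +/- tol_db of the 0 dB reference."""
    return all(abs(level) <= tol_db for level in response_db)

def roughness(response_db):
    """Mean absolute level change between adjacent points, in dB."""
    steps = [abs(b - a) for a, b in zip(response_db, response_db[1:])]
    return sum(steps) / len(steps)

for name, curve in (("smooth", smooth_db), ("ragged", ragged_db)):
    print(name, within_tolerance(curve), round(roughness(curve), 2))
```

Both curves pass the +/-3dB check, yet one is a gentle tilt and the other swings by several dB from point to point, which is exactly the kind of difference only a graph reveals.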

Figure 3: The family of anechoic curves of four loudspeakers in a listening test in the order in which they were rated, from highest to lowest. Olive (JAES 2004)

Measurement “granularity”: Coarse measurements are typically inadequate. For example, the audibility of resonances varies significantly, based on a number of factors, one of which is the Q of the resonance. Frequency response curves that have been averaged on a 1/3-octave basis — the curves sometimes found on spec sheets — may tell about general tonal balance (which can usually be adjusted through equalization). But they tend to lack the necessary resolution to distinguish between important factors such as low-Q vs. high-Q resonances. To assess audibility, 1/20-octave measurements are often required.
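The effect of averaging can be sketched with a synthetic example. The code below builds a flat response with one narrow 6 dB peak, then smooths it over 1/3-octave and 1/20-octave windows. Everything here is an assumption made for illustration: the peak shape, the 48-points-per-octave grid, and the simple averaging of dB values (real measurement systems typically average power).

```python
import math

POINTS_PER_OCTAVE = 48  # resolution of the hypothetical raw measurement

def raw_response_db(num_octaves=6, peak_index=144, peak_db=6.0, sigma_oct=0.02):
    """Flat 0 dB response with one narrow (high-Q) peak on a log-frequency grid."""
    response = []
    for i in range(num_octaves * POINTS_PER_OCTAVE + 1):
        distance_oct = (i - peak_index) / POINTS_PER_OCTAVE
        response.append(peak_db * math.exp(-0.5 * (distance_oct / sigma_oct) ** 2))
    return response

def smooth(response_db, fraction):
    """Moving average over a 1/fraction-octave window centered on each point."""
    half = int(POINTS_PER_OCTAVE / (2 * fraction))  # points on each side
    smoothed = []
    for i in range(len(response_db)):
        window = response_db[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

raw = raw_response_db()
third_octave_peak = max(smooth(raw, 3))
twentieth_octave_peak = max(smooth(raw, 20))
print(f"raw peak: {max(raw):.1f} dB")
print(f"1/3-octave smoothed peak: {third_octave_peak:.1f} dB")
print(f"1/20-octave smoothed peak: {twentieth_octave_peak:.1f} dB")
```

In this sketch the 1/3-octave curve shrinks the resonance to under 1 dB, while the 1/20-octave curve still shows most of it.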

Neutral sound: Listeners assess a neutral, accurate sound reproduction as the best sound quality rather than an enhanced, exaggerated sound, as some used to think.

Family of complex specifications: Listener preferences correspond not to a single technical measurement, such as on-axis frequency response or total sound power, but rather to a family of curves such as those in Figure 2. Each of these curves reveals something that the other curves might not. These curves represent:

  • On-axis frequency response
  • The response averaged over a specific off-axis listening window
  • The early reflected sound response
  • A predicted “in-room” response (more typically useful for home speakers)
  • The sound power (the total of sound emitted in all directions)
  • The directivity indices related to the sound power and to early reflections.

It is not necessary for all these curves to be flat. Although it’s best for the on-axis frequency response to be flat, you want some of the other curves to descend or ascend with frequency. It’s good, for example, for the off-axis curves to have a similar general character to the on-axis curves, and for the transitions of the curves to be gradual and not choppy. However, unless the system has perfectly constant directivity at all frequencies, you do not want the sound power curve to be flat over its entire range, or the speaker will be too bright on-axis.
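As a rough sketch of how the directivity indices relate to the other curves, the example below assumes each index is the listening-window level minus the corresponding curve, in dB. That definition is an assumption not stated in this article, and the levels are invented for a speaker whose directivity narrows as frequency rises.

```python
# Hypothetical spatially averaged levels in dB at five frequencies (Hz).
# All values are invented for illustration.
freqs_hz = [100, 500, 1000, 5000, 10000]
listening_window_db = [0.0, 0.0, 0.0, -0.5, -1.0]
early_reflections_db = [-0.5, -1.0, -1.5, -3.0, -4.5]
sound_power_db = [-1.0, -2.0, -3.0, -6.0, -8.0]

def directivity_index(reference_db, curve_db):
    """Per-frequency difference between a reference curve and another curve, in dB."""
    return [ref - level for ref, level in zip(reference_db, curve_db)]

# Assumed definitions: each index is measured relative to the listening window.
spdi = directivity_index(listening_window_db, sound_power_db)
erdi = directivity_index(listening_window_db, early_reflections_db)

for f, sp, er in zip(freqs_hz, spdi, erdi):
    print(f"{f:>6} Hz  SPDI {sp:4.1f} dB  ERDI {er:4.1f} dB")
```

Here the sound power falls gradually with frequency while the response in front of the speaker stays nearly flat, so both indices rise smoothly. Forcing the sound power flat on such a speaker would push the on-axis response upward at high frequencies.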

Figure 4: An example of preference ratings and 95 percent confidence intervals for a comparison of 13 loudspeaker models. Olive (JAES 2004)
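For readers unfamiliar with confidence intervals, here is a minimal sketch of how a 95 percent interval around a mean preference rating might be computed. The ratings are invented, and the normal approximation used here is an assumption chosen for simplicity; with a panel this small, a t-distribution would give somewhat wider intervals, and the published studies may use a different method.

```python
import math
import statistics

def confidence_interval_95(ratings):
    """Approximate 95 percent interval for the mean rating (normal approximation)."""
    mean = statistics.mean(ratings)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean - half_width, mean + half_width

# Hypothetical ratings from eight listeners for two speakers (0-10 scale).
speaker_a = [6.5, 7.0, 6.0, 7.5, 6.5, 7.0, 6.5, 7.0]
speaker_b = [4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 3.5, 5.5]

a_low, a_high = confidence_interval_95(speaker_a)
b_low, b_high = confidence_interval_95(speaker_b)
intervals_overlap = a_low <= b_high and b_low <= a_high

print(f"Speaker A: {a_low:.2f} to {a_high:.2f}")
print(f"Speaker B: {b_low:.2f} to {b_high:.2f}")
print("Intervals overlap:", intervals_overlap)
```

When two speakers' intervals do not overlap, the difference in their average ratings is unlikely to be a product of listener inconsistency alone. Wide intervals, by contrast, signal that the panel disagreed or was erratic.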


Figure 3 shows the family of spatially averaged curves of each speaker in a four-speaker listening test in the order in which they were rated. These happened to be expensive home hi-fi loudspeakers with retail prices between $5,000 and $11,000 each, but the same evaluation process is valid for all price ranges. You can see how variations from one curve to another, and the choppiness of the curves, led to some speakers being rated more poorly than those with smooth curves. The set of curves shows that, although most of the speakers in this study were fairly well-behaved off-axis, the more poorly rated speakers showed a tendency to lose correlation between sound power and the on-axis sound character.

Many additional principles have been discovered during the research, such as which parameters of spectral balance are correlated most closely to listener preference, in terms of both positive and negative assessment of sound quality. The research is ongoing: Olive and Toole’s group continues to work toward discovering principles that help quantify the listening evaluation process.

Is this research useful in improving the performance of professional products? How does this relate to sound in venues such as performing-arts centers, recording studios, concerts, stadiums, and movie theaters?

The human hearing system works the same way whether a person is in a living room listening to a home theater system or at venues with professional sound systems. In professional studios, the neutrality and accuracy of a studio monitor, both on-axis and off-axis, apply toward its ability to successfully translate a sound mix. In performance spaces, the naturalness of the sound is essential for enhancing the listening experience. In houses of worship, the clarity of sound affects how well the message gets to the parishioners.

Sound quality is crucial to such high-volume professional applications; yet even in low-volume commercial applications, the quality of a business’ sound system affects customers’ perceptions of that business. It sets the atmosphere and image. For instance, accuracy is an important factor in the intelligibility of an overhead paging system. Applying measurable principles of sound quality may be more important than ever in these venues.

Of course, the challenges manufacturers and systems designers encounter with larger venues require many additional principles to be applied. If it were simple, everyone would be doing it! But the more we as an industry learn about how what we measure relates to what we hear, the better the products and system designs will be.

Published Studies on Listening Tests

  • “Differences in Performance and Preference of Trained versus Untrained Listeners in Loudspeaker Tests: A Case Study” by Sean E. Olive, AES Convention Paper, March 2003.
  • “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part 1—Listening Test Results” by Sean E. Olive, AES Convention Paper, May 2004.
  • “A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part 2—Development of the Model” by Sean E. Olive, AES Convention Paper, Oct 2004.
  • “Audio, Science in the Service of Art—Newly Revised” by Floyd E. Toole,

Rick Kamlet is senior director of commercial installed sound at JBL Professional.
