What are our performance goals?: Although we can't yet evaluate the subjective quality of a system, we have measurement techniques that at least point us toward better quality.
Jul 1, 1997 12:00 PM, Bob Thurmond
What is the difference between a really good sound system and one that doesn't sound good enough? Yeah, yeah; your systems sound great and your competitor's don't. Everyone has their opinions, but proving what you say, producing facts to back up your claims, dealing in specifics instead of generalities...now there's the rub. How would you go about doing such a thing?
Who are we?
Science is the process of discovering and specifying the principles that describe the characteristics of real objects and systems. Engineering is the practice of using these principles to produce new devices and systems. In every case, the principles must be specific and quantified in order to produce consistent results. In true science and engineering, all results can be precisely calculated before they occur and precisely measured and verified afterward.
Do we really need to worry about such things? We'd better, if we intend to call what we do engineering!
Sound systems obey precise rules, as do all other systems; the performance is never random or erratic for no reason. If we construct two identical rooms with identical sound systems, the performance will be identical in every way. Unfortunately, consistency does not imply complete quantifiability. Sound systems, like weather systems, are exceedingly complex. At the present time, both can be described, predicted, and measured only in limited, generalized ways. This does NOT mean that the characteristics of both systems cannot be quantified at all, nor that the limited quantification currently achievable is not useful. Quite the opposite may be true.
So just how well can we predict or measure the characteristics of a sound system? How accurate or useful are the actual results? Because we can't really tell how good a prediction is until we have measurement techniques good enough to verify the results, we'd better work on the measurements first.
How loud?
Which meaningful sound system characteristics can we measure accurately? The one that probably comes to mind first is loudness. In fact, this is undoubtedly the most important single characteristic of a sound system. The whole idea is to make the sound louder, and this is easy enough to measure with a sound-level meter.
However, sound-pressure level is not the same as loudness. Sound pressure is a measure of the intensity of a disturbance in the air molecules; loudness is a mental perception. The two are related, but not directly. In fact, it is quite difficult to construct a measurement scheme able to produce results that correspond closely to the perception of loudness. Several somewhat different techniques have become accepted, but all are severely limited. For example, most are primarily intended to measure mechanical noise, sometimes speech, but not music. In fact, speech loudness is difficult to measure accurately, and there is no standard technique for measuring the loudness of music at all.
To get an idea of the difficulty of such a task, think of the significant effect even a small amount of processing can have on the perceived loudness. A bit of compression, a bit of distortion, some judicious response shaping, some phase shifting or a complex combination of all of these can have an effect that is quite audible yet is only slightly measurable by conventional means. Clearly there is a significant gap in our knowledge, and no one seems to be making any real progress in filling it. We don't even know how to measure a sound system's most important characteristic!
Then factor in the obvious differences in musical taste between individuals. We can all think of favorite music that never seems to be too loud, and other, hated examples that seem too loud at any level. No doubt, any given example will be perceived as louder by some people than by others. Many such variables might affect perceived loudness. Would it ever be possible to quantify, or even identify, all of them?
Fortunately, it seems that we may not need to measure the loudness of a sound system directly. Approximations based on sound-pressure levels may suffice, even though there are often disagreements about what loudness they actually represent. That's the problem, of course; we really aren't measuring loudness at all.
A rose by another name
It is entirely possible to measure a subjective response to a stimulus, and quite accurately at that. Psychologists do it all the time, and the techniques are well established. Unfortunately, they are tedious and time-consuming and require many subjects for reliable results. It is undesirable to go through all this for every test, but another approach is often possible. If we can clearly relate a stimulus and a response, then we can be reasonably sure that every time people are exposed to the stimulus, they will experience the corresponding response. Then all we need to do is measure the stimulus, such as SPL, and we will know what the subjective effect, such as loudness, will be.
It is difficult to establish such a relationship accurately, so why bother to try? Well, for one thing, there is a little annoyance called hearing loss, which happens to be a lot more common, and permanent, than we like to admit. Relationships between hearing risk and sound levels over time are already well established, but only for noise exposure, not music. There is some evidence that music does not have the same effect as noise, but no one really knows. It may even turn out that perceived loudness is more important than actual sound levels, but there has been no serious effort to find out. We are only guessing. This is engineering?
If we really knew more about the actual relationship between music or speech levels and hearing loss, we could predict the effects on audiences, musicians and sound professionals. We could be reasonably sure of which exposures are safe and which are not. This can take away some of the thrill, much like wearing a seat belt or a life jacket; many people just don't believe something bad will happen to them until it is too late.
Even if we are dealing with a speech-only system, all is not well. The techniques typically used to measure loudness assume the source of the sound is an unaided voice. An amplified source would have significantly different sound power, as well as directional and source coherence characteristics. It is not clear how these would affect the results. Furthermore, in such a system, the primary concern is usually that the sound may not be loud enough, rather than too loud. This, of course, brings up the next question: how loud is loud enough? But before we can even begin to answer this question, we may have to deal with several others.
Variations on a theme
For example, there has been a tacit (sorry!) assumption in this loudness discussion: that a given system can produce a specific loudness level. But because loudness has meaning only in terms of the listeners, and there are many listeners to most systems, there may be a great variation in loudness over the listening area. Perhaps such a variation is acceptable, perhaps not. Perhaps some audience members would choose to be in high-level areas, while others would choose lower-level areas, if they could. Has such a choice actually been offered? If so, was there actually a variation in preference, or did everyone want pretty much the same loudness? If the latter was the case, as might be expected, did they actually get it? Do all audience members ever get the same loudness levels? Never exactly, of course, but how much variation is typical? Does the sound provider have any responsibility in this regard, or is it simply a matter of avoiding a loss of business because of too many complaints?
If we honestly tried to provide uniform loudness coverage to the whole audience, we would need to know how close we need to come and how much variation is acceptable. We would have to find a way to verify the performance of a system - is there any measurement technique we could use?
There is more to a sound system than loudness, of course. The tonal balance, or timbre, is widely considered to be the most important aspect of perceived sound quality. We would have to decide what the timbre should be, whether it should be just like that of the original source. Maybe, maybe not.
Let us assume for a moment that we only want our system to make the original sound louder without changing it in any other way. If we use good-quality, full-range components, the result should be pretty close. Then we can measure the overall response in any of several ways and equalize it quite flat. What's the problem?
Off balance
The problem is that the resulting sound will be strikingly different from that of the source. It will be much too bright and in-your-face, and probably too thin as well. Furthermore, the degree of this effect may vary with the type of room. Once again, the instrument does not measure the sound the way we perceive it. But in this case there may be a fairly straightforward way to correct for this difference, if we can just figure out what causes it.
Many people suspect that the difference is caused by the reverberant energy the room adds to the original sound, and which is excited by the higher, reinforced level. They reason that our hearing can discriminate between the direct and the reverberant sound, but the instrument microphone cannot. Or perhaps the cause is that the reverberation time typically varies greatly with frequency, which affects the instrument more than our hearing. In any case, the solution would be to equalize for flat direct sound.
Actually, this is not hard to do with modern instrumentation. Unfortunately, it usually makes the sound even worse. It is clear that this is not the cause of the problem at all.
So we wind up simply altering the response correction until the results sound acceptable. We roll off the highs and apply other tweaks, priding ourselves on our experience and our knowledge of "audience preference curves" and such. Of course, this is only a ritual; we really have no idea why this correction is necessary, let alone what form it should take.
A natural balance
Actually, in this case we have a good clue. More than 20 years ago, Bob Schulein, at Shure Brothers, discovered a way to measure this effect and derive an accurate correction for it. His experiment was simple enough that most audio professionals can duplicate it easily and learn a lot.
It requires two identical loudspeakers, one placed a meter or so in front of a listener in a room, the other about 10 times farther away. Pink noise is fed over each loudspeaker in turn, and the levels are adjusted until they appear to match at the listener location. It will be noticed immediately that the timbre of the sound from the two loudspeakers is different, with the character of the difference depending on the room. Next, the signal to the distant loudspeaker is equalized until it sounds like the near one. The resulting equalizer response is the correction needed to make a distant loudspeaker sound natural in that room. It will consist of a high-frequency rolloff and probably a gentle rise around 500 Hz, but probably with differences between rooms.
Note that this equalizer response is not the system correction needed to make this loudspeaker sound right. It represents the measured response the overall system using this loudspeaker in this room needs to have in order to sound right.
So why have you never seen this curve published somewhere? Schulein's original article was published in the April 1975 issue of the AES Journal, and the curve was reproduced in the April 1989 issue of S&VC. To my knowledge, it has never been published elsewhere. Worse, the original experiment was carried out only once, in one room, and, so far as I know, never repeated by anyone. Why has no one bothered to gather such important data? Perhaps it's too much trouble; perhaps we're afraid it might tell us that what we thought we knew is wrong.
Furthermore, this preferred response curve will probably be different - and no one knows how different - for other loudspeakers and other rooms. It might be possible to predict the correct response for a given loudspeaker and room type, but it certainly won't if all this remains unmeasured.
Besides, this response curve may not really be what we want at all. There may be some other curve that sounds better in some way. This is almost entirely a matter of opinion, but there may be justification for this option. It is possible, for example, that a non-neutral response could enhance speech intelligibility in a difficult situation. Such a variation will always affect the naturalness of the sound, but there may be a reasonable tradeoff between the several desirable goals.
A measure of balance
In any case, we will need to measure the system frequency response. Ah, yes, another measurement, but this one, at least, is well established. An RTA or some other TLD (three-letter device) will give us all the information we need, won't it? After all, this time we're not trying to measure our perception of anything, just the system frequency response. Of course, anyone who has played around with this has learned that the measured response is substantially different at different locations. This inconveniently presents us with two problems.
First, because we can usually make only one correction to the overall system response, we must have a single response to correct, presumably the overall average. But for such an average to be valid, it must include an adequate number of samples, taken over a representative range of locations and averaged properly. There are good indications that six samples is the absolute minimum, with eight or 10 being safer, especially if the variation between samples is too great. The difficulty is deciding what is too great and on what basis to make that decision. How do we know that our sample locations are really valid? Perhaps only by taking even more, to see if the results remain consistent. And what is a proper averaging technique - SPL averaging by frequency bands, power averaging, multiplexing? The results will not be the same. SPL averaging probably corresponds best to what we perceive, but no one has studied this to be sure.
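To see why the choice of averaging method matters, here is a minimal Python sketch comparing two of the options named above for a single frequency band. The six readings are invented for illustration and the function names are hypothetical. "SPL averaging" is assumed here to mean the arithmetic mean of the dB readings, and "power averaging" the mean of the relative powers converted back to dB; multiplexing is not modeled.

```python
import math

def spl_average(levels_db):
    """Arithmetic mean of the dB readings (assumed meaning of 'SPL averaging')."""
    return sum(levels_db) / len(levels_db)

def power_average(levels_db):
    """Mean of the relative powers, 10^(L/10), converted back to dB."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)

# Hypothetical readings in one band at six microphone positions (dB SPL).
readings = [88.0, 91.0, 85.0, 94.0, 90.0, 86.0]

spl_avg = spl_average(readings)
pwr_avg = power_average(readings)
print(f"SPL average:   {spl_avg:.1f} dB")
print(f"Power average: {pwr_avg:.1f} dB")
print(f"Difference:    {pwr_avg - spl_avg:.1f} dB")
```

With these invented numbers the power average comes out roughly 1 dB above the SPL average. The power average can never be lower than the dB mean, and the gap widens as the readings spread apart, so the two methods disagree most in exactly the rooms where the variation between samples is greatest.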
The second problem involves a matter we have already mentioned: the variation in response from one location to another. If this variation is too great, then individual listeners will hear different system timbres. Each will have a different perception of how much is too much or even noticeable and under what circumstances. Has anyone ever seriously tried to answer these questions?
Irregularities
The problems caused by poor coverage uniformity are common and serious. Hot spots, dead spots, good areas, bad areas, uneven quality - all are common complaints directed against systems with uniformity problems. The causes of such problems are not our concern at the moment; detection and evaluation are. Perhaps we should review some of the possibilities.
We need to measure amplitude vs. frequency vs. latitude vs. longitude throughout the seating area. Four dimensions are a bit tricky to handle; even three can be measured or displayed only if at least one is treated in discrete steps rather than continuously. In other words, we must hold two of the dimensions fixed, such as the location, in order to make a continuous-sweep measurement in the other two. (Actually, integration time could be considered a fifth dimension, but its relation to frequency resolution is usually established first and held constant.) Ideally, one location point should be taken for every seat. This would be a bit impractical, however, so we need to know how few points will give us the information we need. Ten? Probably not. One hundred? Possibly, but how can we be sure? And this number is still a bit overwhelming.
What should we do?
Maybe we should consider another set of dimensions. If we held frequency and one directional axis constant, we could obtain a continuous plot of level vs. location along the other axis quite easily. A series of such plots crisscrossing the audience area would provide a great deal of information on coverage uniformity, especially if repeated in several important frequency bands. In fact, this can be done conveniently by recording traverses with a broadband signal source, then filtering the recording as desired for analysis. It is still unclear just how many such plots would really be necessary, but a grid of, say, five rows by 10 seats might be both practical and adequately informative. Such plots would, of course, show level uniformity directly, and response indirectly through comparison of the plots of different frequencies at the same location.
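As a rough illustration of what such a grid might yield, the Python sketch below summarizes level uniformity for one frequency band over five rows by 10 seats. The levels are invented (a simple falloff toward the back and the sides), the function names are hypothetical, and the two summary figures, overall spread and standard deviation, are only one plausible way to reduce the data, not an established criterion.

```python
import statistics

ROWS, SEATS = 5, 10

def make_hypothetical_grid():
    """Invented levels (dB SPL) for one frequency band: the level drops
    1 dB per row going back and 0.5 dB per seat away from the row center."""
    grid = []
    for row in range(ROWS):
        levels = []
        for seat in range(SEATS):
            off_center = abs(seat - (SEATS - 1) / 2)
            levels.append(95.0 - 1.0 * row - 0.5 * off_center)
        grid.append(levels)
    return grid

def uniformity(grid):
    """Return (spread, standard deviation) in dB over all grid points."""
    flat = [level for row in grid for level in row]
    return max(flat) - min(flat), statistics.pstdev(flat)

grid = make_hypothetical_grid()
spread, stdev = uniformity(grid)
print(f"Spread over {ROWS * SEATS} points: {spread:.1f} dB")
print(f"Standard deviation: {stdev:.2f} dB")

# One traverse per row: level vs. seat position along that row.
for number, row in enumerate(grid, start=1):
    print(f"Row {number}: min {min(row):.1f} dB, max {max(row):.1f} dB")
```

Repeating the same summary for each frequency band of interest, and comparing the bands at the same seat, would give the indirect view of response uniformity described above.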
Even when response uniformity is measured, which is rarely, it is difficult to say what the measurements mean because no real research has ever been done in this area. Ironically, this may be one area where the measurements are significantly more sensitive than our ears because relatively large measured variations seem hardly audible. Again, more research is badly needed.
No sign of intelligence
Perhaps we have overlooked a major characteristic: intelligibility. This is, after all, the whole point of a speech system, and it can be measured by a well-established technique, with results that can be correlated closely with subjective evaluations. This applies to speech only, of course; is there such a thing as music intelligibility?
Intelligibility is usually measured by the RASTI technique, which involves signals in only two octave bands. Thus, the system needs to carry only those two bands to achieve a high RASTI score. Such a system would sound strange indeed and would have poor real-world intelligibility. For the subjective and measured results to agree, the system must have a reasonably flat and extended frequency response. But is this always the case?
The RASTI test tells us nothing about such matters, nor about many others which are certainly important. These must be verified by other means before a RASTI test can be considered meaningful. Furthermore, if the RASTI results indicate poor intelligibility, they provide precious few clues as to the cause. Again, we must use other tests to diagnose the cause of the poor performance. Perhaps if these other tests were thorough enough, and if we optimized the system performance according to what they tell us, good RASTI results would be assured and the RASTI test itself, therefore, superfluous.
A RASTI test is much like a stopwatch at an auto race. The stopwatch can verify and document the most important performance characteristic of a car, but is of little value in determining what may be wrong. For this information, extensive tests are routinely performed on contemporary race cars, both before and during a race. The results of these tests have led directly to dramatic performance improvements in recent years. We might find a lesson here.
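For readers curious how little goes into the stopwatch reading, here is a minimal Python sketch of the conversion from measured modulation transfer values to a single RASTI-style index. It assumes the usual speech-transmission-index arithmetic: each modulation transfer value m, measured in the 500 Hz and 2 kHz octave bands, is converted to an apparent signal-to-noise ratio, limited to +/-15 dB, scaled to the range 0 to 1, and averaged. The m values are invented and the function names are hypothetical; this is not a substitute for a standardized instrument.

```python
import math

def transmission_index(m):
    """Convert one modulation transfer value m (0 < m < 1) to a 0..1 index."""
    snr_db = 10 * math.log10(m / (1 - m))    # apparent signal-to-noise ratio
    snr_db = max(-15.0, min(15.0, snr_db))   # limit to the +/-15 dB range
    return (snr_db + 15.0) / 30.0

def rasti(m_values):
    """Average the indices of all measured modulation transfer values."""
    indices = [transmission_index(m) for m in m_values]
    return sum(indices) / len(indices)

# Invented modulation transfer values for the two octave bands tested.
m_500_hz = [0.90, 0.85, 0.80, 0.70]
m_2_khz = [0.92, 0.88, 0.84, 0.78, 0.70]

score = rasti(m_500_hz + m_2_khz)
print(f"RASTI estimate: {score:.2f}")
```

Nothing outside those two bands enters the calculation, and the single number that comes out carries no hint of which band, or which mechanism, pulled it down.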
What have we learned?
Unfortunately, it seems that every subjective characteristic of a sound system is difficult to measure directly, and that objective measurements cannot be related precisely to subjective impressions. This is not what we wanted to hear. It means that sound system quality really is mostly a matter of opinion and cannot be measured.
However, although we have found no way to evaluate accurately the subjective quality of a particular system, we have measurement techniques that clearly indicate the direction toward better quality. Better uniformity in loudness and frequency response, for example, is always preferable, even if we cannot say exactly how much better it is, or what degree of difference is significant. Thus, we can confidently compare the performance of one system with another in several ways. This is certainly better than nothing. It means, for one thing, we really can prove that our system is better than another, or find out that it isn't. We can find out which characteristics of our system are good and which are not, and perhaps figure out corrective measures from this information. We may even find out how to produce a system that sounds good to everyone.
Not a bad idea! How do we carry it out?
* There is no practical way to measure directly the loudness of a sound system.
* Loudness can be estimated indirectly from simple measurements, but the accuracy of such estimates is uncertain.
* Loudness uniformity variations may be less difficult to measure, but the significance of such measurements is largely unknown.
* The correct frequency response for a given sound system is uncertain.
* The relationship between frequency response and timbre is also unclear.
* Accurate measurement of the frequency response of a sound system is more difficult than commonly believed.
* Response variations over the listening area cause more performance shortcomings than expected.
* Response variations can be measured fairly easily, but the significance of such measurements is quite uncertain.
* Response variations are rarely measured.
There are other system characteristics that might be significant, such as distortion and noise. However, these are nonlinear effects; that is, they consist of new signals added to the original ones, rather than just modifications of the original signals. This means that they simply should be minimized to the point where they are unobtrusive. Fairly well-established measurements exist for such conditions, and simple listening tests are reasonably reliable. All this makes these flaws of secondary concern.