Audio teleconferencing
Jul 20, 1996 12:00 PM,
Mike Sims
Although videoconferencing is making headway, audio teleconferencing continues to deliver on the promise of long-distance communications for education and business.
When my mother used to ask my brother questions about baseball, Bob would invariably begin his answer saying, “Mom, nine men are on a baseball team.” In a similar vein, this discussion of teleconferencing hybrids must start with a short explanation of the two-wire plain old telephone service.
The standard analog telephone connection between a residential or commercial site and the local switching office is a two-conductor pair. Because both sides of the conversation travel over the two-conductor pair, some means must be provided to separate the transmit and receive signals. One two-wire to four-wire conversion is done on the switching-office side to facilitate sending to and receiving from distant switching offices. The other conversion happens at the customer’s handset. The separation of the transmit signal from the receive signal is not accomplished perfectly. In particular, when the customer speaks into his handset, he hears himself in the earpiece. This is known as sidetone and gives a handset the familiar live sound. This is a classic case of “If you can’t fix it, feature it.”
Signal leakage reduction
* The analog approach: Not more than a few years ago, if you wanted to connect a sound system to an analog phone line, you would use an analog hybrid. An analog hybrid uses passive techniques to minimize the amount of the local transmit signal that leaks through to the local receive signal. Because the impedance of the telephone line is complex and can change during a conversation, an analog hybrid can only achieve about 10 dB to 15 dB reduction in the transmit signal leakage to the receive output. This is an obvious problem in a teleconferencing application because this leakage is clearly audible from the local loudspeakers. Aside from being annoying, transmit leakage can cause reduced intelligibility or feedback in the local teleconferencing sound system.
* The digital approach: With the advent of modern digital signal processors, a much more effective approach to transmit leakage reduction became feasible. Using a digital adaptive filter, the digital hybrid can realize a much better approximation to the telephone line impedance and can also track changes in the line impedance over time. Fairly typical for a digital hybrid is 30 dB to 40 dB of transmit leakage reduction. As an added bonus, once the signal is in the digital domain, other desirable signal processing can be performed by the digital signal processor.
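To put the decibel figures above in perspective, the short Python sketch below converts a leakage reduction in dB to the fraction of signal amplitude that remains. The function name is only illustrative; the conversion itself is the standard 20·log10 amplitude relationship.

```python
def db_to_amplitude_ratio(reduction_db: float) -> float:
    """Fraction of the original amplitude left after reduction_db of attenuation."""
    return 10 ** (-reduction_db / 20)

# Figures quoted in the text for analog and digital hybrids.
for label, reduction_db in [
    ("analog hybrid, low end", 10),
    ("analog hybrid, high end", 15),
    ("digital hybrid, low end", 30),
    ("digital hybrid, high end", 40),
]:
    remaining = db_to_amplitude_ratio(reduction_db)
    print(f"{label}: {reduction_db} dB leaves {remaining:.1%} of the amplitude")
```

A 10 dB reduction still leaves roughly a third of the leakage amplitude, while 40 dB leaves 1 percent, which is why the difference between the two approaches is so audible.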
Echo suppression
There is perhaps no better known catch phrase in teleconferencing than full-duplex operation. Full duplex literally means that the transmit and receive paths are fully open at all times. But because all digital hybrids use some form of echo suppression, none can have truly full-duplex operation. In practical terms, full-duplex means the ability for participants who are physically remote from one another to hold conversations. Whether this goal is fulfilled has to do both with the performance of the digital hybrid and with other aspects of the system implementation.
Echo suppression refers to manipulating the transmit and receive signal path gain to achieve even lower transmit-to-receive leakage. In addition, echo suppression can minimize the effects of low-level echoes caused by line reflections in the phone system.
However, the use of echo suppression is a double-edged sword. Although using more echo suppression yields greater transmit-to-receive leakage reduction, using less gives operation that is closer to full duplex. How echo suppression is implemented has a large effect on the real-world performance of a hybrid.
One common approach to echo suppression is to monitor the relative signal levels of the transmit and receive paths and switch a fixed amount of attenuation, on the order of 6 dB, into the path with the lower signal level. The transmit-to-receive leakage is improved by the amount of the attenuation. This relatively modest amount of attenuation is all that is possible without seriously compromising near-duplex quality because more than about 6 dB of switched attenuation begins to produce a speakerphone effect.
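The fixed-attenuation scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not any manufacturer's algorithm: the function name, the use of levels already measured in dB, and the rule for breaking ties are all assumptions, and a real hybrid would add level smoothing and hold times.

```python
def fixed_suppression(tx_level_db, rx_level_db, attenuation_db=6.0):
    """Return (tx_gain, rx_gain) as linear gains.

    The path with the lower measured level gets a fixed attenuation;
    the other path is left at unity gain. Ties favor the transmit
    path (an arbitrary choice made for this sketch).
    """
    attenuated = 10 ** (-attenuation_db / 20)
    if tx_level_db >= rx_level_db:
        return 1.0, attenuated
    return attenuated, 1.0

# Local talker is louder, so the receive path is turned down by 6 dB.
print(fixed_suppression(tx_level_db=-20.0, rx_level_db=-40.0))
```

Because the gain jumps between two values, the switching itself is what becomes audible as the attenuation grows.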
Another take on path gain manipulation uses an attenuation level proportional to the relative transmit and receive levels instead of being fixed. Because no abrupt changes in attenuation are made, more echo suppression can be applied than in the fixed attenuation case without affecting the perceived duplex nature of the conversation. Level-proportional echo suppression means greater transmit-to-receive leakage reduction than the fixed-gain approach without causing the speakerphone effect.
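One way to picture level-proportional suppression is the Python sketch below. The linear mapping from level difference to attenuation, the slope of 0.5 dB per dB, and the 12 dB ceiling are assumptions chosen for illustration; the article does not specify how any particular hybrid derives its attenuation.

```python
def proportional_suppression(tx_level_db, rx_level_db,
                             slope=0.5, max_attenuation_db=12.0):
    """Return (tx_gain, rx_gain) as linear gains.

    The quieter path is attenuated by an amount that grows smoothly
    with the level difference between the two paths, up to a ceiling.
    Equal levels produce no attenuation, so there is no abrupt switch.
    """
    difference_db = tx_level_db - rx_level_db
    attenuation_db = min(max_attenuation_db, slope * abs(difference_db))
    gain = 10 ** (-attenuation_db / 20)
    if difference_db >= 0:
        return 1.0, gain
    return gain, 1.0

# A 10 dB level difference yields 5 dB of attenuation on the quieter path.
print(proportional_suppression(tx_level_db=-20.0, rx_level_db=-30.0))
```

Since the attenuation changes gradually as the talkers trade off, a higher ceiling can be used than with a hard switch.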
Center clipping is a technique that applies either downward expansion or complete attenuation to receive path signals below a certain level. Although center clipping can be effective in suppressing low-level leakage and line echoes, it can introduce significant distortion to lower-level signals near the clipping threshold.
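Both flavors of center clipping can be illustrated with a short Python sketch. The threshold values and the 2:1 expansion ratio are hypothetical, and the hard clipper works sample by sample for simplicity, whereas a practical design would act on a smoothed level estimate.

```python
def center_clip(samples, threshold):
    """Complete attenuation: zero every sample whose magnitude is below threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

def downward_expander_gain_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Downward expansion: gain (in dB) applied to a signal at level_db.

    Above the threshold the gain is 0 dB. Below it, each dB under the
    threshold is pushed down by an extra (ratio - 1) dB.
    """
    if level_db >= threshold_db:
        return 0.0
    return -(threshold_db - level_db) * (ratio - 1.0)

receive = [0.5, 0.01, -0.02, -0.3]
print(center_clip(receive, threshold=0.05))
print(downward_expander_gain_db(-50.0))
```

The hard clipper shows where the distortion comes from: a legitimate low-level signal that hovers around the threshold is chopped in and out.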
Minimizing local loudspeaker-to-microphone coupling
Another significant issue in teleconferencing applications has to do with the echo generated when microphones on one side of a teleconference retransmit far-end audio (picked up from local system loudspeakers) back to the originating talker at the far end. This echo is annoying to the far-end party, and if both ends of the teleconference generate echo problems, audio feedback can also occur. Two common approaches for minimizing loudspeaker-to-microphone coupling are acoustic echo canceling and automatic microphone mixers.
Acoustic echo canceling involves the use of a digital adaptive filter to model the acoustic coupling between system loudspeakers and system microphones (acoustic signature) in the room. The audio signal feeding the local loudspeaker is filtered using the acoustic signature stored in the adaptive filter model. This filtered signal is then subtracted from the microphone signal before the signal is sent to the far end. Acoustic echo cancellation can reduce loudspeaker-to-microphone coupling by 20 dB to 25 dB. Acoustic echo canceling is particularly appropriate when only one or two local microphones are used and are always open.
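The filter-and-subtract process described above can be sketched with a normalized LMS (NLMS) adaptive filter, one common choice for this job. The article does not say which adaptation algorithm commercial units use, so NLMS, the four-tap room response, the step size, and all names below are assumptions. The simulation is also idealized: there is no local talker and no noise, so it cancels far better than the 20 dB to 25 dB quoted for real rooms.

```python
import random

def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the local loudspeaker
    mic:     samples picked up by the local microphone
    Returns (residual, weights); residual is what would go to the far end.
    """
    weights = [0.0] * num_taps      # model of the room's acoustic signature
    history = [0.0] * num_taps      # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # mic signal with estimated echo removed
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights

# Hypothetical room: the mic hears the loudspeaker through a 4-tap response.
random.seed(0)
room = [0.6, -0.3, 0.15, 0.05]
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_cancel(far_end, mic)
print([round(w, 3) for w in weights])
```

After adaptation the first four weights approximate the simulated room response. Note that this sketch adapts continuously; a practical canceler would freeze adaptation whenever a local talker is speaking.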
Pay attention to the limitations of acoustic echo canceling to derive the greatest benefit from its use. The ability of the adaptive filter to track changes in the acoustic signature of the room has some limitations. First, the adaptive filter takes a few seconds to adapt to changes in the acoustic signature of the room. Second, adaptation can only occur when the far-end talker is talking but the local talkers are not.
As a result, if changes to the room’s acoustic signature occur during a conversation (the microphone moves, or even a local talker’s body moves with respect to the local microphones), the far-end party will hear the artifacts of inadequate cancellation until the digital filter can re-adapt. This is a thorny problem because no audible effects will be noted at the local end. Clearly, it’s important to minimize the possibility of physical changes in the room during the teleconference.
Automatic microphone mixers can also provide an effective means to minimize loudspeaker-to-microphone coupling if the automatic mixer has some means to allow the received audio signal from the hybrid to participate in the automatic mixer’s algorithm. This means that far-end audio would turn down local microphones when the far-end talker is talking. As a result, loudspeaker-to-microphone coupling is reduced by the amount that the automatic mixer attenuates off channels. Automatic mixers have the additional advantage of providing a cleaner audio feed to the far end when multiple local microphones are used.
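A much-simplified Python sketch of this idea follows. Real automatic mixers use more sophisticated gating or gain-sharing logic; the rule used here (the loudest input wins, and the far-end feed competes alongside the microphones), the 15 dB off attenuation, and the function name are all assumptions made for illustration.

```python
def mixer_gains(mic_levels_db, far_end_level_db, off_attenuation_db=15.0):
    """Return one linear gain per microphone channel.

    The received far-end audio takes part in the gating decision: if it
    is louder than every microphone, all microphones are held at the off
    attenuation. Otherwise only the loudest microphone is turned on.
    """
    off_gain = 10 ** (-off_attenuation_db / 20)
    loudest = max(range(len(mic_levels_db)), key=lambda i: mic_levels_db[i])
    if far_end_level_db > mic_levels_db[loudest]:
        return [off_gain] * len(mic_levels_db)
    return [1.0 if i == loudest else off_gain
            for i in range(len(mic_levels_db))]

# Far-end talker active: every local microphone is attenuated.
print(mixer_gains([-30.0, -50.0], far_end_level_db=-20.0))
# Local talker at microphone 0 active: only that channel opens.
print(mixer_gains([-30.0, -50.0, -45.0], far_end_level_db=-60.0))
```

In the first case the loudspeaker-to-microphone coupling drops by the off attenuation on every channel, which is the reduction the paragraph above describes.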
“But the system is no longer full-duplex,” you say. Again, the issue is not whether the system is full-duplex but whether a lively, satisfying conversation can be held. The performance of the automatic mixer will determine to a large degree whether this approach is successful.
Can acoustic echo canceling be used with an automatic mixer to give the advantages of both? The answer is a qualified yes. An automatic mixer changes the acoustic signature of the room as it turns microphones on and off. This situation would not be a problem for an acoustic echo canceler when local talkers are talking. However, it is potentially disastrous when it happens during speech from the far end because the acoustic echo canceler is struggling to re-adapt to the changing pattern of on and off microphones determined by the automatic mixer. If the automatic mixer can be set up to give a consistent pattern of on microphones when the far-end talker is active, re-adapting problems can be minimized. Also, if the off attenuation can be adjusted to a small value, such as 6 dB, the acoustic signature of the system won’t change as drastically as microphones turn on and off. The downside of less off attenuation is a lower-quality audio feed to the far end.
With some of the technical issues surrounding teleconferencing hybrids covered, let’s move to some typical applications.
Small rooms with portable systems
The least expensive approach to teleconferencing uses portable conference phones to turn an office into a teleconferencing facility. Although most of these units have integral keypads for dialing and need a standard two-wire analog phone line, at least one (the Gentner ET100) will work in conjunction with an existing office phone. Microphones and a loudspeaker are integral to the units, and most offer extension microphones to allow larger groups of people to participate.
Both echo suppression and acoustic echo canceling are used to minimize the loudspeaker-to-microphone coupling. Typically, the length of the adaptive filter used for acoustic echo canceling will accommodate a room of up to 150 ft² (13.9 m²) or so. Extra-long adaptive filter lengths are available on some units to facilitate operation in larger rooms. If the adaptive filter length is not adequate for the room size, the far-end participant might hear un-cancelled reverberation tails.
Portable systems have three basic performance issues: adequate loudspeaker volume, minimal talker distance to system microphones and quality audio sent to the far end. Because the loudspeaker and microphones are built into the same housing, the acoustic coupling between the loudspeaker and microphones is high. Even with echo suppression and acoustic echo canceling, the output volume of the system must be limited to moderate levels to avoid retransmitting far-end audio back as an echo.
If the basic acoustics of the room are not good, it might not be possible to achieve a volume level adequate for good intelligibility. As more people try to participate in a teleconference, talkers might be less able to be near system microphones. Although this problem can be mitigated to some extent by extension microphones, the increase in talker-to-microphone distance leads to poorer pickup of talkers seated far from the microphones. Finally, because the microphones in portable systems are always open, the audio signal sent to the far end often has a talking-in-a-barrel quality.
Within the constraints previously outlined, a portable system can give adequate performance at a low cost and is particularly suitable for impromptu teleconferences.
Larger rooms without local sound reinforcement
Many applications require a dedicated room for teleconferencing. Most of the performance limitations in the portable system can be addressed in this venue. Microphones can be provided for all participants, minimizing the talker-to-microphone distance. An automatic mixer can be used to manage multiple microphones, which provides higher quality audio sent to the far end. Loudspeaker placement is more flexible, so the distance from the loudspeaker to the microphones can be increased, which allows the possibility of greater loudspeaker volume.
If the automatic mixer has the capability to allow the audio received from the far end to participate in the automatic algorithm, the off attenuation of the mixer may be used to reduce the loudspeaker-to-microphone coupling, reducing retransmitted echo.
Larger rooms with local sound reinforcement
When the teleconferencing room is so large that local participants need some sound reinforcement to hear one another, an extra level of complexity is introduced into the facility design. Because rooms this large are generally designed to handle a large number of people and a large number of microphones, an automatic mixer is mandatory. Also, because of the possibility of feedback in the local sound-reinforcement system, some form of loudspeaker zoning might be necessary.
A helpful feature in an automatic mixer is a pair of internal buses. One bus is simply the sum of all the local microphones. The output from this bus is used as the audio send to the hybrid. The second bus is the sum of all the local microphones plus the audio output from the hybrid. The output of this bus feeds the local sound system. The use of two buses connected in this manner ensures that the audio output of the hybrid doesn’t find its way back to its own input. An alternative for mixers without two buses is to sum the output of the automatic mixer, which mixes only the local microphones, with the audio output of the hybrid using a two-in/one-out line-level mixer. This summed signal feeds the local sound system; the output of the automatic mixer alone feeds the audio input to the hybrid.
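The two-bus routing can be expressed as a few lines of Python. This is only a signal-flow sketch for a single instant in time, with unity gains and hypothetical names; it shows which signals are summed where, not how a real mixer processes audio.

```python
def two_bus_mix(mic_samples, hybrid_output_sample):
    """Return (send_to_hybrid, local_feed) for one sample instant.

    Bus 1 carries only the local microphones and feeds the hybrid.
    Bus 2 adds the hybrid's output and feeds the local sound system.
    The hybrid's output never appears in its own input.
    """
    send_to_hybrid = sum(mic_samples)
    local_feed = send_to_hybrid + hybrid_output_sample
    return send_to_hybrid, local_feed

send, local = two_bus_mix([0.25, 0.5], hybrid_output_sample=0.125)
print(send, local)
```

Changing the hybrid output changes only the local feed, never the send, which is the property the routing is meant to guarantee.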
Loudspeaker zoning is often necessary for achieving adequate sound-reinforcement levels in rooms with normal-height ceilings of 8 feet or 9 feet (2.4 m or 2.7 m) and ceiling-mounted distributed loudspeakers. Because of the close proximity of microphones to loudspeakers, reaching reasonable levels of sound reinforcement may be difficult or impossible, even with an automatic mixer. Systems that turn loudspeakers on and off in response to activity on a nearby microphone will increase gain before feedback, but they are inherently half-duplex systems. When a talker speaks into his microphone, nearby loudspeakers are turned off, so the talker will be unable to hear either far-end audio or local reinforced audio.
A better solution is to use a matrix mixer. A matrix mixer allows each system loudspeaker or small cluster of loudspeakers to have its own mix of system microphones. As a result, microphones far away from a given loudspeaker are part of the mix for that loudspeaker, but closer microphones are excluded. The audio output of the hybrid is included in the mix for all loudspeakers. Because no loudspeaker switching takes place, the matrix mixer preserves the near full-duplex operation of the teleconferencing system. This approach assumes that the automatic mixer has direct (individual) outputs from each channel and that the direct outputs are post-attenuator, which preserves the automatic action of the mixer. Figure 1 on page 20 shows a diagram of a matrix mixer solution.
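The matrix-mixer idea maps naturally onto a small routing table, sketched here in Python. The two zones, four microphones, and simple on/off routing gains are hypothetical; an actual installation would choose the matrix from the room layout and might use intermediate gain values.

```python
def matrix_mix(routing, mic_samples, hybrid_sample):
    """Return one output sample per loudspeaker zone.

    routing[z][m] is the gain from microphone m to zone z. The hybrid
    (far-end) audio is added to every zone at unity gain.
    """
    return [
        sum(gain * mic for gain, mic in zip(row, mic_samples)) + hybrid_sample
        for row in routing
    ]

# Microphones 0 and 1 sit under zone A; microphones 2 and 3 sit under zone B.
routing = [
    [0.0, 0.0, 1.0, 1.0],  # zone A: only the distant microphones
    [1.0, 1.0, 0.0, 0.0],  # zone B: only the distant microphones
]
post_attenuator_mics = [0.5, 0.25, 0.125, 0.0625]  # direct outputs of the automatic mixer
print(matrix_mix(routing, post_attenuator_mics, hybrid_sample=1.0))
```

Because the routing is fixed and nothing is switched, every zone always carries the far-end audio, while each microphone stays out of the loudspeakers closest to it.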
Conference bridging
For some teleconferences, it might be necessary to interconnect more than two sites. In this case, some form of conference bridging will be necessary. Bridging allows multiple sites to be connected, with full interaction among all sites. One option for bridging is using a service bureau to set up and run the conference. These service bureaus will contact all the sites and ensure proper interconnection. They can also provide auxiliary services, such as fax on demand, recording and replay of the conference at a later time, and even translation for conferences where not all of the participants speak the same language.
A second option, which allows for more flexibility at the expense of higher up-front and operational costs, is for a site to buy a conference bridge. This option allows conferences to be set up at any time and gives complete control to the originator. On the low end, bridges are available that use standard analog phone lines and have fairly rudimentary control features. At the high end, bridges use a dedicated T1 line to connect many sites. These systems are computer-based and provide sophisticated scheduling and control.
However bridging is accomplished, the same system requirements apply for each local site in a bridged conference. All sites must have an adequate sound-system design to participate satisfactorily in a bridged conference.
Teleconferencing for everyone
Audio teleconferencing provides low-cost, high-value communication between remote sites. Because virtually every place in the world has at least analog telephone service, voice-grade teleconferencing is universally available. Providing a successful teleconferencing installation means paying careful attention to the entire system design. Doing so will ensure a satisfied client, which is good business for all concerned.