Will Telepresence Save Videoconferencing?
Maybe not for a few years, but important AV enhancements are driving today’s solutions toward a “reality” experience.
Credit: courtesy of Destiny Conferencing LLC
With all the advances in AV processing and high-speed communications networks recently, much has been written about the coming age of telepresence. Many of these news stories conjure up images straight from science fiction movies — where effortless visual conversations take place in the future without a technology hiccup. These days are indeed coming, but they’re not here yet.
While there are dozens of definitions of “telepresence,” most AV integrators and their customers are likely to characterize the term as “using technology, particularly audio and video techniques, to give the appearance of an individual being present at a location other than the actual location of that individual.” The goal, of course, is to provide each user in a communications or conferencing session with the sensation that users at other sites are physically present in the same room.
Videoconferencing is moving toward providing a meeting experience that approaches a face-to-face experience. AV integrators should understand that this march toward realism is being driven by three technology factors: video imagery, audio processing, and physical configurations.
Realism through video enhancements
Perhaps the most exciting videoconferencing announcement in the past five years occurred in mid-2005 when LifeSize Communications, Austin, TX, made public the world’s first high-definition videoconferencing system (Photo 1). Other videoconferencing manufacturers, including Polycom, Pleasanton, CA; Sony, Park Ridge, NJ; and Tandberg, Oslo, Norway, have since promised to follow suit with products by the end of this year, with Tandberg already demoing such systems in February. High-definition video provides more pixels in each image, thereby supporting images with finer detail and letting viewers sit closer to the screen, which, in turn, enables images to be closer to life size without being grainy.
Sample costs for some of these systems include: LifeSize Room, $12,000; Sony PCS HG-90 codec, $25,000; Sony PCSA-CHG90 HD camera for videoconferencing, $11,000; HP Halo, $600,000 per room for installation and $18,000 per month per room for operation; Teliris, $8,000 to $12,000 per month; and Destiny Conferencing, $100,000 per room installation, plus $2,000 to $8,000 per month operation.
To emulate the effect of a live meeting, a videoconferencing system should be able to provide important image details. The standard CIF (also known as full CIF or FCIF) image used in today’s videoconferencing equipment provides 0.1 megapixels — something nobody would tolerate with a still image camera today. What’s the result? On most videoconferencing calls, users can’t really read the logo on the other person’s shirt or see the wrinkles when a remote colleague frowns. High definition, on the other hand, provides 9X the pixel count (see Table 1), making images crisper and details clearly visible.
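The pixel arithmetic behind these figures is easy to check. The short Python sketch below assumes that CIF means 352 x 288 pixels and that “high definition” here means 720p (1280 x 720); the article doesn’t spell out either resolution, so treat both as assumptions rather than as any vendor’s specification.

```python
# Pixel-count comparison between CIF and HD video frames.
# Assumptions (not stated in the article): CIF = 352 x 288, HD = 720p = 1280 x 720.
CIF = (352, 288)
HD_720P = (1280, 720)


def pixel_count(resolution):
    """Return the total number of pixels in a (width, height) frame."""
    width, height = resolution
    return width * height


def megapixels(resolution):
    """Return the frame size in megapixels."""
    return pixel_count(resolution) / 1_000_000


def pixel_ratio(a, b):
    """Return how many times more pixels frame a has than frame b."""
    return pixel_count(a) / pixel_count(b)


if __name__ == "__main__":
    print(f"CIF:    {megapixels(CIF):.2f} megapixels")
    print(f"HD:     {megapixels(HD_720P):.2f} megapixels")
    print(f"HD/CIF: {pixel_ratio(HD_720P, CIF):.1f}x")
```

Under these assumptions the CIF frame rounds to 0.1 megapixels and the HD frame carries roughly nine times as many pixels, in line with the figures quoted above.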
Fig. 1. Video resolution vs. bandwidth.
Credit: courtesy of LifeSize Communications
For an electronic meeting to emulate an in-person meeting, people’s faces should be life size. Having a business discussion with a postage-stamp-sized image of the other person is a constant reminder that the meeting is taking place over an electronic medium. In the past few years, flat-panel displays have grown to exceed 50 inches. In fact, a 100-inch display has even been demonstrated at trade shows.
Given the normal resolution of the human eye (normal acuity of a person with 20-20 vision), the optimal viewing distance (D) is a function of the distance between screen pixels, which itself is a function of the size of the screen and the number of pixels in a line. This distance is shown in Table 2 for three different television sizes.
Hence, for a 50-inch monitor used in a conference room, the optimal distance with an HD display is approximately 9 feet; sitting closer than this will result in a viewer resolving the individual pixels, making the image seem grainy and unrealistic. For a standard CIF image on the same 50-inch monitor, the viewer should be about 33 feet away to achieve the same perceived smoothness in the image. Sitting closer than 33 feet isn’t optimal, although the size of most conference rooms generally requires it. This is the magic of HD video. More pixels per image generate smoother images for any display size, enable finer detail to be conveyed, and allow viewers to sit closer to the screen without seeing the pixels.
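For readers who want to experiment with other screen sizes, the relationship can be sketched in a few lines of Python. This is a simplified model, not the method used to build Table 2: it assumes that a viewer with 20-20 vision resolves about one arcminute, that the display is 16:9, and that HD means 1,280 pixels per line while CIF means 352. Because those assumptions may differ from the ones behind the table, the results land near the distances quoted above but need not match them exactly.

```python
import math

# Assumed visual acuity for 20-20 vision: about one arcminute (1/60 degree).
ARCMINUTE_RAD = math.radians(1 / 60)


def screen_width_in(diagonal_in, aspect_w=16, aspect_h=9):
    """Width in inches of a screen with the given diagonal and aspect ratio."""
    return diagonal_in * aspect_w / math.hypot(aspect_w, aspect_h)


def optimal_distance_ft(diagonal_in, pixels_per_line, aspect_w=16, aspect_h=9):
    """Distance (feet) at which one pixel subtends one arcminute.

    Sitting closer than this lets the eye resolve individual pixels.
    """
    pitch_in = screen_width_in(diagonal_in, aspect_w, aspect_h) / pixels_per_line
    distance_in = pitch_in / math.tan(ARCMINUTE_RAD)
    return distance_in / 12


if __name__ == "__main__":
    for label, pixels in (("HD (1280 px/line)", 1280), ("CIF (352 px/line)", 352)):
        print(f"50-inch display, {label}: {optimal_distance_ft(50, pixels):.1f} ft")
```

Whatever acuity value is chosen, the ratio between the two distances depends only on the ratio of pixels per line, which is why the CIF distance is between three and four times the HD distance.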
Many customers today are already asking AV systems integrators to install large HD-ready flat-panel displays, thereby taking one of the needed steps for HD videoconferencing. Nevertheless, two HD videoconferencing obstacles remain. The first is processing power. With 9X the pixel count in the video image, HD requires more processing power than many installed videoconferencing systems can provide. Although new HD systems were obviously designed with this processing power, most legacy systems won’t be up to the task. The second obstacle is bandwidth. Many videoconferencing systems today operate at 384 kb/s, a legacy from the old ISDN days. As users migrate from ISDN to IP videoconferencing, many are also moving to 512 and 768 kb/s. But HD videoconferencing requires even higher bandwidths, between 1 and 3 Mb/s, depending on different factors. Exactly how this bandwidth issue will impact the adoption rate of HD videoconferencing in the enterprise market remains to be seen.
Fig. 1 shows how one company intends to deliver HD at 1 Mb/s while providing quality between that of current CIF videoconferencing and true HD at calling speeds between 384 kb/s and 1 Mb/s. (At bandwidths higher than 1 Mb/s, resolution would stay the same, but frame rate would improve from approximately 15 f/s to 30 f/s. Frame rate is also data dependent.) Other vendors have talked about HD videoconferencing at 1 to 4 Mb/s while suggesting that 2 Mb/s is really the sweet spot.
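One rough way to compare these operating points is to ask how many compressed bits are available for each pixel of each frame. The Python sketch below is only an illustration: it assumes CIF is 352 x 288 and HD is 1280 x 720, it assumes a 30 f/s frame rate for the 384 kb/s CIF call (the article doesn’t give one), and it treats the entire call rate as video even though part of it is used for audio in practice.

```python
def bits_per_pixel(bit_rate_bps, width, height, frames_per_second):
    """Average compressed bits available per pixel per frame."""
    return bit_rate_bps / (width * height * frames_per_second)


# Hypothetical operating points; resolutions and the CIF frame rate are assumptions.
OPERATING_POINTS = [
    ("CIF at 384 kb/s, 30 f/s", 384_000, 352, 288, 30),
    ("HD at 1 Mb/s, 15 f/s", 1_000_000, 1280, 720, 15),
    ("HD at 2 Mb/s, 30 f/s", 2_000_000, 1280, 720, 30),
]

if __name__ == "__main__":
    for label, rate, width, height, fps in OPERATING_POINTS:
        bpp = bits_per_pixel(rate, width, height, fps)
        print(f"{label}: {bpp:.3f} bits per pixel")
```

Under these assumptions, the HD operating points leave the codec fewer bits per pixel than a CIF call at 384 kb/s, which hints at why HD places heavier demands on both compression efficiency and processing power.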
Those familiar with how videoconferencing codecs work, however, will recognize that image resolution vs. bandwidth doesn’t tell the entire story. Image quality is somewhat data dependent — high-motion video content in a videoconferencing call, for example, often leads to image degradation. Motion estimation algorithms within the compression-decompression engine sometimes can’t keep up with the data rate; the result could be blocky pictures or even a video freeze. Even though the “resolution” wouldn’t change in a technical sense, the image quality would certainly decline. Videoconferencing quality is always a complex function of processing power, bandwidth, and data content.
Realism through audio enhancements
Research has confirmed that audio quality is the key determinant of the videoconferencing experience. Capturing voice or other audio signals for transmission in a digital format — as is the case for a videoconference — requires filtering, processing, and digitizing the signals, and then compressing the results. There are many ways to process audio signals, each representing a tradeoff between multiple performance and efficiency parameters. The codecs used for videoconferencing (and voice over IP networks) are optimized around three basic parameters:
Frequency response (or input bandwidth). High-bandwidth systems provide a high-fidelity signal. They also require input and output devices (microphones and loudspeakers) capable of handling the higher bandwidth signals. Videoconferencing systems once supported “tinny” narrowband 3 kHz audio. New systems support 7 kHz, 14 kHz, and even 22 kHz wideband audio. These expanded frequency ranges provide a richer sound (the equivalent of FM radio over AM radio); more importantly they deliver important information to listeners that makes speech more intelligible, thereby significantly reducing “meeting fatigue.”
Compression rate. Some audio codecs can compress an input signal by a factor of 10 or more. The higher the compression rate, the smaller the output bit stream, and generally, the lower the voice quality. But smaller bit streams require less network bandwidth, leaving more network bandwidth available for other purposes such as video, or for carrying multiple channels, etc.
Delay. Delay is a crucial determinant of videoconferencing user satisfaction. With a high-delay communications environment, two-way conversation becomes very awkward and unnatural. The delay causes people to trip over each other in conversation or to pause frequently to see if the other side is speaking. Hence most videoconferencing codecs today are designed with the goal of minimizing delay.
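The first two parameters interact in a way that simple arithmetic makes visible. The Python sketch below assumes the minimum (Nyquist) sampling rate of twice the audio bandwidth and 16-bit samples before compression; neither figure comes from a particular codec, and real codecs typically sample somewhat faster than the minimum.

```python
def nyquist_sample_rate_hz(audio_bandwidth_hz):
    """Minimum sampling rate needed to capture the given audio bandwidth."""
    return 2 * audio_bandwidth_hz


def pcm_bit_rate_bps(audio_bandwidth_hz, bits_per_sample=16, channels=1):
    """Uncompressed bit rate, assuming 16-bit samples unless told otherwise."""
    return nyquist_sample_rate_hz(audio_bandwidth_hz) * bits_per_sample * channels


def compressed_bit_rate_bps(audio_bandwidth_hz, compression_factor,
                            bits_per_sample=16, channels=1):
    """Bit rate after the codec compresses the signal by the given factor."""
    raw = pcm_bit_rate_bps(audio_bandwidth_hz, bits_per_sample, channels)
    return raw / compression_factor


if __name__ == "__main__":
    for bandwidth_hz in (3_000, 7_000, 14_000, 22_000):
        raw = pcm_bit_rate_bps(bandwidth_hz)
        packed = compressed_bit_rate_bps(bandwidth_hz, compression_factor=10)
        print(f"{bandwidth_hz / 1000:g} kHz audio: "
              f"{raw / 1000:g} kb/s raw, {packed / 1000:g} kb/s at 10:1")
```

The wider the audio bandwidth, the more raw data the codec has to handle, so a wideband system needs either more network bandwidth or more aggressive compression than a narrowband one. Passing `channels=2` shows the additional cost of multi-channel sound.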
Some new videoconferencing devices now sport not only low-delay, wideband codecs, but also multi-channel sound. Having the voice of the person on the left of the image come from the loudspeaker on the left, for example, is a natural enhancement to a virtual meeting environment, reducing “meeting fatigue” and contributing to meeting “realism.”
As any AV integrator can attest, for real-world videoconferencing installations, microphone and loudspeaker quality can make a huge difference in the end-user experience. Several videoconferencing systems vendors offer special audio subsystems with higher performance speakers and audio processing electronics to improve the signals going into and coming out of the codec. The bottom line is that audio quality depends on much more than the codec being used and an expanded frequency range, particularly when real-time, two-way interactions are required.
Realism through physical configurations
A face-to-face physical meeting today has two interesting attributes that the videoconferencing industry is still struggling to emulate. One is natural eye contact; the other is a lack of technology distractions.
Fig. 2. Traditional videoconferencing configurations don’t provide true eye contact
Eye contact is one of the most important aspects of face-to-face communication. Some studies have suggested that eye contact instills trust and fosters an environment of cooperation and partnership, while the lack of eye contact can generate feelings of negativity, discomfort, or distrust. Providing natural eye contact during a videoconference requires participants to look directly into the camera. Unfortunately, traditional videoconferencing (Fig. 2) often fails in this regard because participants naturally tend to look at the video image of the remote person, not at the camera.
Photo 1. LifeSize Room system including codec, camera, and phone (microphone).
Some companies are now building systems that use two-way mirrors and behind-the-screen cameras in order to provide eye contact. This technique, however, still isn’t a mainstream industry development, and today’s eye contact solutions are generally very expensive.
One vendor, DVE, uses beam-splitting mirrors to accomplish eye contact with otherwise “standard” videoconferencing systems (Photo 2). Another manufacturer, Teleportec, uses a similar technology to make far-end participants appear to float in thin air in front of the system. Whether this effect adds to meeting “realism” is still up for debate.
When people are in face-to-face meetings, they don’t have to concern themselves with speaking into the microphone or listening to a loudspeaker that may be beside or behind them. Making the AV technology “invisible” is one of the tricks of the trade for AV integrators. Some videoconferencing vendors have addressed this issue with designs that foster eye contact, make the voice seem like it’s coming from the other person’s lips, and hide the local microphones in well-chosen positions.
Some AV techniques contributing to realism are accomplished by optimizing an entire room design for videoconferencing. For example, Teliris’ Global Table installations typically include wall-mounted flat-screen displays, strategically positioned cameras (to facilitate eye contact), and integrated table microphones. Similarly, Telanetix offers turnkey, fully integrated conferencing environments in which all AV components are discreetly placed to avoid distracting the meeting participants.
Another telepresence vendor, Destiny Conferencing, has attacked this problem with its TeleSuite solution by effectively splitting the meeting room table into two parts: the local half and the remote, identical “virtual” half. The flat-panel displays showing the remote attendees are situated in the space normally occupied by participants across the table.
The TeleSuite System design, with screens displaying life-size images, hidden cameras and microphones, and carefully planned lighting, gives meeting participants the distinct feeling that they’re sitting across the table from the remote attendees. This is the essence of the telepresence experience.
HP also entered the videoconferencing market at the end of 2005 with a totally managed videoconferencing system that provides an immersive environment. Halo is based on multiple high-bandwidth MPEG video codecs supporting multiple cameras in a single conference room. The use of MPEG compression provides very low delay and high image quality, although the current system requires very high bandwidth, can only communicate with other Halo systems, and doesn’t support multipoint. Data collaboration is built-in, although the data collaboration screen is located such that eye contact is defeated.
Advances in video and audio technologies used in videoconferencing now give AV integrators the real tools they need to design room conferencing systems that begin to deliver the telepresence experience. The raw systems themselves, with high-definition video and spatial audio, will impress many corporate managers and videoconferencing professionals. But when designed into environments with life-size images, integrated with high-quality lighting and sound, and optimized for eye contact, these new products will also take multimedia communications to the next level.
Photo 2. DVE Solution solves the true eye contact dilemma.
Existing systems based on the total room approach, such as those offered by HP, Teliris, Destiny Conferencing, and others, tend to be used in niche applications where the videoconferencing system is used 4 to 8 hours per day. These rooms, with their optimized lighting and sound, provide an environment that significantly reduces “meeting fatigue.” While some might argue that the high utilization results from the high-quality experience provided, it’s also true that high anticipated utilization is needed in order to justify the high cost of installation and operation.
True HD videoconferencing is in its infancy, with shipments beginning only in December 2005 and with fewer than 100 systems in operation worldwide at the end of 2005. (This compares to 135,000 non-HD room videoconferencing systems shipped in 2005 and an installed base of approximately 500,000 worldwide.) Early users fall into several categories: 1) New customers who plan to deploy flat-screen panels and already have the bandwidth to support HD; 2) Executives who demand the very best video (and audio) possible; and 3) Specialized applications where the highest resolution video is required — this spans a wide variety of applications ranging from doctor-to-patient telemedicine to manufacturing and retail operations where video is used to diagnose faults via a remote expert or to inspect goods from a distant supplier.
As HD television becomes mainstream in the consumer market, HD videoconferencing is sure to follow. For AV integrators and multimedia room designers incorporating videoconferencing into their available solutions, the new breed of videoconferencing systems, with support for enhanced audio and high-definition video, finally gives integrators the tools they need to show off their technical skills.
Andrew W. Davis is the managing partner with Wainhouse Research, Duxbury, MA. He can be reached at [email protected]