Digital Video Compression Basics
Jun 1, 2006 12:00 PM, By Jeff Sauer
What you need to know about bit rates, I-frames, and more.
Compression programs like Sorenson Squeeze 4.0 output a number of different digital video formats.
By now most AV contractors are well acquainted with the trend toward moving command and control information over IP networks. Doing so leverages existing Cat-5 cables, removes cable-run limitations, and enables sending and receiving information from anywhere to anywhere, all with greater administrative oversight. Some AV pros are also familiar with the increasing facility to send digital media files over IP, and for many of the same reasons.
However, the amount of data in digital media, particularly in digital video, far exceeds that of command and control data. And that complicates matters in a number of ways. The plethora of different file formats does not help, nor does the associated hodgepodge of potentially confusing acronyms that can make understanding digital video feel like some kind of cruel initiation.
Fortunately, it's not really as bad as JPEG, MPEG-1, -2, -4, -7, VC-1, and On2 make it sound. Here, I'll try to knock down some of the barriers that make terms like “macroblocks” sound mysterious. If you're already working with digital video, the following may seem rather basic. But hopefully that's the point. Once you understand the jargon and a little about the technology, digital video really is just another manifestation of the information that AV contractors have been moving for years.
THE BIG PICTURE
Regardless of what flavor of “digital video” we're talking about, one thing is almost always true: digital video means a lot of data, at least relatively speaking. Sending it from one place to another — whether via streaming over the Web, videoconferencing, or by sending high-quality distance-learning video across a campus — is among the most demanding tasks to ask of a digital infrastructure. Historically, those huge files have meant compromises in terms of image quality and have led to reservations about putting that amount of data on a network.
Thankfully, both digital video and IP professionals have gotten smarter over the years. IP networks are a lot more robust than they were just a few years ago, thanks to smart switching that can isolate bandwidth usage to specific nodes (as well as the simple availability of higher bandwidth). Equally important, video compression is a lot more sophisticated now, delivering excellent image quality from far fewer bits of data. Ultimately, understanding digital video largely comes down to basic plumbing: the size of the pipes (bandwidth) and the amount of flow (bit rates and file sizes). If the flow is too much for the pipes, you'll have problems.
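To make the plumbing analogy concrete, here is a minimal Python sketch that checks how many streams of a given bit rate fit through a network link. The function name and the 25% headroom held back for other traffic and bit-rate peaks are hypothetical choices for illustration, not figures from any standard.

```python
def streams_that_fit(link_mbps: float, stream_mbps: float, headroom: float = 0.25) -> int:
    """Return how many streams of a given bit rate fit on a link.

    `headroom` is a hypothetical safety margin (25% by default) held back
    for other traffic and bit-rate peaks; it is an assumption, not a standard.
    """
    if stream_mbps <= 0 or not 0 <= headroom < 1:
        raise ValueError("need a positive stream rate and 0 <= headroom < 1")
    usable_mbps = link_mbps * (1.0 - headroom)
    return int(usable_mbps // stream_mbps)


# A 100Mbps link carrying 6Mbps MPEG-2 streams, keeping 25% headroom:
print(streams_that_fit(100, 6))  # 75 // 6 -> 12
```

The pipe is the link and the flow is the sum of the streams. When that sum approaches the link's capacity, something has to give: fewer streams, lower bit rates, or a bigger pipe.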
There are a number of different ways to reduce the amount of data in a digital video stream or file, and the first will be obvious to AV contractors. In North America, a full-resolution, standard-definition digital video frame on a DVD, on a DV camcorder tape, or on digital television is 720×480. (720×483 and 720×486 are minor variations used in the broadcast world. See also the “Square Pixels” sidebar for an explanation of 640×480.) However, video streamed over the Web is often limited to quarter-screen resolution (352×240 or 320×240) or smaller, which immediately cuts the number of pixels, and therefore the raw data, to roughly one-fourth of the original. Limit the resolution, and you decrease the data rate. Pretty simple.
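A quick calculation shows where the “one-fourth” figure comes from. This Python sketch simply compares pixel counts; it assumes raw data scales with the number of pixels, which holds for uncompressed video and only approximately for compressed video.

```python
def pixel_ratio(src_w: int, src_h: int, dst_w: int, dst_h: int) -> float:
    """Fraction of the original pixel count left after a resolution change."""
    return (dst_w * dst_h) / (src_w * src_h)


full_sd = (720, 480)
for w, h in [(352, 240), (320, 240)]:
    share = pixel_ratio(*full_sd, w, h)
    print(f"{w}x{h}: {share:.1%} of the pixels in 720x480")
# 352x240: 24.4% of the pixels in 720x480
# 320x240: 22.2% of the pixels in 720x480
```

352×240 is exactly one-fourth of 704×480; measured against the full 720×480 frame it is slightly less.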
Full-motion video in this country contains 30 frames per second (actually 29.97), or 60 interlaced fields per second. Reducing the number of frames per second has a direct effect on the video data rate (or bit rate): halve the frame rate and you halve the raw data. The caveat is that lower frame rates can make motion appear jerky. Color “sub-sampling” is also common: each pixel keeps its own luminance (brightness) value, but color values are shared among neighboring pixels, thus saving data. Color sub-sampling is generally acceptable for distributing video because our eyes are much more sensitive to variations in luminance than to differences in color.
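The effect of frame rate and color sub-sampling on raw data can be sketched the same way. This Python example assumes 8 bits per sample and uses the common averages of 24, 16, and 12 bits per pixel for 4:4:4 (no sub-sampling), 4:2:2, and 4:2:0 sampling. It computes uncompressed rates, before any compression is applied.

```python
# Average bits per pixel with 8-bit samples: 4:4:4 stores three samples per
# pixel; 4:2:2 halves the horizontal color resolution; 4:2:0 halves it both
# horizontally and vertically. Luminance is always kept for every pixel.
BITS_PER_PIXEL = {"4:4:4": 24, "4:2:2": 16, "4:2:0": 12}


def uncompressed_mbps(width: int, height: int, fps: float, sampling: str = "4:2:2") -> float:
    """Raw (uncompressed) video data rate in megabits per second."""
    return width * height * fps * BITS_PER_PIXEL[sampling] / 1e6


for fps in (29.97, 15):
    for sampling in ("4:4:4", "4:2:2", "4:2:0"):
        rate = uncompressed_mbps(720, 480, fps, sampling)
        print(f"720x480 at {fps} fps, {sampling}: {rate:.1f} Mbps")
```

At 29.97 fps, 4:2:2 sampling comes to roughly 166Mbps, a third less than full 4:4:4 color, and halving the frame rate halves every figure. Even with sub-sampling, raw rates like these are why compression is essential.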
MiniDV is the most common tape format for the DV standard.
DOWN TO DETAILS
Image compression is at the core of digital video, and it's what causes most of the associated confusion. There are several techniques for compressing images (including vector quantization, wavelet transforms, fractals, and Discrete Cosine Transform) that are fascinating to understand. More basically, though, video compression methods can be separated into intraframe and interframe techniques. And there are reasons for employing both.
Intraframe compression is done strictly within the confines of a single video frame, while interframe compression leverages the temporal redundancies that typically occur between successive video frames. Simplistically speaking, with the exception of security installations, AV contractors are much more likely to deal with interframe video compression formats because those formats offer considerably more efficiency and are, therefore, far more appropriate for digital distribution. On the other hand, intraframe compression is overwhelmingly used during content creation, including within digital camcorders and nonlinear video editing systems. Since each frame is self-contained, intraframe compression affords much greater facility to stop, find, cut, edit, and add graphics and effects to individual frames during the creation process. It also avoids generation loss from multiple recompressions that can happen during the process.
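One practical reason editors prefer intraframe formats shows up when jumping to an arbitrary frame. The Python sketch below uses a deliberately simplified model (an assumption, not any particular codec's rules): each group of frames starts with a self-contained frame, and every later frame in the group depends on the one before it, so playback has to decode forward from the most recent self-contained frame. A group length of 1 represents pure intraframe compression; the function name is hypothetical.

```python
def frames_to_decode(target: int, group_length: int) -> int:
    """Frames that must be decoded to show frame `target` (0-based).

    Simplified model: frames 0, group_length, 2*group_length, ... are
    self-contained; each other frame depends on its predecessor.
    group_length == 1 means every frame is self-contained (intraframe).
    """
    if target < 0 or group_length < 1:
        raise ValueError("target must be >= 0 and group_length >= 1")
    return target % group_length + 1


print(frames_to_decode(100, 1))   # intraframe: 1
print(frames_to_decode(100, 15))  # 15-frame groups: 11
```

Real MPEG decoders also reorder B-frames, so actual counts differ, but the point stands: in an interframe stream, reaching a cut point can cost up to a full group of decoding work, while an intraframe stream always costs one frame.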
Let's take a look at some of the more popular digital video formats.
JPEG is by far the most commonly used format for still photography; indeed, the acronym stands for Joint Photographic Experts Group. In the pro AV world, JPEG is also common in security applications, because it can be critical to preserve individual frames for detailed scrutiny. JPEG has also been adapted for video as Motion-JPEG (M-JPEG). It was by far the most common compression format for nonlinear editing systems in the 1990s, although today most editing systems work with DV or uncompressed video. M-JPEG is essentially intraframe JPEG compression that shares quantization look-up tables between frames. JPEG 2000, a successor to the roughly 15-year-old JPEG standard, is attracting significant interest as a base compression for HD video frames.
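More important for video, JPEG compression techniques are also found in other formats like MPEG, DV, and VC-1. Specifically, JPEG breaks a picture into 8×8 pixel blocks and uses the Discrete Cosine Transform to express each block as a set of frequency coefficients, in effect an equation describing how brightness varies across the block. Those coefficients are then quantized (rounded off), which discards fine detail the eye is unlikely to miss and saves a great deal of data. Because each block is processed independently, any errors introduced by that rounding stay confined within the block. That's why you'll sometimes see “blockiness” in a highly compressed image, video file, or even a television program.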
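For readers who want to see the “equations” at work, here is a small, unoptimized Python sketch of the two core steps: an 8×8 Discrete Cosine Transform followed by quantization. It is not a JPEG encoder. Real JPEG also uses a per-frequency quantization table and entropy coding; the single uniform step size and the sample gradient block here are assumptions chosen for illustration.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (same scaling JPEG uses)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs


def quantize(coeffs, step):
    """Divide by a step size and round; this rounding is where detail is lost."""
    return [[round(value / step) for value in row] for row in coeffs]


# A smooth left-to-right ramp (pixel values 100..128), shifted by -128 as JPEG does.
block = [[100 + 4 * y - 128 for y in range(N)] for x in range(N)]
q = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in q for value in row if value != 0)
print(f"{nonzero} of 64 coefficients survive quantization")  # 2 of 64
```

The smooth ramp needs only two nonzero numbers instead of 64 pixel values. A busier block would keep more coefficients, and a larger step size would throw more of them away, which is the trade-off between bit rate and blockiness.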
DV is the digital video standard for consumer and lower-end professional camcorders. It's an intraframe format closely related to Motion-JPEG. Though it doesn't use interframe compression, DV does use interfield compression to exploit the temporal redundancies between the two fields that make up a single interlaced video frame.
Somewhat confusingly, DV is a video compression format but is associated with tape formats bearing similar names. The DV video compression is consistent, but the tapes are not. MiniDV is the most common tape format; it's used in consumer camcorders and a number of lower-end professional camcorders. In an effort to maintain a distinction (and higher sales margins) between their consumer and professional products, Sony and Panasonic offer “professional” tape formats: DVCAM and DVCPRO, respectively. The video compression doesn't change, but the larger tape formats do afford greater redundancy for error protection, as well as additional audio support. More confusingly, each manufacturer also has higher-bit-rate versions of DV, including Panasonic's DVCPRO50 and DVCPRO100 (DVCPRO HD), and Sony's HDCAM. Suffice it to say, these formats are more common in the production and postproduction industries and less common in the pro AV market.
MPEG (Moving Picture Experts Group) is by far the most common compression format for digital video, although there are a number of flavors to cause confusion. In general, MPEG is similar to JPEG, except that it groups pixels into 16×16 macroblocks and adds significant temporal compression. Early video compression formats, like Indeo and Cinepak, leveraged temporal compression with D (delta) frames that saved only the differences from a preceding I-frame (intraframe). MPEG adds P-frames (predictive), which sit between full I-frames and act much like those earlier D-frames, and “B” (bi-directional predictive) frames/pictures, which can draw data from preceding and/or following I- or P-frames. A typical “Group of Pictures,” or GOP, might be I-B-B-P-B-B-P-B-B-P-B-B-P-B-B-I, with the B-frames looking backward and forward both to capture image changes and to predict the movement of macroblocks from frame to frame. Confusing, perhaps, but those core techniques of motion estimation, macroblock frame division, and bi-directional temporal compression are frequently used terms that should at least be recognizable to you. They're also the basis for many other digital video compression formats.
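The frame types and GOP pattern described above can be modeled in a few lines of Python. This sketch (the function names are hypothetical) lists which anchor frames (I or P) each frame is predicted from, then shows the order in which a decoder needs them. Because a B-frame depends on a later anchor, that anchor has to be sent and decoded first, so decode order differs from display order. It assumes a simple GOP that starts with an I-frame; real encoders add many refinements.

```python
def gop_references(pattern: str) -> dict:
    """Map each frame (display index) to the anchor frames it is predicted from.

    I-frames stand alone, P-frames use the previous anchor (I or P),
    and B-frames use the nearest anchor on each side.
    """
    if not pattern.startswith("I"):
        raise ValueError("this simple model expects the GOP to start with an I-frame")
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(pattern):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []
        elif kind == "P":
            refs[i] = before[-1:]
        else:  # "B"
            refs[i] = before[-1:] + after[:1]
    return refs


def decode_order(pattern: str) -> list:
    """Send each anchor ahead of the B-frames that precede it in display order."""
    order, waiting_b = [], []
    for i, kind in enumerate(pattern):
        if kind == "B":
            waiting_b.append(i)
        else:
            order.append(i)
            order.extend(waiting_b)
            waiting_b = []
    return order + waiting_b


gop = "IBBPBBPBBPBBPBBI"  # the example GOP from the text, ending with the next I-frame
print(gop_references(gop)[1])  # [0, 3]: frame 1 is a B-frame between I0 and P3
print(decode_order(gop)[:7])   # [0, 3, 1, 2, 6, 4, 5]
```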
The original standard, now known as MPEG-1, was developed in the early 1990s with the goal of putting compressed video and compressed audio on a 1X-speed CD, which meant keeping the data rate down to 150KB per second (about 1.2Mbps). (See the “More MPEG Confusion” sidebar.) MPEG-2 uses techniques similar to those of MPEG-1, but targets much higher-bandwidth applications like digital television and DVDs. Because of those different targets, MPEG-2 was never intended to replace MPEG-1, and indeed, it has not.
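The MPEG-1 target is easier to compare with other figures in bits per second. A 1X CD-ROM drive reads 75 sectors per second of 2,048 bytes each, or 153,600 bytes per second, so the 150KB figure assumes binary kilobytes of 1,024 bytes. Here is a minimal Python conversion; both helper names are hypothetical.

```python
KIB = 1024  # CD-ROM speeds are quoted in binary kilobytes


def kib_per_s_to_mbps(kib_per_s: float) -> float:
    """Convert kilobytes (1,024 bytes) per second to megabits (10^6 bits) per second."""
    return kib_per_s * KIB * 8 / 1e6


def minutes_to_mib(kib_per_s: float, minutes: float) -> float:
    """Data read in `minutes` at a constant rate, in mebibytes."""
    return kib_per_s * KIB * 60 * minutes / 2**20


print(f"1X CD: {kib_per_s_to_mbps(150):.2f} Mbps")           # 1.23 Mbps
print(f"74 minutes at 1X: {minutes_to_mib(150, 74):.0f} MiB")  # 650 MiB
```

The 650MB result matches the familiar capacity of a 74-minute data CD.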
MPEG-1 is most efficient between 1 and 2.5Mbps and is usually encoded at quarter-screen resolution (352×240) and at 30fps. It's generally scaled up to full screen when decoded to a video monitor. Image quality was designed to be “better than VHS tape,” and MPEG-1 can still look good today if there isn't much motion or complexity in the footage.
MPEG-2 is the video standard for DVDs and digital television. It is generally encoded at full resolution (720×480) and at 60 fields per second. MPEG-2 is most efficient between 3-9Mbps, although it can be used at higher HD resolutions and at higher bit rates with different “profiles.”
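Bit rate translates directly into storage. This Python sketch estimates file size from a constant bit rate and compares it with a single-layer DVD's 4.7GB (10^9 bytes per GB). It ignores audio, variable-bit-rate encoding, and container overhead, so treat the results as rough estimates; the helper name is hypothetical.

```python
DVD5_GB = 4.7  # single-layer DVD capacity, in 10^9 bytes


def file_size_gb(mbps: float, minutes: float) -> float:
    """Approximate size in GB (10^9 bytes) of a constant-bit-rate stream."""
    return mbps * 1e6 * minutes * 60 / 8 / 1e9


for rate in (3, 5, 9):
    size = file_size_gb(rate, 120)
    print(f"2 hours at {rate}Mbps: {size:.1f}GB, fits on one layer: {size <= DVD5_GB}")
```

At the top of MPEG-2's efficient range, a two-hour program no longer fits on a single layer.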
MPEG-4 was developed to target the extremely low bit rates of Internet video. For that reason it's effectively replacing MPEG-1. MPEG-4 differs from MPEG-1 and MPEG-2 in that it supports any number of different compression methods, and potentially even different video, audio, still image, 3D, and metadata “tracks” within a single MPEG-4 file. It is time-based rather than field- or frame-based, and therefore affords great flexibility for file-size reduction. However, most of that description is merely theoretical; in practice, MPEG-4 generally refers to a specific low-data-rate compression method that is similar to that of MPEG-1 and MPEG-2.
MPEG-4 AVC (Advanced Video Coding) is an example of that aforementioned theoretical capability to support different compression methods. MPEG-4 AVC uses the MPEG-4 file structure, but applies a very different compression technique that was developed jointly by the MPEG committee and the videoconferencing industry's standards body, the ITU-T Video Coding Experts Group. It's generally considered a major step forward in compression efficiency. Indeed, given its advanced encoding and resolution independence, MPEG-4 AVC may ultimately replace MPEG-2 as the primary format for next-generation high-definition DVDs, digital television, and digital cinema.
VC-1 is the SMPTE designation for a standard based on Microsoft's Windows Media Video 9. Interestingly, VC-1 was originally submitted to the MPEG committee as the basis for MPEG-4, and parts of it remain in the standard-definition ISO MPEG-4 today. However, Microsoft continued to develop VC-1 after the submission, and VC-1 is therefore generally considered somewhat more efficient than MPEG-4 (although not necessarily more efficient than MPEG-4 AVC). VC-1 is also an accepted video standard for both future high-definition DVD formats (Blu-ray and HD-DVD). Its proponents cede nothing to AVC in terms of extensibility.
HDV is a high-definition version of DV that uses the temporal (interframe) compression of long-GOP MPEG-2 to record HD-resolution video at roughly the same bit rate as standard-definition DV (about 25Mbps), thereby allowing consumer camcorders to record high-definition footage on the same MiniDV tapes.
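The reason HDV needs interframe compression becomes clear when the bit budget is spread over the pixels. This Python sketch assumes 25Mbps for both formats, 720×480 for NTSC DV, and 1440×1080 for 1080-line HDV (the frame size that format records), all at 29.97 frames per second; the helper name is hypothetical.

```python
def bits_per_pixel(mbps: float, width: int, height: int, fps: float) -> float:
    """Average compressed bits available per pixel, per frame."""
    return mbps * 1e6 / (width * height * fps)


dv = bits_per_pixel(25, 720, 480, 29.97)
hdv = bits_per_pixel(25, 1440, 1080, 29.97)
print(f"DV: {dv:.2f} bits/pixel, HDV: {hdv:.2f} bits/pixel ({dv / hdv:.1f}x fewer for HDV)")
```

With 4.5 times as many pixels and the same data rate, HDV has less than a quarter of DV's bits per pixel. Intraframe compression alone can't close that gap, so HDV borrows MPEG-2's temporal compression.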
H.261 and H.263 are the video compression standards for legacy videoconferencing systems and are still in use today. However, most newer videoconferencing systems are migrating to MPEG-4 AVC (also known as H.264) due to its much higher coding efficiency.
At 100Mbps, DVCPRO HD is Panasonic’s intraframe-compressed high-definition video format.
Streaming video is a specific term that refers to video that is sent from a server and viewed on another system without ever being saved to a hard drive on that second system. Generally, streaming is understood to mean video streamed over the Internet, although distance-learning equipment can also stream high-quality, high-bandwidth MPEG-2 over local and wide area networks with sufficient bandwidth. Streaming is different from downloaded video, which is saved to a local hard drive prior to playback.
There are four major file formats for Internet video: QuickTime, Real Video, Windows Media, and now Flash Video. Each of those file formats is ultimately agnostic to any specific compression method, although most do have a “default” codec. For QuickTime, it's MPEG-4. For Real Video, it's Real's own RealVideo codec. For Windows Media, it's the Windows Media codec, although MPEG-4 is also an option. Flash Video now has two default codecs: Sorenson Spark and On2 VP6.
Square Pixels
One of the great disparities, as well as oddities, in the history of digital video comes from the fact that the broadcast video industry and the computer industry started digitizing video independently, and each started with a different understanding of what a pixel was. Specifically, the two industries took a different view of pixel shape. AV professionals, whose pixel frame of reference comes mostly from the computer industry, are familiar with square pixels and 4:3 resolutions like 640×480, 1024×768, 1600×1200, etc. Early on, however, the video industry defined pixels as rectangular, taller than they are wide. As a result, it takes more pixels horizontally to fill the same 4:3 aspect ratio (hence broadcast's 720×480 vs. the computer industry's 640×480).
Because the integration of the two industries stemmed mostly from computers replacing traditional video production and postproduction equipment, the computers had to play by the video industry's rules in order to be accepted. That leads to awkwardness in many cases, such as with computer graphics created for video titles, logos, and animation. For example, suppose a graphic of a round globe, basketball, or coin is created on a computer screen and then edited into a video. Unless the artist compensates for the difference in pixel shape, the globe will be squeezed so that it appears oval. The pros who use computer-based video applications are well equipped to adapt, but we still sometimes see stretched video images when they're played back on computer screens, including from streaming media on the Web.
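The square-pixel arithmetic can be sketched with exact fractions in Python. This simple model treats all 720 columns of a 720×480 frame as the 4:3 picture, which gives a pixel aspect ratio of 8/9; production tools often use a slightly different value (around 0.9) derived from the active picture width, so treat the numbers as approximate. The helper names are hypothetical.

```python
from fractions import Fraction


def pixel_aspect_ratio(width: int, height: int, display_aspect=Fraction(4, 3)) -> Fraction:
    """Width-to-height ratio of one pixel when width x height fills the display."""
    return display_aspect * height / width


def stored_width(square_pixel_width: int, par: Fraction) -> Fraction:
    """How many non-square pixels wide a graphic must be to keep its shape."""
    return square_pixel_width / par


par_video = pixel_aspect_ratio(720, 480)     # 8/9: narrower than tall
par_computer = pixel_aspect_ratio(640, 480)  # 1: square
print(par_video, par_computer)
# A 100-pixel-wide round logo drawn on square pixels must be stored 112.5
# pixels wide in a 720x480 frame, or it will look squeezed into an oval.
print(float(stored_width(100, par_video)))
```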
More MPEG Confusion
The MPEG committee didn't skip MPEG-3 altogether. It was actually proposed as the future standard for high-definition television. However, it was determined that simply creating new “profiles” for MPEG-2 that supported higher bit rates and higher resolutions would achieve the same thing, and MPEG-3 faded away.
Many people see MP3 and think it must be the elusive missing “3” in the MPEG chain. It is not. MP3 stands for MPEG Audio Layer 3 (formally, MPEG-1 Audio Layer III). It's actually a holdover from the original work on MPEG-1.
Two more MPEG standards, the curiously named MPEG-7 and MPEG-21, are also pending today. Neither focuses directly on video or audio compression; instead, they address the metadata that accompanies video and audio, encompassing standard ways to describe digital media for archiving and retrieval purposes.