Audio and MP3
Dec 1, 1999 12:00 PM, Robert J. DiCamillo
MPEG (Moving Picture Experts Group) audio and video have been with us now for more than 10 years. The MPEG group was formed in the late 1980s to create standards for the compression of digital audio and video signals. In 1992, the first MPEG standard, MPEG 1, was agreed upon by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG 2 became a standard in 1994 and added the ability to encode audio at lower sampling rates (16 kHz, 22.05 kHz and 24 kHz), which suits low bit rates. MPEG audio encodes a signal according to psychoacoustic models, which exploit masking and threshold effects in human hearing so that signal components falling below the audibility threshold need not be encoded at all. For video encoding, MPEG may decrease the number of bits per frame for less complex pictures because the human eye will not notice the loss of quality. There are three layers to the MPEG specification that apply to audio-only encoding: layer one, layer two and the most recent, layer three.
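The threshold half of that idea can be made concrete. The sketch below uses Terhardt's approximation of the absolute threshold of hearing, a formula common in the audio coding literature; it is an assumption here, not the tables the MPEG psychoacoustic models actually use, and it ignores masking by louder neighboring sounds, which raises the threshold further. The helper names `ath_db_spl` and `audible_components` are hypothetical.

```python
import math


def ath_db_spl(freq_hz: float) -> float:
    """Approximate threshold of hearing in quiet (dB SPL), for freq_hz > 0.

    Terhardt's approximation; an assumption, not the MPEG model tables.
    """
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components):
    """Keep (frequency_hz, level_db_spl) pairs that exceed the threshold.

    Anything at or below the threshold is inaudible, so an encoder
    need not spend bits on it.
    """
    return [(f, level) for f, level in components if level > ath_db_spl(f)]


# A 20 dB tone is audible at 1 kHz, where the ear is most sensitive,
# but not at 100 Hz or 16 kHz, where the threshold is much higher.
print(audible_components([(100, 20.0), (1000, 20.0), (16000, 20.0)]))
# prints [(1000, 20.0)]
```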
Interestingly, the most prolific users of MP3 are musicians and Web surfers who have all but created an online culture around the exchange of MP3 (music-only) files, so most of the information gleaned about this cutting-edge technology has come through its application on the Internet. Although it may be difficult in the infancy of a new trend to determine its ultimate effects on the A-V world, what is clear is that MP3 is here to stay, and it is evolving. This transmission, compression and decoding scheme will no doubt undergo many changes over time to suit the needs of its users and to exploit what will be an ever-changing computer hardware marketplace.
So what does this technology mean to the installation and contracting professional? MPEG offers a unique way of transmitting and receiving audio and video data over the Internet. In the future, manufacturers will be able to design devices that can be updated with new program material such as music and messages, as well as program material that is developed, modified and mixed via computer. An advantage of this technology is its medium: it does not wear out like tape or other magnetic media, nor does it require more permanent media, such as CD-ROM or DVD, to be produced. It can be continuously updated via an Internet connection, thereby avoiding costly on-site service. Venues where extremely high-fidelity program material is not paramount, such as background music or video applications in restaurants, bars and retail environments, are likely to benefit sooner than other venues as the technology develops further.
Although MPEG 2 delivers audio and picture quality equivalent to TV studio standards, it is not perfect. MPEG is not a lossless encoding or compression scheme, and its application results in a loss of signal quality. The key to understanding compression loss is that you do not get 100% of the original signal quality back after compression; instead, you set the encoding parameters in advance so that the material you encode and subsequently transmit, download or read from a CD is of high enough quality for your target audience. Also remember that encoding setups will differ with the application, using different schemes for broadcast, downloading over the Internet or producing a CD.
Defining the technology
A single piece of hardware (or software) that can do both encoding and decoding is sometimes referred to as a codec (encoder/decoder). In audio, the current MPEG specifications are broken up into layers, termed 1, 2 and 3. The popular abbreviation MP3 refers to audio layer three encoding. No one seems to know why 128 kbps MP3, rather than 128 kbps MP2, became the choice for downloading files from the Internet. In all likelihood, it happened this way because MP3 is a more recent development than MP2, and because MP3 is the higher-numbered layer, people sometimes assume it is superior. MP3's predecessor was audio layer two, or MP2, encoding, and many people believe MP2 to be superior to MP3 at bit rates of 128 kbps or higher.
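Because MP2 and MP3 differ only in layer, a player tells them apart by reading each frame's header. The sketch below decodes the 4-byte header that begins every MPEG audio frame, including the layer field, the lower MPEG 2 sampling rates and the copyright flag. The tables are transcribed from the published header layout and are worth checking against the standard before relying on them; the function name is hypothetical, and the sketch does not search a file for the first frame, handle free-format streams or verify the optional CRC. MPEG-2.5 is an unofficial extension, included only because it shares the same fields.

```python
# Field values from the 4-byte MPEG audio frame header.
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG-2.5"}  # 0b01 is reserved
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}                            # 0b00 is reserved
SAMPLE_RATES_HZ = {
    "MPEG-1": (44100, 48000, 32000),
    "MPEG-2": (22050, 24000, 16000),   # the lower rates MPEG 2 added
    "MPEG-2.5": (11025, 12000, 8000),  # unofficial extension
}
# Bit rates in kbps for bitrate indexes 1..14 (0 = free format, 15 = invalid).
BITRATES_KBPS = {
    ("MPEG-1", 1): (32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448),
    ("MPEG-1", 2): (32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    ("MPEG-1", 3): (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
    ("MPEG-2", 1): (32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256),
    ("MPEG-2", 2): (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160),
    ("MPEG-2", 3): (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160),
}


def parse_frame_header(header: bytes) -> dict:
    """Decode the main fields of one MPEG audio frame header."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:  # 11-bit frame sync, all ones
        raise ValueError("frame sync not found")
    version = VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    if version is None or layer is None:
        raise ValueError("reserved version or layer")
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or invalid bit rate/sampling rate")
    table = "MPEG-1" if version == "MPEG-1" else "MPEG-2"  # 2.5 reuses MPEG-2 rates
    return {
        "version": version,
        "layer": layer,  # 2 means "MP2", 3 means "MP3"
        "bitrate_kbps": BITRATES_KBPS[(table, layer)][bitrate_index - 1],
        "sample_rate_hz": SAMPLE_RATES_HZ[version][rate_index],
        "copyright": bool(word & 0x8),  # bit 3: copyright flag
        "original": bool(word & 0x4),   # bit 2: original/copy flag
    }


print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

The bytes FF FB 90 00, a typical start of an MP3 file, decode as MPEG-1 layer III at 128 kbps and 44.1 kHz; changing the layer bits to 10 would mark the same stream as MP2.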
Higher fidelity encoding, however, requires more resources, and this means more bandwidth and an increased demand for data storage space. Higher layers increase the amount of audio data compression and the complexity of encoding the audio signal. It is less mathematically intensive (and therefore takes less time) to encode a signal as layer one audio than to encode the same signal as layer two audio. The layers are hierarchical, so a layer three decoder should be able to decode layer two audio. Layer three builds on the features of layer two, adding a modified discrete cosine transform (MDCT) that splits each of the filter bank's frequency bands into finer frequency lines. Encoding for layer three is more computationally intensive than for layer two or layer one. The more complex encoding schemes and algorithms of the higher layers can improve audio quality despite the greater compression. Even with increased compression and a lower bit rate, layer three audio encoding offers equal or greater quality than layer two audio encoding. For the overall effectiveness of MPEG audio's different layers, refer to table 1.
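The modified discrete cosine transform is the piece of layer three easiest to show in isolation. The sketch below evaluates the textbook MDCT definition directly: 2N input samples produce N coefficients, and consecutive blocks overlap by half. In layer three the transform runs on the output of a 32-band filter bank, turning each band's 36-sample long block into 18 coefficients; the windowing, block switching and quantization of a real encoder are omitted, and the function name is hypothetical.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N real samples in, N coefficients out.

    X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    """
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    n = two_n // 2
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, x in enumerate(block))
        for k in range(n)
    ]


# A long block the size layer three uses: 36 samples -> 18 coefficients.
tone = [math.sin(2 * math.pi * 3 * i / 36) for i in range(36)]
coeffs = mdct(tone)
print(len(coeffs))  # 18
```

The direct sum is slow but makes the definition visible; production encoders compute the same coefficients with an FFT-based algorithm.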
Economy of scale: time vs. audio quality
Increased computation time on the encoding side is a small price to pay for the quality and compression that MP3 affords. Thus, MP3 encoding is starting to be applied even at the professional audio level. For example, 4 minutes of audio from a standard audio CD (44,100 16-bit samples per second in each of two channels, or 1,411.2 kbps) requires about 40 MB of disk or server space. The equivalent MP3 or MP2 file encoded at a 128 kbps constant bit rate takes up about 4 MB, a tenth of the space (roughly a 10:1 compression ratio).
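The arithmetic behind these figures is easy to check. This sketch assumes a constant bit rate, ignores file headers and tags, and counts a megabyte as 1,000,000 bytes; the function names are hypothetical.

```python
# CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITRATE_BPS = 44_100 * 16 * 2  # 1,411,200 bit/s


def size_mb(bitrate_bps: float, seconds: float) -> float:
    """Storage for a constant-bit-rate stream, in decimal megabytes."""
    return bitrate_bps * seconds / 8 / 1_000_000


def compression_ratio(encoded_kbps: float) -> float:
    """How many times smaller than CD audio a stream at encoded_kbps is."""
    return CD_BITRATE_BPS / (encoded_kbps * 1000)


four_minutes = 4 * 60
for label, bps in [("CD", CD_BITRATE_BPS), ("128 kbps", 128_000), ("256 kbps", 256_000)]:
    print(f"{label}: {size_mb(bps, four_minutes):.2f} MB")
print(f"128 kbps: {compression_ratio(128):.1f}:1, 256 kbps: {compression_ratio(256):.1f}:1")
```

The exact results, about 42.3 MB for the CD audio and 3.84 MB at 128 kbps with a ratio near 11:1, round to the 40 MB, 4 MB and 10:1 quoted above. With binary megabytes (1,048,576 bytes) every size shrinks by about 5 percent.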
Some audiophiles describe the quality of MP3 audio at 128 kbps as not even remotely close to CD. Most people, however, hear 128 kbps constant bit rate MP3 audio as comparable to a Dolby B or Dolby C cassette recording of a state-of-the-art CD: there is a reduction in dynamic range and some loss of highs and imaging, but the result remains far from unlistenable. Different codecs can provide varying levels of audio quality, and more importantly, such encoding parameters as the encoding model or algorithm, where to set the frequency cutoffs and the choice of stereo mode can affect the sound quality any MP3 encoder will produce. Decoders can vary in quality in similar ways.
Tradeoffs: bit rate vs. bandwidth
An important consideration in encoding audio is the relationship between audio quality and bit rate (or bandwidth), and how much space the data requires on disk or in memory. If you encode at lower bit rates, audio quality can suffer, but lower bit rates are better suited to slower network and transmission lines. Files encoded at lower bit rates also take up less memory and storage space. If you are willing to double your bandwidth from 128 kbps to 256 kbps, then constant bit rate MP2 or MP3 audio is fairly close to, and perhaps indistinguishable from, CD quality. The 4-minute selection mentioned earlier then requires about 8 MB of disk space when encoded as 256 kbps constant bit rate MP2 audio, a 5:1 compression ratio.
In other words, doubling the bandwidth from 128 kbps to 256 kbps to increase audio quality halves the compression ratio from 10:1 to 5:1 and doubles the storage needed to hold the entire file at once on disk or in memory. Broadcasters will also need to rent or buy faster network connections to transmit audio at higher bit rates.
Downloading audio means receiving audio data from a server over the network. A download usually requires the entire file to be copied to disk before anything can be done with it (such as playing it). Therefore, downloading MP3 music files means the end user waits for the entire file to be copied over the network to his local disk before playing it. If you are transmitting data to a client with a 56 kbps modem, the 4-minute, 4 MB MP3 file will take about 10 minutes to download, assuming the network connection between your computer and the server does not encounter severe degradation or bottlenecks. The 8 MB MP2 file would take twice as long, about 20 minutes, but compare this to the time it would take to download the original 40 MB CD audio file: 100 minutes.
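The download times follow from dividing file size by link speed. In this sketch the `efficiency` parameter is a hypothetical knob for the fraction of the rated speed actually achieved; at 1.0 it assumes the modem runs flat out with no protocol overhead, which real connections rarely manage.

```python
def download_minutes(megabytes: float, link_kbps: float, efficiency: float = 1.0) -> float:
    """Minutes to move `megabytes` decimal megabytes over a link_kbps link.

    efficiency is an assumed fraction of the rated speed actually achieved
    (1.0 = the full rated speed, no protocol overhead or congestion).
    """
    bits = megabytes * 8_000_000
    return bits / (link_kbps * 1000 * efficiency) / 60


for label, mb in [("128 kbps MP3", 4), ("256 kbps MP2", 8), ("CD audio", 40)]:
    print(f"{label}: {download_minutes(mb, 56):.1f} min at 56 kbps")
```

At the full rated 56 kbps this gives about 9.5, 19 and 95 minutes, in line with the 10, 20 and 100 minutes above; an efficiency of 0.8 (an assumed figure) stretches each time by 25 percent.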
Downloading music files in their entirety is an expensive operation in terms of time, and as such, it is not well suited to broadcasting or real-time applications. If the download is interrupted by a network or transmission line failure, you usually have to start again, unless some rather smart network protocols are employed. However time-consuming and tedious, downloading has the advantage that you only have to do it once, and then you have your own private copy of the music or other data on your local disk. If there is no copy protection on the file, you can duplicate it as many times as you like. With such programs as WinAmp for PC users and MacAmp for Apple users, many people on the Web are beginning to collect MP3 files on their computers and trade them with others. These jukebox programs create playlists, letting you organize your music files and program their playback in any order. If a song in the form of an MP3 file becomes popular, it can be copied and transmitted over the network among fans hundreds of times in a day or two. Musicians, of course, love this. Record labels, however, typically loathe the practice.
Streaming audio is the ability to start playing audio before it has been downloaded into your system from the Web as a complete file. This is necessary because of the time a complete download takes, and it gives the listener access to the material much sooner. By buffering and assembling the bits as they are received, an MP3 decoder can start to play audio almost right away. The stream is played in real time, and a copy of the entire file need not be assembled and saved to your local disk.
For example, a player with a buffering scheme that stores up to 30 seconds of music might start to play music after it has downloaded only the first 5 seconds of music from the Internet. The 5-second or so lag between receiving and playing audio is a small price to pay for the improved real-time performance. Also, encoding the stream at a bit rate below the modem's 56 kbps (such as 28 kbps) ensures a steady stream of music: the player will not run out of music to play before enough new music is downloaded to and buffered in the player. Streaming is really a clever tradeoff that delays the music slightly but is not nearly as costly in terms of time as waiting for the music to download in its entirety.
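The buffering tradeoff can be modeled second by second. This sketch assumes a sustained modem throughput of 48 kbps (an assumed figure below the 56 kbps rating), the 5-second start and 30-second buffer from the example above, and a player that stops and refills when it runs dry. The function and its parameters are hypothetical, and real players buffer in bytes and packets rather than whole seconds.

```python
def simulate_stream(stream_kbps, link_kbps_each_second,
                    start_after_s=5.0, buffer_cap_s=30.0):
    """Second-by-second model of a streaming player's buffer.

    Returns (first_second_playback_started, seconds_when_audio_ran_out).
    """
    buffered = 0.0      # seconds of audio waiting in the buffer
    playing = False
    first_start = None
    skips = []
    for t, link_kbps in enumerate(link_kbps_each_second):
        # One second of download adds link/stream seconds of audio; a real
        # player would pause the download when full rather than discard data.
        buffered = min(buffer_cap_s, buffered + link_kbps / stream_kbps)
        if not playing and buffered >= start_after_s:
            playing = True
            if first_start is None:
                first_start = t
        if playing:
            if buffered >= 1.0:
                buffered -= 1.0      # play one second of music
            else:
                skips.append(t)      # buffer ran dry: audible skip
                playing = False      # wait to refill before resuming
    return first_start, skips


# Assumed 48 kbps throughput with a 20-second outage in the middle.
link = [48] * 20 + [0] * 20 + [48] * 20
print(simulate_stream(28, link))        # prints (2, [36])
print(simulate_stream(56, [48] * 60))   # stream faster than the link: runs dry
```

With a 28 kbps stream, the roughly 16 seconds of surplus built up before the outage carry playback through most of it, and the only skip comes at second 36. A 56 kbps stream on the same 48 kbps link never builds a surplus, so it runs dry even without an outage.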
Unlike downloading, which leaves a complete copy of the music on your hard disk, the piecemeal technique of streaming is sensitive to problems with the network and transmission lines. If the network is interrupted for longer than the player can buffer music, the stream will be broken and the player will produce an audible skip, which sounds like static or background noise. Because most people are connected to the Internet over consumer-grade phone lines, they will inevitably experience bit rates much lower than even 53 kbps (the practical ceiling for a 56 kbps modem) because of network congestion and bottlenecks. As a result, streaming is going to skip sometimes, and it is going to take even longer to download the complete file or broadcast. High-end users with cable modems, ISDN and ADSL may still encounter bottlenecks downloading data from a server, but they can generally stream audio at higher bit rates. When streaming audio, we are normally constrained by the bit rates available over conventional telephone networks, and audio quality suffers. It is not yet a perfectly networked world.
Let me add a quick note about codecs. Codec manufacturers are still developing their algorithms. It may not be surprising to find that algorithms that sound good for encoding speech can actually sound lousy when encoding music. What is surprising is that algorithms designed to extend the high end for encoding broadcast music in MP3 often do not sound good for speech-range material. The best way to find out which algorithms work best for your program material is to audition it with the different encoding schemes available on your codec.
No one paid much attention to the legal implications MP3 files bring to the use of commercial music on the Web until publicity broke about a Stanford University sophomore who had posted his collection of favorite music on a university server. The server was taking so many hits that it began to attract attention. University networks are in no way immune from the law, but what occurs behind private university or corporate network firewalls is unlikely to be scrutinized heavily from a legal standpoint despite official regulations, and therein lies the copyright issue that owners of material being posted or moved around on the Web most fear.
It is probably inappropriate for this article to describe at length all the legal ramifications of posting commercial music or intellectual property on the Internet, but some guidelines are in order for streaming or downloaded audio. There is nothing illegal about copyright holders posting MPEG audio files of their own work on the network, and anyone can subsequently copy those files as many times as they like or propagate them anywhere on the network. The music industry claims that this is not what concerns the authorities, and it mostly dismisses such postings as a fringe market populated by musicians seeking publicity.
The major recording labels claim that they do care about anyone's duplicating or ripping off the music or intellectual property of their artists to make it freely available on the Web, because they receive no royalty payment. Although copyright law allows you to make a reasonable number of copies for your private use, as far as the record labels are concerned, this concept of reasonable use does not include posting your favorite copy-protected songs on your home page at AOL.
Copyright ID and protection
MPEG audio can be encoded with different kinds of ID tags to identify such things as the copyright owner, song title, artist and album. Unfortunately, these tags do not provide any actual copy protection. The music industry is clamoring to provide its own secure electronic music distribution scheme, known as SDMI (Secure Digital Music Initiative).
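The article does not name a tag format, but ID3v1, a common one for MP3 files, shows why tags give no protection: it is a fixed 128-byte block appended to the end of the file, readable and rewritable by any program. The sketch below reads it. The function name is hypothetical; the ID3v1.1 track-number convention is handled, but the newer, variable-length ID3v2 tags at the start of a file are not. ID3v1 has no copyright field; in MPEG audio the copyright indication is a single bit in each frame header.

```python
def parse_id3v1(data: bytes):
    """Return ID3v1 fields from the last 128 bytes of an MP3 file, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(field: bytes) -> str:
        # Fixed-width Latin-1 fields padded with NUL bytes (some tools use spaces).
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    fields = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # index into a fixed genre list
    }
    # ID3v1.1: a zero byte followed by a nonzero byte ends the comment
    # field with a track number.
    if tag[125] == 0 and tag[126] != 0:
        fields["comment"] = text(tag[97:125])
        fields["track"] = tag[126]
    return fields


def pad(s: str, width: int) -> bytes:
    """Encode a string into a NUL-padded fixed-width tag field."""
    return s.encode("latin-1").ljust(width, b"\x00")


# A stand-in "file": one frame header, filler bytes, then a tag.
sample = (b"\xff\xfb\x90\x00" + bytes(413)
          + b"TAG" + pad("Song", 30) + pad("Artist", 30) + pad("Album", 30)
          + b"1999" + pad("Demo", 28) + b"\x00" + bytes([3]) + bytes([17]))
print(parse_id3v1(sample))
```

Anyone can overwrite these 128 bytes with a hex editor, which is why tags identify music but cannot protect it.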
Like schemes before it, SDMI provides a digital watermarking scheme that embeds a virtually inaudible digital signature in the file, as well as a copy protection scheme. If everyone used an SDMI-compliant player, SDMI-watermarked files could be played, and their distribution could be tracked and royalties collected. Once the copy count was exceeded, playing or copying the file would not be possible. Although SDMI has an impressive number of member companies, there is no reason to believe, given Internet culture, that SDMI will replace free MP3 (and MP4 in the offing) as the de facto standard for Internet audio. Manufacturers are free to make SDMI-compliant players that can also play non-SDMI encoded files.
History has shown that wherever intellectual property is distributed, intellectual property regulation is sure to follow. First there were printing rights, then came audio and video rights. Now there will be Internet rights and portable-device rights. Welcome to the brave new future of electronically transferred audio.
For more information on compression schemes, visit www.cs.sfu.ca/undergrad/CourseMaterials/CMPT479/material/notes/Chap4.