As a big baseball fan, and in particular a fan of our local squad here in Philadelphia, I offer these additional astonishing statistical anomalies. The Phillies have played 110 seasons and have won one (i.e., 1) championship. The Marlins, an 11-year-old franchise, have now won two championships. It took until 1980 for the Phils to break through: 98 seasons. Under the obviously dubious assumption that all teams have an equal chance at the beginning of a season, the probability of the Phillies winning just once in 110 tries is on the order of 3 in 100 (under some average number of teams assumption).
From a different angle, four championships should have about 17 percent chance of occurring, six of them about 15 percent. The probability of the Marlins winning twice over 11 years, by contrast, is about 4.5 in 100. I guess this is what makes sports so interesting: you can throw away the statistics each game and season. But I sure wouldn't mind a little intervention from the law of averages.
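For the curious, these figures are consistent with a plain binomial model. The sketch below is one way to reproduce them, not necessarily how the numbers were originally derived. The team counts are assumptions chosen for illustration (22 as a long-run league average over the Phillies' 110 seasons, 30 for the Marlins' 11), and `binomial_pmf` is a helper name invented here.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumed team counts; the column says only "some average number of teams".
AVG_TEAMS_PHILLIES = 22
TEAMS_MARLINS = 30

p_phils = 1 / AVG_TEAMS_PHILLIES
phillies_one = binomial_pmf(1, 110, p_phils)    # exactly one title in 110 seasons
phillies_four = binomial_pmf(4, 110, p_phils)   # exactly four titles
phillies_six = binomial_pmf(6, 110, p_phils)    # exactly six titles
marlins_two = binomial_pmf(2, 11, 1 / TEAMS_MARLINS)  # two titles in 11 seasons

print(f"Phillies, 1 title in 110 seasons: {phillies_one:.3f}")
print(f"Phillies, 4 titles in 110 seasons: {phillies_four:.3f}")
print(f"Phillies, 6 titles in 110 seasons: {phillies_six:.3f}")
print(f"Marlins, 2 titles in 11 seasons:  {marlins_two:.3f}")
```

Changing the assumed team counts moves the results noticeably, which is why the equal-chance assumption is best read as a back-of-the-envelope device.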
Now, if you want to get yourself worked up about being ripped off statistically, what if you only received 2 percent of the video service you subscribe to? As it turns out, through the wizardry of video compression and the limitations of human visual perception, the vast majority of raw information content is stripped out of digital TV signals before you get it. Actually, you typically get less than 2 percent of the raw digital content representation of a video. The trained eye, and sometimes the untrained eye, may notice the impact of cheating you out of your video bytes. The untrained eye may not recognize that it is the compression system's fault anyway.
So, what is going on here? Well, it is simply a question of "doing the math." Taking advantage of the move to digital requires representing all of the information we'd like to store, share, transport, observe or listen to in bits and bytes. The world is an analog place, in general. It is relatively straightforward to understand how the keyboard symbols of written communication can be represented in bits. We will not get into character recognition at this point. If you ever saw my handwriting, you would know why.
In any event, there is a fixed number of different symbols on a keyboard, beginning with the usual letters and numbers. Assigning a particular bit sequence to each is a natural, logical process of taking a discrete set of items (letters, numbers, punctuation marks, etc.) and mapping them neatly into a lookup table of binary words. For the spoken word, general audio and the visual image, it is not so directly obvious.
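The lookup-table idea can be sketched in a few lines. ASCII code points stand in here as one familiar table; the column does not name a particular code, and the function names are invented for illustration.

```python
def to_binary_words(text: str, width: int = 8) -> list[str]:
    """Look up each symbol's code point and write it as a fixed-width binary word."""
    return [format(ord(ch), f"0{width}b") for ch in text]


def from_binary_words(words: list[str]) -> str:
    """Reverse the lookup: turn each binary word back into its symbol."""
    return "".join(chr(int(word, 2)) for word in words)


words = to_binary_words("Hi!")
print(words)
print(from_binary_words(words))
```

Because the set of symbols is discrete and finite, the mapping is exact in both directions. Nothing is lost, which is precisely what cannot be said of speech, audio and images.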
Speaking of speech
Developers of the telephone system long ago learned some of these lessons and designed a system that takes into account the human aspect of audio perception. The simple result is that you don't sound quite the same on the other end of a telephone as you do in person, because it was determined that a reasonable trade-off of clarity of the voice content with system design involved some bandwidth limiting of the speech signal. Basically, the bandwidth not transmitted, while altering slightly the pitch, does not significantly affect the ability to interpret the sounds, words and conversation accurately. Those same problems scale to wider bandwidth applications as well, in particular audio, images and video content.
The telephone system makes a very straightforward example. Here, speech is converted by a microphone into a very simple electrical waveform that can be visualized as simply voltage level vs. time. All other more sophisticated applications can build on this model, since they are basically more complex versions or combinations of such waveforms. There are two pieces of the puzzle that together determine the raw bandwidth demands of the various signal types in terms of bits per second. The representation process is based upon the precision with which the frequency domain and the amplitude (waveform) domain must be represented.
The first, the frequency domain, is a bit of nature that cannot be ignored. In a spectrum analysis, voice signals span about 7 kHz, while hearing extends from approximately 20 Hz to about 20 kHz. But our ears are audio filters, with typical flatness specifications that color the sounds we hear, while our brain does the more complex work of determining whether we like it.
Figure 1 shows coarse frequency response curves of equal loudness for normal ears at 80 decibels of peak sound pressure level, which is the average loudness setting of a consumer's home sound system. Spectrally speaking, there just is not much signal energy, if any, beyond 7 kHz or so. This upper band limit means that there is no need for a voice-only system to reproduce spectral content beyond that range. As we have indicated above, the phone system has determined that only about half of that 7 kHz is worthy of passage.
Cheating our senses
The second piece of the representation puzzle is how accurately we represent the waveform that is consuming the 7 kHz. For example, any digit head and many nondigit heads can tell you that "n" bits offer waveform resolution of 2^n different levels. The process of taking an analog signal and assigning it one of these inexact discrete levels is called quantization. When n = 2, there are four amplitude levels; when n = 12, there are 4,096 levels.
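A minimal sketch of quantization follows, assuming a uniform quantizer over a signal range of -1 to +1. The column does not specify the quantizer type or range, so both are illustrative choices, as are the function names.

```python
def quantize(x: float, n_bits: int, full_scale: float = 1.0) -> int:
    """Assign x in [-full_scale, +full_scale] to one of 2**n_bits levels."""
    levels = 2**n_bits
    step = 2 * full_scale / levels
    index = int((x + full_scale) / step)
    # Clamp so that out-of-range inputs land in the lowest or highest level.
    return min(max(index, 0), levels - 1)


def reconstruct(index: int, n_bits: int, full_scale: float = 1.0) -> float:
    """Return the midpoint of the chosen level; this is the inexact value kept."""
    step = 2 * full_scale / 2**n_bits
    return -full_scale + (index + 0.5) * step


sample = 0.3
for n_bits in (2, 12):
    level = quantize(sample, n_bits)
    approx = reconstruct(level, n_bits)
    print(f"n = {n_bits}: {2**n_bits} levels, level {level}, "
          f"value {approx:.6f}, error {abs(approx - sample):.6f}")
```

With n = 2 the reconstructed value can miss by up to a quarter of full scale; with n = 12 the worst-case miss shrinks by a factor of about a thousand.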
Again, in terms of representing analog content, there is an aspect of the way we perceive sound that determines how precisely our senses can determine the difference between, say, quantization level 1,516 and 1,517 in the n = 12 case. Can our ears tell the difference on a phone call? No. Can our eyes see the difference in a picture? Hmmm, in this case it isn't so simple. Will one pixel (a tiny piece of digital picture) be noticeable if it is in error by one bit, but is surrounded by accurately represented pixels? No. But it isn't likely to play out that way in practice, either. You can play with display resolutions on your PC and see for yourself what resolution impacts you can observe and which are tolerably distorted to you, in that environment.
So, there are aspects of waveform resolution and signal bandwidth that are both part of the equation when developing the system. The combination determines the bandwidth necessary as follows: For a 7-kHz-wide voice signal, Nyquist says that you need to sample it at least at 14 kHz to represent it undistorted. If we need 12 bits to hear it with some level of clarity, then we would have 12 bits/sample x 14,000 samples/second = 168 kbits/s. Now, the phone system doesn't use these numbers, which is why a telephony DS0 signal is 64 kbits/s (8 bits/sample x 8,000 samples/s).
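The arithmetic above reduces to one multiplication, sketched here. The 4 kHz nominal bandwidth used for the telephone case is inferred from the 8,000 samples/s figure rather than stated in the text, and `pcm_bit_rate` is a name made up for this example.

```python
def pcm_bit_rate(bandwidth_hz: int, bits_per_sample: int) -> int:
    """Raw bit rate when sampling at the Nyquist minimum of twice the bandwidth."""
    sample_rate = 2 * bandwidth_hz
    return sample_rate * bits_per_sample


full_voice = pcm_bit_rate(7_000, 12)  # 14,000 samples/s at 12 bits each
ds0 = pcm_bit_rate(4_000, 8)          # 8,000 samples/s at 8 bits each

print(f"7 kHz voice, 12 bits: {full_voice / 1000:.0f} kbits/s")
print(f"Telephony DS0:        {ds0 / 1000:.0f} kbits/s")
```

Narrowing the band and trimming the sample size each cut the rate independently, which is how the phone system gets from 168 kbits/s down to 64 kbits/s.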
Let's briefly rewind back to music. As you may have learned either through experience, through the Napster wars or from use of pay music sites today, a .wav file that plays out from your CD player into your speakers varies in size, but is typically in the 50- to 75-megabyte range. Meanwhile, you may notice that an MP3 file of the same tune may be 3 to 5 megabytes. That's quite a bit smaller. Suddenly, a 56.6-kbit/s modem can retrieve the tune in a dozen minutes or so, instead of a couple of hours. Toss in broadband access via a cable modem, recognize that the download side is where the highest speed of the technology is, and downloading an MP3 file is essentially a real-time activity while surfing.
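The download times quoted above can be checked with a rough sketch. It assumes decimal megabytes and a link that sustains its full rated speed with no protocol overhead, so real transfers would run somewhat longer. The 10-Mbit/s cable modem figure is a hypothetical value picked for illustration.

```python
def download_seconds(size_megabytes: float, rate_kbits_per_s: float) -> float:
    """Idealized transfer time: total bits divided by the link rate."""
    bits = size_megabytes * 1_000_000 * 8
    return bits / (rate_kbits_per_s * 1_000)


MODEM_KBPS = 56.6      # dial-up rate as quoted in the column
CABLE_KBPS = 10_000.0  # hypothetical broadband downstream rate

mp3_minutes = download_seconds(5, MODEM_KBPS) / 60
wav_hours = download_seconds(50, MODEM_KBPS) / 3600
mp3_broadband_seconds = download_seconds(5, CABLE_KBPS)

print(f"5-MB MP3 over dial-up:   {mp3_minutes:.1f} minutes")
print(f"50-MB .wav over dial-up: {wav_hours:.1f} hours")
print(f"5-MB MP3 over broadband: {mp3_broadband_seconds:.0f} seconds")
```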
In fact, broadband access has significantly enabled music access via download, and is poised to do more damage to the brick-and-mortar music retailer than ever imagined. It is just another example of what business folks call creative destruction. The shame of this example lies in the recording industry's lack of vision with respect to this trend. The film and video industries, of course, have this example before them to help them learn the proper lessons as the same phenomenon scales to movies, where many more megabytes of content reside per multimedia product.
Picture this
Your eyes, like your ears, are also filters, but in a visual sense. They are also much harder to explain as neatly as ears. The filtering facts are wildly more important from an engineering standpoint, however, because raw video content is much more bandwidth-intensive. This is a reflection of the precision with which the eye can see things; the nature of creating images large enough to be entertaining and enjoyable (and our standards sure have changed!); and doing so rapidly enough in succession to look continuous. For example, once we learned how to access the Internet from our homes, we whined and complained hysterically about the 14.4-kbit/s speeds. This was followed by cyclical rejoicing and complaining as we cranked it up to 28.8 and then 56.6 kbits/s. Asymmetric digital subscriber lines and cable modems brought us into Megabit World, a place from which there shall be no retreat. Even so, upstream rates of hundreds of kilobits per second are still the norm, although downstream bursts can be in the tens of megabits per second. Now, consider that standard-definition video (the typical television resolution that you are accustomed to), as specified in International Telecommunication Union Recommendation ITU-R BT.601, calls for a 270-Mbit/s data rate!
The math associated with this number is built on pleasing your visual senses: Over 300,000 pixels are in a picture (a pixel being one tiny little rectangle of resolution on the screen), each of which is encoded with 24 bits to represent what is contained therein. This is one image, encoded by over 7 million bits (24 x 300,000), but they have to sequence through at 30 images per second to keep the flow fluid enough for human filtering, resulting in a number in the 270-Mbit/s range when factoring in various elements of overhead. The portion of this that is active video reduces the number, but only down to about 160 Mbits/s. And this is one channel of video. Factor in the simultaneous delivery of 20 or 40 or 80 or 200 channels, and you can see the magnitude of the digital video delivery problem. Two hundred channels means delivery of 32 gigabits/s, which is impractical in our world of limited over-the-air spectrum and limited cable bandwidth, even if we had a fiber to every residence.
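The numbers in this paragraph chain together as sketched below. The sketch uses the column's round figure of 300,000 pixels as a floor (the text says "over 300,000") and takes the 160-Mbit/s active-video rate as given; the constant names are invented here.

```python
PIXELS_PER_FRAME = 300_000        # round figure from the text; actual count is higher
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30
ACTIVE_VIDEO_BPS = 160_000_000    # active-video rate quoted in the text

bits_per_frame = PIXELS_PER_FRAME * BITS_PER_PIXEL
raw_bps = bits_per_frame * FRAMES_PER_SECOND

print(f"One frame: {bits_per_frame / 1e6:.1f} Mbits")
print(f"30 frames per second: {raw_bps / 1e6:.0f} Mbits/s before overhead")

# Aggregate rate for a lineup of uncompressed channels.
for channels in (20, 40, 80, 200):
    total_bps = channels * ACTIVE_VIDEO_BPS
    print(f"{channels:>3} channels: {total_bps / 1e9:.1f} Gbits/s")
```

The gap between this floor and the 270-Mbit/s interface rate is the pixel count above 300,000 plus the overhead the text mentions.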
Today about 2 Mbits/s are used to represent a digital picture, based on the MPEG-2 standard. (MPEG stands for Moving Picture Experts Group.) This varies with the complexity of the content, but this result seems astonishingly low at first glance. This means that, of the 160 Mbits/s of active video, only 1.5 percent is used. DVD-quality video is 6 Mbits/s, so in this case a full 4 percent of the information is represented. What magic is performed here? There are several steps to getting the most from the least in a way that is still pleasing to the viewer. Figure 2 shows a high-level block diagram of the steps involved. In the next Building Blocks, we will detail these blocks.
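As a quick check on how little of the signal survives, the sketch below divides the compressed rates by the 160-Mbit/s active-video rate. Straight division of these round numbers gives fractions a shade under the rounded percentages in the text; `fraction_kept` is a name made up for this example.

```python
ACTIVE_VIDEO_MBPS = 160.0  # active-video rate quoted in the text


def fraction_kept(compressed_mbps: float, raw_mbps: float = ACTIVE_VIDEO_MBPS) -> float:
    """Share of the raw bit rate that remains after compression."""
    return compressed_mbps / raw_mbps


for label, rate_mbps in (("Broadcast MPEG-2", 2.0), ("DVD quality", 6.0)):
    kept = fraction_kept(rate_mbps)
    print(f"{label}: {kept * 100:.2f} percent kept, "
          f"about {1 / kept:.0f}:1 compression")
```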
So, why is data compression technology important? As the examples above point out, storage and transport of digital information require more bandwidth in signal representation than the raw analog signal counterparts. So why do digital? Despite the apparent inefficiency, it's still easier to represent information with high fidelity longer, over greater distances, in more robust media, at evolving cost points and in common formats when the content is represented digitally. Taking the best advantage of these features, however, means making sure they are used most efficiently to keep system cost under control, and in the case of video, to make wide-scale multimedia distribution even practical.
None of this is new. Audio technology evolved to provide enhanced means of storage and transport, and playback convenience when the record album gave way to sharing the stage with the cassette. Radio signals sometimes undergo pre- and post-distortion to limit their amplitude range, to reduce system cost and enhance transport quality.
Acknowledgments
The author appreciates the help of Ajay Luthra and Paul Moroney, who inspired this topic. Both are with the Advanced Technology group within Motorola's Broadband Communication Sector.
Related Article
"Video Coder Threads Media Needle," http://www.CommsDesign.com/story/OEG20020801S0022
Reference
1. J. Gibson, T. Berger, T. Lookabaugh, D. Lindbergh and R. Baker, Digital Compression for Multimedia, Academic Press, 1998.
About the Author
Rob Howald (rob.howald@motorola.com) is the director of systems engineering in the transmission network systems group of Motorola's Broadband Communications Sector in Horsham, Pa. He has a BSEE and an MSEE from Villanova University and a PhD from Drexel University.