The audio world has
been abuzz of late with news of several emerging digital
recording and playback technologies, each purporting to better the
sound of the venerable compact disc in a variety of musically
significant ways. Central to the discussion have been the concepts of
word length and sampling rate, neither of which has
been given adequate treatment by the high-end audio press.
To understand the
concept of word length, one must first grasp the concept of the binary
digit, more commonly referred to in its contracted form as the bit.
Just as the common decimal, or base 10, digits are formed by the
numbers 0 through 9, the digits of the binary, or base 2, system are
formed by the numbers 0 and 1. In general, the digits of a base n
system are formed by the numbers 0 through n-1, and any number
in a base n system can be formed by summing multiples of
powers of the base. For example, the decimal number 123 can be formed
as follows:
(1 x 10^2) + (2 x 10^1) + (3 x 10^0), where 10^2 = 10 x 10, 10^1 = 10, and 10^0 = 1
= (1 x 100) + (2 x 10) + (3 x 1)
= 100 + 20 + 3
= 123
So, we see that the
number 123 is, in effect, an abbreviation for the first expression
given above, formed by the digits which multiply the powers of the
base. Our everyday numbers are nothing more than a notational
convenience for more complex mathematical expressions!
In similar fashion, the
number 123 can be expressed in the binary system (i.e. using a base of
2 rather than 10) as follows:
(1 x 2^6) + (1 x 2^5) + (1 x 2^4) + (1 x 2^3) + (0 x 2^2) + (1 x 2^1) + (1 x 2^0)
= (1 x 64) + (1 x 32) + (1 x 16) + (1 x 8) + (0 x 4) + (1 x 2) + (1 x 1)
= 64 + 32 + 16 + 8 + 0 + 2 + 1
= 123
Just as we previously
formed a mnemonic short form in the decimal system by using only the
multipliers of the powers of the base 10, we can form a base 2
mnemonic for the number 123 by using the binary digits, or bits, which
multiply the powers of the base 2. Doing so results in the 7-bit
binary representation 1111011 for the decimal number 123.
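The two expansions above follow the same recipe, so they can be sketched in a few lines of Python. This is only an illustration of positional notation; the function names to_digits and from_digits are invented for this example and do not come from any audio standard.

```python
def to_digits(value, base):
    """Return the digits of a non-negative integer in the given base,
    most significant digit first."""
    if value < 0 or base < 2:
        raise ValueError("expects value >= 0 and base >= 2")
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        # The remainder is the next digit, working from the right.
        value, remainder = divmod(value, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, base):
    """Rebuild the number by summing each digit times its power of the base."""
    highest = len(digits) - 1
    return sum(d * base ** (highest - i) for i, d in enumerate(digits))


decimal_digits = to_digits(123, 10)          # the digits 1, 2, 3
binary_digits = to_digits(123, 2)            # the bits 1, 1, 1, 1, 0, 1, 1
rebuilt = from_digits(binary_digits, 2)      # back to 123
```

Running to_digits with a base of 2 gives the seven bits listed in the text, and from_digits performs the same sum of powers shown in the worked expansion.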
Seven bits is
sufficient to represent the number 123, but what about larger numbers
such as 241? With a little analysis, one can show that the largest
number which can be represented by n bits is 2^n - 1.
Therefore, 7 bits can only represent numbers up to and including
127. 8 bits, on the other hand, can represent numbers up to and
including 255, so representation of the number 241 requires a minimum
of 8 bits. Early 16-bit microprocessors, like the Intel 8086 and its
close relative the 8088, which formed the heart and soul of the
original IBM XT, worked internally on 16 bits of information at a
time. In the terminology of the digital world, such a processor has a
word length of 16 bits. When the compact disc was being developed by
Sony and Philips in the late 1970s, chips of this class were the state
of the art in affordable microprocessors. Not surprising, then, that
the 16-bit word length, now believed by many to be inadequate for
audio applications, was chosen for the Red Book CD standard.
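The relationship between the number of bits and the largest representable number can be checked with a short Python sketch. The helper names max_unsigned and bits_needed are hypothetical, chosen only for this illustration, and the sketch assumes non-negative whole numbers as in the examples above.

```python
def max_unsigned(bits):
    """Largest non-negative integer that fits in the given number of bits."""
    return 2 ** bits - 1


def bits_needed(value):
    """Smallest number of bits able to hold a non-negative integer.

    int.bit_length() reports 0 for the value 0, so at least 1 bit is returned.
    """
    if value < 0:
        raise ValueError("expects a non-negative integer")
    return max(1, value.bit_length())


limit_7 = max_unsigned(7)     # 127
limit_8 = max_unsigned(8)     # 255
limit_16 = max_unsigned(16)   # 65535
need_123 = bits_needed(123)   # 7
need_241 = bits_needed(241)   # 8
```

The values agree with the figures in the text: 123 fits in 7 bits, 241 needs 8, and a 16-bit word tops out at 65535.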
If the largest number
which can be represented by an n-bit quantity is 2^n - 1,
then a 16-bit quantity is capable of representing numbers up to
and including 65535. In other words, each piece of data stored on a
compact disc must have a discrete value less than or equal to 65535.
What if the data to be represented exceeds this value, or falls
between two representable values? Then it must be transformed into a
value which is representable within the limits of the word length.
Unfortunately, such transformations result in a loss of information
and, consequently, a potentially audible distortion (the so-called
quantization error). Although it's beyond the scope of this article to
delve any deeper into the subject, let me simply state that the larger
word lengths being chosen for the new wave of digital formats (up to 24
bits in most cases) allow for greater precision in representing the
values stored in these digital words.
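A rough feel for quantization error can be had from the following Python sketch. It assumes a simplified model that is not taken from the Red Book itself: the signal value is scaled to lie between 0.0 and 1.0 and is rounded to the nearest of the evenly spaced values an n-bit word can hold. The names quantize and worst_case_error are invented for the example.

```python
def quantize(x, bits):
    """Round x (assumed to lie in 0.0..1.0) to the nearest n-bit level.

    Returns the integer code that would be stored and the value it stands for.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("expects 0.0 <= x <= 1.0")
    top = 2 ** bits - 1          # largest code an n-bit word can hold
    code = round(x * top)        # the transformation to a representable value
    return code, code / top


def worst_case_error(bits):
    """Half the spacing between adjacent levels: the largest rounding error."""
    return 0.5 / (2 ** bits - 1)


x = 0.7071                       # an arbitrary test value
errors = {}
for bits in (8, 16, 24):
    code, approx = quantize(x, bits)
    errors[bits] = abs(x - approx)   # information lost by rounding
```

In this model each extra bit halves the spacing between representable values (very nearly), so the worst-case error for a 24-bit word is roughly 256 times smaller than for a 16-bit word.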
In order to understand
the meaning of the term sampling rate, it's instructive to
first look at a somewhat simplified, yet representative, picture of a
typical electrical signal representing a continuous musical waveform
(see Figure 1). This representation of the electrical signal depicts
the way in which the signal's voltage, or electrical potential, varies
over time. At a given time, t, the height of the curve above
(or below) the horizontal axis gives us the voltage, v, at
that time. If, for each time t, we wrote down the height of
the curve at that point, we could subsequently use that information to
reconstruct the curve exactly. Unfortunately, there are infinitely
many points in time at which we could measure, or sample, the
height of the curve. Do we need to sample the curve at infinitely many
points in time in order to reconstruct it? Thankfully, no. The
sampling theorem associated with Nyquist and Shannon shows that it is
sufficient to sample the curve at a rate greater than twice its
frequency, where the curve's frequency is defined as the number of
times it completes a full cycle in one second (the curve's frequency
is measured in cycles per second, or hertz, abbreviated Hz). In other
words, to reconstruct the curve exactly, we must choose a sample rate
which is more than twice the curve's frequency.
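What goes wrong when the sample rate is too low can be seen in a small Python sketch. The figures used here (a 1,000 Hz sample rate, and tones of 100 Hz and 900 Hz) are arbitrary choices for illustration, and sample_cosine is a hypothetical helper, not part of any audio library.

```python
import math


def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of a cosine wave of the given frequency,
    measured at evenly spaced instants rate_hz times per second."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(count)]


rate = 1000                              # samples per second (assumed)
low = sample_cosine(100, rate, 20)       # tone well below half the sample rate
high = sample_cosine(900, rate, 20)      # tone above half the sample rate

# At this sample rate the two tones produce the same list of numbers,
# so nothing in the samples tells them apart.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(low, high)
)
```

Because the 900 Hz tone lies above half the sample rate, its samples coincide with those of the 100 Hz tone, and a playback system would have no way of knowing which one was recorded. Keeping the sample rate above twice the highest frequency present avoids this ambiguity.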
This is, of course, a
necessarily simplified view of a real-world musical signal, which is
composed of a plethora of waveforms, each with its own unique
frequency. According to Nyquist, in order to reconstruct such a
composite waveform, we must sample it at more than twice the frequency
of the highest-frequency sub-signal. Since the upper limit of human
hearing is usually taken to be about 20,000 Hz (20 kHz), it should,
theoretically, be sufficient to sample at a rate just above 2 x 20 kHz,
or 40 kHz, somewhat less than the 44.1 kHz sampling rate chosen for the
compact disc. Some experiments have been reported to suggest, however,
that humans can perceive frequencies higher than 20 kHz, which has led
to the higher sampling rates (96 kHz) of some of the recent successors
to the compact disc. Audiophilia plans to explore
these new formats in greater detail in the coming months.