The invention of the phonograph changed everything! Suddenly,
musicians could reach an audience beyond their local region,
and fans could listen to music in their own homes.
The CD brought recorded music into the digital realm, with
clear benefits for sound quality. Signal-to-noise ratio
increased dramatically with the introduction of the CD.
Separating the archive and display functions was another
clear benefit - the CD contained the information itself,
while the conversion of bits into audio occurred within the
electronics of the player. Digital media also do not degrade
with time or number of uses. Barring physical damage, such
as scratches, a CD can be played a thousand times and still
provide the same quality as it did the first time - a vast
departure from LPs and cassettes, which are analog formats.
Downloadable audio was born from the combination of two
technologies - the internet and audio compression. With
internet access spreading into homes over the past decade,
only one element was missing to satisfy the obvious desire to
get music from websites - audio files which could be downloaded
in a reasonable amount of time. CD-quality audio (44.1 kHz,
16-bit, stereo) occupies about 10 MB per minute in storage. A
typical song of 30-40 MB would take more than 2 hours to
download using conventional modems, making internet
distribution impractical. Enter MP3.
While many audio compression methods exist, MPEG-1 Audio Layer
3 (or MP3) was the breakthrough everyone was waiting for.
Utilizing the concept of masking, this compression technology
provided CD-like quality at a fraction of the file size
- that 30 MB song, once compressed, took up only 3 MB. Suddenly,
music was downloadable at 15-20 minutes per song.
MP3 took the music industry by storm, and whatever your views
of Napster or its clones, digital downloadable music is
here to stay. Music distribution is in a state of transformation
the likes of which we have not seen in generations. As broadband
connections to the internet become commonplace, and as the
internet reaches more homes, people will choose to get their
music over the internet. Musicians should embrace this technology
and this method of delivery.
Digital Audio
Background:
Digital audio is often described as 16-bit, 44.1 kHz (commonly
rounded to 44 kHz), otherwise known as CD-quality audio. Digital
audio differs from analog audio in its waveform structure. While
analog audio is represented by a continuous waveform, digital
audio is a step function.
We hear music as an analog waveform. Our ears perceive sound
as small changes in air pressure, and convert those changes
to mechanical motions, and then to electrical stimuli which
proceed to the brain. All digital audio must be converted
to analog audio through a D/A converter before we hear it.
This is done within the CD player or computer sound card
before the signal is sent to the speakers.
The purpose of digital audio is to store sound in a format which
is non-destructive, which means it doesn't get degraded
when copied. Analog sound gets worse with every copy made,
as with a cassette for instance. Just as digital is converted
to analog through a D/A converter, analog is converted to
digital through an A/D converter. The fundamental element
in this process is called sampling.
Sample Rate:
Analog-to-digital conversion takes place when the audio
waveform is sampled at a fixed interval of time and represented
as a series of data values. The sample rate is the number
of times per second the waveform is sampled, and is expressed
in units of kHz (kilohertz, or one thousand hertz) -
one Hz equals one cycle per second. Both sample rate and
audio frequency are expressed in units of Hz or kHz. Simply stated,
frequency corresponds with pitch. For example, the A below
middle C has a frequency of 220 Hz, and the A one octave
higher has a frequency of 440 Hz.
To produce CD-quality digital audio, the analog audio must
be sampled at 44 kHz or higher. The human ear can detect
frequencies up to about 20 kHz. According to the Nyquist
principle, the sampling frequency must be at least two times
the highest frequency that you want represented. Otherwise,
aliasing will occur, which means that the frequencies above
1/2 the sample rate will be misrepresented
as lower frequencies, because they were undersampled.
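The folding of undersampled frequencies described above can be sketched in a few lines of Python. This is only an illustration, not part of the original discussion: the function name alias_frequency is hypothetical, and it assumes an ideal sampler with no anti-aliasing filter in front of it (real converters normally filter out content above half the sample rate first).

```python
def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Return the frequency a pure tone appears to have after sampling."""
    nyquist = float(sample_rate_hz) / 2.0
    # Sampling cannot tell f apart from f plus any multiple of the sample rate.
    folded = float(signal_hz) % float(sample_rate_hz)
    # Anything above half the sample rate folds back below it.
    if folded > nyquist:
        folded = float(sample_rate_hz) - folded
    return folded


# A 10 kHz tone sampled at 44.1 kHz is below the limit and is kept as-is.
print(alias_frequency(10_000, 44_100))
# A 30 kHz tone sampled at 44.1 kHz shows up as a lower, false frequency.
print(alias_frequency(30_000, 44_100))
# A 15 kHz tone sampled at only 22.05 kHz is also misrepresented.
print(alias_frequency(15_000, 22_050))
```

Under these assumptions, any tone at or below half the sample rate comes back unchanged, while a tone above it reappears as a lower frequency.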
This is the reason for the 44 kHz figure - it is roughly twice
the 20 kHz that we can hear. Some commercial audio recording
software boasts sampling rates as high as 96 kHz, but unless
you want your dog to enjoy those higher frequencies which
you can't hear, there's no physical reason to
go above 44 kHz.
Good temporal resolution can be achieved with commercial
analog-to-digital conversion technology. A useful analogy is
to compare temporal resolution (sound) with spatial resolution
(imaging) - audio frequency is expressed in cycles per second, while
spatial frequency is expressed in line pairs per millimeter;
the greater the frequency, the better the detail. When converting
an analog image into a digital image (as with a digital
camera), the spatial resolution or detail of the resulting
image depends on how many samples you take of the image
over its area, or how many pixels you have. Commercial digital
cameras have improved in spatial resolution in recent years,
but they remain a far cry from the resolution of conventional
film, which is on the order of a few microns.
The goal is to sample the image, or the audio waveform, at a
sample rate which exceeds two times the highest frequency
that can be resolved by the viewer, or listener. In audio,
that frequency is 20 kHz, and the sample rate must therefore
be above 40 kHz (44 kHz is the current standard for CD-quality
audio).
Bit Depth:
The bit depth for CD-quality audio is 16 bits. Think of
bit depth as step size. The greater the bit depth you use, the
finer your steps will be, or the greater the number of steps
you will have throughout your range. Continuing the analogy
with imaging, bit depth describes the number of gray levels
in a black and white image. For example, if you have 8 bits,
then you have 256 different shades of gray to represent
changes in the darkness/lightness of your image (2^8 = 256).
If you have 10 bits, then you have 1024 shades of gray.
In 16-bit audio, you have 65,536 numbers (2^16) available
to represent the incoming voltage level of the signal.
In practice, utilizing a greater bit depth reduces noise in
your audio, as well as quantization distortion. Quantization
refers to the practice of assigning whole numbers to an
input voltage level. Since rounding always occurs, the smaller
the step size the better - a greater number of bits
achieves this. CD-quality audio uses 16 bits. Sometimes
bit depth is referred to as resolution, not to be confused
with temporal resolution, which refers to the sample rate.
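The relationship between bit depth, step size and rounding error can be sketched as follows. This is an illustration under stated assumptions rather than how any particular converter works: the input is assumed to be a voltage already scaled to the range -1.0 to 1.0, the levels are evenly spaced, and the names quantize and max_rounding_error are hypothetical.

```python
def quantize(value: float, bits: int) -> float:
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # spacing between adjacent levels
    index = round((value + 1.0) / step)  # the whole number that gets stored
    return index * step - 1.0            # the value that number stands for


def max_rounding_error(bits: int) -> float:
    """Worst-case rounding error: half of one step."""
    return (2.0 / (2 ** bits - 1)) / 2.0


for bits in (8, 10, 16):
    stored = quantize(0.3, bits)
    print(bits, 2 ** bits, abs(stored - 0.3), max_rounding_error(bits))
```

Each extra bit doubles the number of levels and so roughly halves the worst-case rounding error, which is why more bits mean less quantization noise.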
Dynamic Range and Signal-to-Noise Ratio:
Greater bit depth increases the dynamic range. The dynamic range
is the range between the lowest and highest level that can
be reproduced by a system. A system with 16-bit resolution
has a dynamic range of 96 dB, where dB refers to the decibel.
The decibel is one tenth of a Bel (named after Alexander
Graham Bell). This is a logarithmic scale and is relative.
A 3 dB change is often cited as the smallest change that is
easily noticed, and a 10 dB change represents a sound that is
about twice as loud. Log scales are used frequently in both
sound and imaging (as with optical density) to condense the
dynamic range. Log scales also more closely approximate how
our ears and eyes perceive sound and images, which justifies
their use.
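The 96 dB figure for 16-bit audio can be checked with a short calculation. The sketch below assumes the usual convention that dynamic range is 20 times the base-10 logarithm of the ratio between the largest and smallest representable level, taken here as 2^bits to 1. The function names are hypothetical, and loudness_ratio simply encodes the rule of thumb above that 10 dB sounds about twice as loud.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Dynamic range in dB for a given bit depth (about 6 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)


def loudness_ratio(db_change: float) -> float:
    """Rule of thumb: every 10 dB is perceived as a doubling of loudness."""
    return 2.0 ** (db_change / 10.0)


print(round(dynamic_range_db(16), 1))  # roughly 96 dB for CD-quality audio
print(round(dynamic_range_db(8), 1))   # roughly 48 dB for 8-bit audio
print(loudness_ratio(10))              # 10 dB louder: about twice as loud
```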
Signal-to-noise ratio is simply the level of the music divided by the level
of the noise. All sound has a noise component. Noise refers
to random fluctuations in sound which are not associated
with the music. The goal is to reduce this noise to acceptable
levels, or to separate the signal as far from the noise
as possible. A large signal-to-noise ratio represents better
reduction of noise. CDs typically achieve a signal-to-noise
ratio of about 90 dB, which is enviable when compared with
the SNR in medical imaging technologies, for instance.
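To see what a 90 dB signal-to-noise ratio means as a plain ratio, the division above can be expressed in decibels. This sketch assumes that "level" means an amplitude (voltage-like) quantity, for which the conversion factor is 20; for power levels the factor would be 10. The function names are hypothetical.

```python
import math


def snr_db(signal_level: float, noise_level: float) -> float:
    """Signal-to-noise ratio in dB, for amplitude (not power) levels."""
    return 20.0 * math.log10(signal_level / noise_level)


def amplitude_ratio(db: float) -> float:
    """Inverse conversion: how many times larger the signal is than the noise."""
    return 10.0 ** (db / 20.0)


print(snr_db(1.0, 0.001))          # signal 1000 times the noise level
print(round(amplitude_ratio(90)))  # ratio implied by a 90 dB SNR
```

Under this assumption, a 90 dB ratio means the music is tens of thousands of times larger in amplitude than the noise.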
File Size Reduction:
The size of a digital audio file in bytes can be computed as the
product of sample rate, bit depth, number of channels, and
seconds, divided by the number of bits per byte, or 8. As
an example, consider a 60 second stereo sound clip. Because
it's stereo, it has two channels. If this is recorded
with a 44.1 kHz sample rate at 16 bit resolution, then the
file size is 44,100*16*2*60/8 = 10,584,000 bytes or 10.6
MB.
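The same arithmetic can be written as a small function so that other clip lengths and settings are easy to try. The function name pcm_size_bytes is hypothetical, and the sketch counts only the raw audio data, ignoring any file header.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int,
                   channels: int, seconds: int) -> int:
    """Uncompressed audio size: rate * bits * channels * seconds / 8."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


# The 60 second stereo clip from the example above.
print(pcm_size_bytes(44_100, 16, 2, 60))       # 10584000 bytes, about 10.6 MB
# A four minute song at the same settings.
print(pcm_size_bytes(44_100, 16, 2, 4 * 60))   # 42336000 bytes, about 42 MB
```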
The size of an uncompressed audio file is about 10 MB per minute
of audio. This is not a concern from a storage standpoint
anymore, with multi-GB drives, but for web-based audio this
size is excessive. If you were to download a four minute
song using a 28.8 kbps modem, it would take more than 3
hours. Without compression/decompression algorithms (codecs),
the only way to reduce the size of an audio file is by
manually degrading its sound quality in one of three ways.
1- Convert a stereo file to a mono file:
Stereo has two tracks, while mono has only one. You can
reduce the file size in half by making this conversion.
If you are planning to provide users with audio files that
you do not expect them to listen to on good sound systems,
this file reduction method should be utilized.
This should be your first choice in reducing file size,
since the dual-channel nature of the music is lost but no
actual degradation of audio quality takes place.
2- Reduce the sample rate:
When you reduce the sample rate from 44 kHz to 22 kHz,
you will cut your file size in half as well, and you will
lose frequencies above 11 kHz. This will be noticeable,
but the extent of the degradation will depend on the nature
of the music. If there is a great deal of high-frequency
content, the loss will be more noticeable. For speech,
of course, you should consider going as low as 8 kHz, since
high frequencies can be more easily ignored, as they are
across our telephone lines.
This should be your second choice in reducing file size.
Higher frequencies are sacrificed for smaller file size.
3- Reduce the bit depth:
If you reduce your bit depth from 16 bits to 8 bits, this
again will cut the file size in half. The practical result
of reducing bit depth is to introduce noise into the recording.
In most cases, the result will be unacceptable.
This method should be your last choice in reducing file
size.
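The three reductions listed above can be sketched on raw sample values. This is a simplified illustration, not a recipe for production use: samples are assumed to be signed 16-bit integers held in Python lists, the function names are hypothetical, and the sample rate is halved by simply dropping every second sample. Real audio tools low-pass filter before doing that, to avoid aliasing.

```python
def to_mono(left, right):
    """1 - Average the two channels sample by sample."""
    return [(l + r) // 2 for l, r in zip(left, right)]


def halve_sample_rate(samples):
    """2 - Keep every second sample (naive, with no low-pass filter)."""
    return samples[::2]


def to_8_bit(samples_16):
    """3 - Keep only the top 8 bits of each signed 16-bit sample."""
    return [s >> 8 for s in samples_16]


left = [1000, 2000, -3000, 4000]
right = [3000, 0, -1000, 2000]

reduced = to_8_bit(halve_sample_rate(to_mono(left, right)))

original_bytes = (len(left) + len(right)) * 2  # two bytes per 16-bit sample
reduced_bytes = len(reduced) * 1               # one byte per 8-bit sample
print(reduced, original_bytes // reduced_bytes)
```

Each step halves the amount of data, so the toy clip ends up one eighth of its original size.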
If all three of these methods are used, you can reduce your
file size by a factor of eight. A 31 MB song now takes up
only 3.9 MB, which is an acceptable size for downloading
over the internet. Unfortunately, the audio quality has
been degraded beyond what is acceptable to listeners. This
is where compression technology becomes important.
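The factor-of-eight figure and its effect on download time can be checked with a quick calculation. The sketch below makes several assumptions: 1 MB is taken as 1,000,000 bytes, the modem is the 28.8 kbps model mentioned earlier and delivers its full rated speed, and protocol overhead is ignored. Real downloads would be somewhat slower. The function names are hypothetical.

```python
def reduced_size_mb(size_mb, mono=True, half_rate=True, eight_bit=True):
    """Each reduction that is applied cuts the file size in half."""
    factor = 1
    for applied in (mono, half_rate, eight_bit):
        if applied:
            factor *= 2
    return size_mb / factor


def download_minutes(size_mb, modem_kbps):
    """Idealized transfer time: bits to send divided by bits per second."""
    bits = size_mb * 1_000_000 * 8
    return bits / (modem_kbps * 1000) / 60


small = reduced_size_mb(31.0)
print(small)                                 # 3.875 MB after all three steps
print(round(download_minutes(31.0, 28.8)))   # minutes for the original file
print(round(download_minutes(small, 28.8)))  # minutes for the reduced file
```

Under these assumptions, the download drops from well over two hours to under twenty minutes.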
Compression Technology and MP3
Introduction:
If we could all receive data at a few megabits per second,
perhaps we wouldn't concern ourselves so much with
compression technology. But normally, when we want to send
data to somebody else, we zip it or stuff it or do whatever
we can to make it smaller than it really is. For pictures,
we compress them to jpeg or gif format. For audio, we use
mpeg, or more specifically, mpeg layer 3 - otherwise
known as MP3.
In reality, there are many audio compression formats, but MP3
is the format that has taken the music industry by storm
in recent years, and is rapidly changing the landscape in
the distribution of music over the internet. If you haven't
heard of MP3, then you probably just returned from your
mission - or you'd better use that as your excuse,
anyway.
Background:
In 1992 the Moving Picture Experts Group (MPEG) approved
a compression/decompression algorithm (or codec) which was
called MPEG-1 Audio Layer 3, or MP3. But it wasn't
until 56K modems became commonplace that MP3 reached a wide
audience and began to transform the way music is heard and
distributed. With this combination of compressed music and
faster connection speeds, internet users could download
an entire song in 15-20 minutes - certainly not instant
gratification, but reasonable enough for people to get the
music they wanted.
The proliferation of digital music on the internet has since
been very rapid. The most commonly downloaded material from
the internet is now music. The most commonly requested word
in search engines is "mp3". Digital music and
the internet are a perfect fit. Why would anyone go to a
store, purchase a CD, bring it home and put it in their
player, when they can point and click instead? Why would
you walk up and down the aisles in a music store and sift
through the thousands of titles to find the CD you want
(and maybe never find it), when you can type in a few keywords
and find the title in a matter of seconds?
The future of MP3 and music distribution holds many unanswered
questions regarding formats, copyright infringement, and
distribution channels. But one thing is clear - digital
music distribution over the internet is here to stay and
will continue to expand rapidly as broadband access becomes
commonplace. When users can download entire CDs in a matter
of seconds, there will be little reason to acquire music
in any other way.
Audio Compression:
MP3 achieves roughly a 10:1 compression ratio with only
minimal loss of audio quality. Unlike the file size reduction
methods mentioned, which degrade the audio quality by reducing
sample rates or bit depths, the codec used for MP3 utilizes
a masking method. The algorithm removes sounds in the audio
which are masked by other sounds. The idea behind this is
that because a sound is being masked, you aren't
going to hear it anyway. So if it's removed, you won't
miss it. The audio quality is in reality degraded, because
the codec uses lossy compression, but most users find little
noticeable degradation.
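The roughly 10:1 ratio can be related to bit rates with a short calculation. Note the assumption: the 128 kbps MP3 bit rate used below is not stated in this article. It is a commonly used encoder setting chosen here for illustration, and the function names are hypothetical.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BIT_DEPTH = 16
CD_CHANNELS = 2


def cd_bitrate_bps() -> int:
    """Bits per second of uncompressed CD-quality stereo audio."""
    return CD_SAMPLE_RATE_HZ * CD_BIT_DEPTH * CD_CHANNELS


def compression_ratio(mp3_bitrate_bps: int) -> float:
    """How many times smaller the MP3 stream is than the CD stream."""
    return cd_bitrate_bps() / mp3_bitrate_bps


def mp3_size_bytes(mp3_bitrate_bps: int, seconds: int) -> int:
    """Size of a constant bit rate MP3 of the given length."""
    return mp3_bitrate_bps * seconds // 8


ASSUMED_MP3_BPS = 128_000  # assumption: a typical setting, not from the article

print(cd_bitrate_bps())                        # uncompressed bits per second
print(round(compression_ratio(ASSUMED_MP3_BPS), 1))
print(mp3_size_bytes(ASSUMED_MP3_BPS, 4 * 60)) # bytes for a four minute song
```

At that assumed setting the ratio comes out near 11:1, in line with the roughly 10:1 figure quoted above.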
Audio Formats
Downloadable Audio:
For downloadable audio, there is little reason to look any
further than MP3. While many other formats exist, many of
these are uncompressed or do not offer the level of compression
that MP3 does. The MP3 format is now so widely used that
nearly all web users have capabilities either within their
browser or through external software for playback of MP3
files. Here's a brief description of audio formats:
- AIFF (.aif, .aiff): Audio Interchange File Format. This
is an uncompressed format, and is not used on the web, since
file size is approximately 10 MB per minute of audio. This
is the default audio format for computers running the Mac
OS. It uses the PCM codec.
- AU (.au, .snd): Sun Audio format. This is a moderately
compressed format. It was used frequently in the early days
of the web, but is no longer practical due to large file
size. This is the default audio format for computers running
Unix. It uses the u-law codec.
- Wave (.wav): Microsoft's Wave format should also
be avoided for web use, since it is an uncompressed format.
This is the default audio format for computers running Windows.
It uses the PCM codec.
- Quicktime (.mov): Quicktime is more than an audio format.
It is an architecture to store,
edit and play multimedia content, such as synchronized graphics,
sound, video, and text. Apple's Quicktime software
actually supports all of the major audio formats for playback.
It uses a proprietary format from Apple Computer.
- MP3 (.mp3): MPEG-1 Audio Layer 3 format. MP3 has
become the undisputed format of choice for downloadable
audio. It provides good quality digital audio at a compression
ratio of about 10:1. The significance of MP3 cannot be overstated.
Currently, there is no conceivable reason to use any format
other than MP3 for the delivery of downloadable audio.
Streaming Audio and Encryption:
Streaming audio differs from downloadable audio in that
it begins playback almost immediately after being requested.
Instead of waiting until the entire song has been downloaded,
the audio is "streamed" to the user's computer,
and it continues streaming during playback. Because the audio
must arrive as fast as it plays, it is compressed more heavily,
so audio quality is normally poorer with streaming
audio. The purpose of streaming audio is generally two-fold
- to deliver the audio to the listener with minimal
delay, and to prevent the user from obtaining an actual
copy of the music.
Encryption technology is a method of preventing the user from making
copies of the music they download. The recording industry
is currently developing a standard, termed the Secure Digital
Music Initiative (SDMI), which will likely be a part of
digital music in the coming years. Some encryption formats
exist already. Below is a description of some streaming
formats and those which use encryption:
- Real Audio (.ra, .ram, .rm): Real Networks pioneered
streaming audio with its introduction
of Real Audio several years ago. RealPlayer now supports
many streaming formats besides RealAudio. This is by far
the most popular format for streaming audio, controlling
roughly 80% of the market.
- Shockwave Audio (.swa): Shockwave is Macromedia's
contribution to web-based audio. It is a streaming audio
format which allows you to choose the level of quality for
playback, depending on the modem speed of your audience. Shockwave
streams a low bit-rate MP3 file with a different file header.
Many players can handle Shockwave audio.
- Windows Media Audio (.wma): Windows Media Audio uses a
proprietary compression format, and
is a relatively late entry into this realm. It is a streaming
format and is aimed squarely at Real Networks' RealPlayer.
- Liquid Audio: Liquid Audio is a streaming format which
utilizes licensed
technology from Dolby Labs. But it's more than a streaming
format. The goal of Liquid Audio is to allow users to preview
music and then purchase it one song at a time. Liquid Audio
uses a tracking system to make sure the record company,
the publisher and the artist get paid. It is meant to be
a one-stop solution for digital downloads over the internet.
MIDI:
MIDI (.mid, .midi): Musical Instrument Digital Interface.
MIDI is different from the other formats mentioned, because
it really isn't an audio format. MIDI is a language
for computers and musical instruments to talk to each other.
A MIDI file does not contain music. It contains instructions
for a musical instrument to play a song.
You must have a musical instrument to get music from a MIDI
file. Fortunately, many people have a musical instrument
built right into their computers - Apple's Quicktime
Musical Instruments is one such example. Other software-based
synthesizers exist which are more advanced.
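To make the "instructions, not music" point concrete, the sketch below builds the three-byte MIDI messages that tell an instrument to start and stop a note. It is an illustration with hypothetical function names. It shows only the messages themselves, not a complete .mid file (which also carries a header and timing information), and the pitch formula assumes standard tuning with note number 69 at 440 Hz.

```python
def note_on(note: int, velocity: int = 64, channel: int = 0) -> bytes:
    """Build a Note On message: status byte, note number, velocity."""
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("note and velocity must be 0-127, channel 0-15")
    return bytes([0x90 | channel, note, velocity])


def note_off(note: int, channel: int = 0) -> bytes:
    """Build the matching Note Off message."""
    if not (0 <= note <= 127 and 0 <= channel <= 15):
        raise ValueError("note must be 0-127, channel 0-15")
    return bytes([0x80 | channel, note, 0])


def note_frequency_hz(note: int) -> float:
    """Pitch the synthesizer is expected to produce (assumes A = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


message = note_on(69)
print(message.hex(), len(message), note_frequency_hz(69))
```

Three bytes are enough to request a note, which is why MIDI files are so small. What the note actually sounds like is left entirely to the instrument that receives the message.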
Since MIDI files are so small (a few kB), they are well-suited
for web-based delivery of audio. However, it's important
to recognize that the music may sound different to each
user, since it is played on the computer's synthesizer,
which may vary among users. MIDI is best reserved for instrumental
works, and only when the selection of the instrument which
plays the music is not essential, since you cannot control
this. If you want to deliver music which sounds exactly
the way you hear it, MIDI should be avoided.
Conversion Between Formats:
Many tools exist for converting between different audio
formats. High-end audio editing programs such as Sound Forge
and Peak offer the most extensive options. However, less
expensive alternatives exist, such as Quicktime, SoundJam,
Jukebox and WinAmp.
A good resource for keeping up with mp3 music, players,
news, and much more is the Lycos MP3 site.
The Future of Web-Based Audio
The distribution of music over the internet will continue
to proliferate as broadband access becomes more commonplace
- this is a given. The unanswered questions have to do with
which format will emerge as the standard and how the recording
industry will modify its business model to generate profits
from web-based distribution.
The widespread acceptance of MP3 can be credited to its open
architecture and grass roots support. Although MPEG approved
it as a formal specification, MP3 became the de facto standard
for downloadable music through user adoption rather than
through any company's branding. While consumers have embraced MP3, the recording
industry has taken every opportunity to curtail its use,
as demonstrated by its numerous lawsuits, first against
Diamond Multimedia when the Rio (portable MP3 player) was
introduced, and more recently against mp3.com and Napster.
The Recording Industry Association of America (RIAA) lost
its lawsuit against Diamond Multimedia, but has had more
success in the Napster case, where the court ruled in favor
of the RIAA. This decision was appealed, and Napster was
allowed to continue operations pending an appeals court
ruling.
While the RIAA and large record companies have unanimously opposed
web-based distribution of MP3 music, many musicians have
found MP3 to be an opportunity, especially independent musicians
and unsigned artists. Before the internet and MP3, artists
had a difficult time getting their music heard and distributed,
unless they had a major record deal. Now, artists can place
music on their own web pages and music distribution websites
which promote their music. Opportunities exist for musicians
on mp3.com, iuma.com, and for LDS musicians on ldsmusician.com.
Artists are now finding an audience for their music without
the help of big record companies. And even major artists
are utilizing these new distribution methods.
While the future of web-based audio holds many unanswered questions,
the next three to five years will be exciting to watch as
developments take place. In the meantime, musicians should
take every opportunity that exists to get exposure for their
music, using the internet for such purposes.