Digital audio is everywhere: we listen to our iPods every day, we stream music from websites on our computers, and many of the algorithms we work with are derived from techniques in digital audio. Digital audio has its roots in wave mechanics and underpins a great deal of modern computing. We will discuss sound as a wave, some basic properties of waves, how to convert a wave into something you can store in memory, and how to play that wave back. We'll start with an introduction to sound and how to represent it as a wave and as a mathematical function.
An introduction to sound:
Sound is a wave, and like all waves it transfers energy through a medium, usually air. Whenever something emits a “sound” it compresses air molecules. These compressed molecules push on their uncompressed neighbours, the chain reaction continues, and the sound propagates; this is the essence of a sound wave. For sound to exist you need a medium, be it a gas such as air, a solid, a liquid or a plasma. The sound wave travels through that medium until it dissipates and is undetectable. Sound is usually the result of a force or stress on an object: strike a tuning fork on a table and it will vibrate at a particular frequency. To visualize a sound wave mathematically, we will look at string vibrations.
String Vibrations:
Like a tuning fork, a plucked string vibrates at a particular frequency, which is set by the string's length, its tension and its mass per unit length. The most prominent frequency is called the fundamental frequency; when you play middle C on an instrument, the fundamental frequency is about 261.6 Hz. If middle C is a set frequency, why does middle C sound different on different instruments? Because the sound from an instrument isn't pure: each instrument has a unique harmonic structure, which gives it a unique sound. To understand this, let's look at the math of a wave.
Mathematical representation of a sound wave:
The nice thing about waves is that we can represent them mathematically, the simplest case being a sine wave. Sine is a periodic function that swings smoothly from crest to trough and back again at a particular frequency. We can represent a pure oscillating sound by the following function:

$$y(t) = A\sin(\omega t + \phi)$$
Here A is the amplitude (sine oscillates between 1 and -1, so multiplying it by A makes the wave swing between A and -A). The Greek letter omega (ω) is the angular frequency in radians per second, ω = 2πf, and the Greek letter phi (φ) is the phase offset in radians (for every case in this article we can take it to be zero). A “pure” sound is one that is sinusoidal. We hear these all the time on cheap synthesizers, and a sinusoidal sound can be represented by the formula above alone. If a synthesizer were to play middle C, it would produce a sine wave of frequency 261.6 Hz (ω = 2π × 261.6 ≈ 1644 rad/s). You would notice that it sounds like the right pitch, but no natural instrument sounds like a pure tone. When an instrument makes a tone, the sound is usually made up of its fundamental frequency plus higher-frequency oscillations of lesser amplitude, called harmonics. For a vibrating string (or an open pipe) the harmonics are integer multiples of the fundamental (2f, 3f, 4f, ...); for a pipe closed at one end, only the odd multiples (3f, 5f, ...) appear. Because of this we can consider any natural tone a superposition of a fundamental sine wave and harmonic sine waves. We can use this fact to break down a complex sound into many simple ones, each governed by the formula above.
Synthesizer theorem:
A complex periodic sound can be broken into a series of sine waves using a Fourier series. A Fourier series is a sum of sine (and cosine) terms whose frequencies are integer multiples of a repeating signal's fundamental, and almost any periodic waveform you will meet in audio can be written this way. For signals that don't repeat, the related Fourier transform plays the same role, describing the signal as a continuous spread of sine waves rather than a discrete series. For the purposes of this article we'll treat the Fourier series as a concept and won't explore the math behind it (maybe one day).
For a quick example we can consult Wikipedia. We all know and love our humble sawtooth wave. For a sawtooth that rises linearly from -1 to 1 over each period, the Fourier series is:

$$x(t) = \frac{2}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin(n\omega t)$$
Taking only the n = 1 term gives the fundamental, a single sine wave at the sawtooth's frequency:

If we take the first 5 terms we get:

As we include more terms (n → ∞), the approximation gets better and better. This is the idea behind additive synthesis, which some analog synthesizers used to generate more complicated sounds: several sine-wave oscillators are combined to rebuild a waveform. Not many are needed, because after a handful of terms the approximation is good enough to replicate the sound. This is fine and dandy, but in digital circuits we do not use sine waves as our building blocks; instead we use impulses.
The Impulse wave:
An impulse representation works completely differently from a sine-wave one. Instead of treating a wave as an infinite series of sine waves, we evaluate the function's value at specific times t. We break the function up into many steps, take its value at every step, and from those values we can reconstruct the wave. The smaller the step (that is, the more steps per period), the better the reconstruction.
In this graph we have 5 impulses (red) that make up the waveform (green). Each impulse occurs at a time t and has a definite height. We break a wave up into impulses of definite height, one per sample. If the green waveform were 10 kHz and we took 5 impulses per period, our sampling frequency would be 5 × 10 kHz = 50 kHz. At each sample, all we do is record the height of the waveform at that instant.

If we were to reconstruct the wave using our impulses we would get the following:

As you can see, it's not a pretty picture, but if you were to low-pass filter the signal you would get a pretty good reconstruction of the original. To improve the reconstruction we increase the sample rate, taking more samples of the waveform per period: a wave with 100 samples per period looks much more accurate than one with 10.
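One golden rule: your sampling frequency must be more than twice the highest frequency you want to capture (the Nyquist criterion). Any component above half the sampling frequency is not simply lost. It aliases, showing up in the recording as a false, lower frequency.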
Digital Audio:
The heart of digital audio is taking an analog sound signal, in the form of voltage over time, performing analog-to-digital conversions at regular intervals, and storing the results in memory for later use.
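The “store in memory for later use” step can be as simple as an array that fills up one conversion at a time. This is a minimal sketch assuming 8-bit conversion results and a hypothetical fixed buffer of 256 samples. On a real microcontroller `record_sample` would typically be called once per sample period, for example from a timer interrupt. The interrupt setup and A/D register access are left out because they depend on the part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_LEN 256   /* hypothetical buffer size, in samples */

/* Fixed-size store of 8-bit A/D results, filled once per sample period. */
typedef struct {
    uint8_t data[BUFFER_LEN];
    size_t  count;   /* number of samples recorded so far */
} SampleBuffer;

/* Store one conversion result; returns 0 (and stores nothing) once the
 * buffer is full, 1 otherwise. */
int record_sample(SampleBuffer *buf, uint8_t adc_result)
{
    assert(buf != NULL);
    if (buf->count >= BUFFER_LEN)
        return 0;
    buf->data[buf->count++] = adc_result;
    return 1;
}

/* Fetch recorded sample n for playback (e.g. to send to a DAC). */
uint8_t playback_sample(const SampleBuffer *buf, size_t n)
{
    assert(buf != NULL && n < buf->count);
    return buf->data[n];
}
```

Playing back means reading the samples out in order at the same rate they were recorded.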
Let's go back to our graph with the impulses. Say the vertical axis is voltage and the horizontal axis is time. At every impulse we sample the wave:
Impulse # -- Voltage (V) -- A/D value (8-bit, VDD = 5 V, rounded) -- Time (µs, 50 kHz sampling)
1 -- 0.1 -- 5 -- 20
2 -- 0.2 -- 10 -- 40
3 -- 0.3 -- 15 -- 60
4 -- 0.4 -- 20 -- 80
5 -- 0.5 -- 26 -- 100
6 -- 0.1 -- 5 -- 120
7 -- 0.2 -- 10 -- 140
8 -- 0.3 -- 15 -- 160
9 -- 0.4 -- 20 -- 180
10 -- 0.5 -- 26 -- 200
etc...
In all digital audio systems, two parameters determine the quality of the recording and how much data it produces.
The first is bit depth (often loosely called bit-rate), which is how many bits you use to convert each sample. A bit depth of 8 bits divides a signal into 256 levels, so a 5 V signal has an ideal resolution of 5/256 ≈ 19.5 mV (not bad!).
The second is sampling rate, which is how often you sample the waveform. CD audio, and most MP3 files, use 44.1 kHz, or one sample roughly every 22.7 µs.
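The uncompressed data rate of a signal is simply the sampling rate times the bit depth (times the number of channels). The highest frequency it can represent, its audio bandwidth, is half the sampling rate.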
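The numbers above are quick to check in code. This sketch computes an ideal converter's step size (vref / 2^bits), the uncompressed PCM data rate (sample rate × bits × channels) and the time between samples. The function names are illustrative.

```c
#include <assert.h>
#include <math.h>

/* Smallest voltage step an ideal converter can resolve: vref / 2^bits. */
double resolution_volts(double vref, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return vref / (double)(1UL << bits);
}

/* Uncompressed (PCM) data rate in bits per second. */
double pcm_bit_rate(double fs_hz, unsigned bits, unsigned channels)
{
    return fs_hz * bits * channels;
}

/* Time between samples in microseconds. */
double sample_period_us(double fs_hz)
{
    assert(fs_hz > 0.0);
    return 1e6 / fs_hz;
}
```

For example, 8 bits over 5 V gives 19.53 mV per step. CD audio (44.1 kHz, 16 bits, 2 channels) needs 1,411,200 bit/s. The 8-bit, 50 kHz mono example from the table needs 400,000 bit/s.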
The advantage of a digital system is that 22.7 µs is a long time for a microcontroller. We can do a lot of work between taking one sample and the next, and with some good code we can produce high-quality audio systems that are easy and cheap to make.
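How much work fits between samples? This sketch assumes the classic PIC18 timing of one instruction cycle per four oscillator clocks, so a part clocked at 40 MHz runs 10 million instruction cycles per second. Check your own part's datasheet for its actual clock and timing. The function name is illustrative.

```c
#include <assert.h>

/* Whole instruction cycles available between samples, assuming one
 * instruction cycle per four oscillator clocks (Fosc / 4). */
unsigned long cycles_per_sample(double fosc_hz, double fs_hz)
{
    assert(fosc_hz > 0.0 && fs_hz > 0.0);
    double instr_per_second = fosc_hz / 4.0;
    return (unsigned long)(instr_per_second / fs_hz);   /* round down */
}
```

At 40 MHz that leaves about 226 instruction cycles per sample at 44.1 kHz, and exactly 200 at 50 kHz. That is enough for reading the converter, storing the result and some light processing.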
What now?
So now that we have an introduction to the basic properties of sound, what can we do with it? In upcoming posts I will discuss the PIC18 series of microcontrollers, with some sample code for audio pass-through and, eventually, for storing and playing back a waveform.