Karplus-Strong String Synthesis
A few months ago, I set out to reimplement a guitar synthesizer in JavaScript. It turned out pretty good (see JavaScript Karplus-Strong for a demo), but even more satisfying than the end result was learning about the algorithm used for synthesizing the string sounds, Karplus-Strong.
With this post I hope to share some of that satisfaction by exploring visually how the Karplus-Strong algorithm works. We're going to start off looking at simple vibrations, then introduce the Fourier transform as a means of looking at vibrations from a different perspective, and finally use this new perspective to demonstrate Karplus-Strong.
Simple Vibrations
First, let's talk a bit about vibrations.
All sound is generated by something moving or vibrating. In some cases, this is obvious: if we pluck a guitar string, we can clearly see the vibrations. For other sounds, like tapping a desk, we might not be able to clearly discern the movement; but it is still there.
Depending on how fast the vibrations are, we get a different pitch of sound. Things which vibrate more quickly have a higher pitch, while things which vibrate more slowly have a lower pitch. The pitch of a sound is also known as its frequency. (The difference between the two terms is that we tend to use pitch when referring to how we perceive the sound, and frequency when referring to the vibration causing the sound.)
Frequency is quantified using a number with the unit hertz, or Hz for short. In general, "hertz" means "times per second": a light blinking with a frequency of 1 Hz turns on and off once a second.
In the context of sounds, "hertz" means "vibrations per second". For example, the first string on a 6-string guitar (the one with the highest pitch) in standard tuning has a frequency of about 330 Hz. This means that, when plucked, the string will vibrate back and forth about 330 times per second. The sixth string (the one with the lowest pitch) has a frequency of about 82 Hz, so it will vibrate back and forth about 82 times per second.
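Because frequency counts cycles per second, the time taken by one back-and-forth movement is simply one divided by the frequency. Here is a small Python sketch of that arithmetic for the two strings above. The 44,100 samples-per-second rate is an assumption (a common rate for digital audio), included only to show how many stored samples a single back-and-forth spans once the sound is digitised.

```python
SAMPLE_RATE = 44100  # assumed: a common digital audio sample rate (samples per second)

def period_seconds(frequency_hz: float) -> float:
    """Time taken for one complete back-and-forth vibration."""
    return 1.0 / frequency_hz

def samples_per_period(frequency_hz: float, sample_rate: int = SAMPLE_RATE) -> float:
    """How many digital samples one back-and-forth spans at the given sample rate."""
    return sample_rate / frequency_hz

for name, freq in [("first string", 330.0), ("sixth string", 82.0)]:
    print(f"{name}: {freq} Hz -> {period_seconds(freq) * 1000:.2f} ms per cycle, "
          f"{samples_per_period(freq):.1f} samples per cycle")
# first string: 330.0 Hz -> 3.03 ms per cycle, 133.6 samples per cycle
# sixth string: 82.0 Hz -> 12.20 ms per cycle, 537.8 samples per cycle
```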
If we were to zoom in on a small section of a vibrating string and trace its motion, it might look something like this:
Now, this graph is really cheating a little bit - the shape of the vibration produced by a real guitar string is much more complicated. The shape of vibration we're using instead is known as a sinusoid, or sine for short. (This vibration is quite special mathematically in that it can be generated from a circle. See this demo for an excellent visualisation.) This vibration would sound nothing like a guitar.
Depending on the shape of the vibration, we get a different sound.
When drawn stationary like this, we can easily pick out one "block" of the vibration, which represents a complete movement back and forth. We call this block a cycle or period of the vibration. The way we're generating these sounds is essentially by taking a particular pattern of vibration for one period and repeating it over and over again.
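The "take one period and repeat it" idea can be sketched in a few lines of Python. The shape names, the list-of-samples representation and the helper names `one_period` and `repeat_period` are illustrative assumptions, not taken from the original demo.

```python
import math
import random

# Hypothetical helpers for illustration; not the original demo's code.
def one_period(shape: str, length: int, rng: random.Random) -> list[float]:
    """One cycle of a vibration as `length` samples between -1 and 1."""
    if shape == "sine":
        return [math.sin(2 * math.pi * i / length) for i in range(length)]
    if shape == "square":
        # Up for the first half of the cycle, down for the second half.
        return [1.0 if i < length // 2 else -1.0 for i in range(length)]
    if shape == "noise":
        # White noise: every sample is an independent random value.
        return [rng.uniform(-1.0, 1.0) for _ in range(length)]
    raise ValueError(f"unknown shape: {shape!r}")

def repeat_period(period: list[float], num_periods: int) -> list[float]:
    """Build a sound by playing the same cycle over and over again."""
    return period * num_periods

rng = random.Random(0)
square = repeat_period(one_period("square", 100, rng), 330)
print(len(square))  # 33000
```

At an assumed 44,100 samples per second, a 100-sample cycle repeats 441 times a second, so this square vibration would sound at 441 Hz; changing the period length is what changes the pitch.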
All of the above vibrations sound very artificial, because nothing in nature vibrates in such a simple way. They certainly don't sound anything like musical instruments. (If you really try hard, you can sort of convince yourself that the 'white noise' option sounds a bit like a reed organ.)
One very simple thing we can do to bring these vibrations closer to the vibrations we're used to from instruments that are struck or plucked is to make the loudness die away with time. This corresponds to making the size of the vibration get smaller and smaller as time goes on.
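A minimal sketch of this kind of damping, assuming the sound is built period by period as before and that each copy of the period is 0.99 times the size of the previous one (the factor and the name `damped_repeat` are illustrative assumptions):

```python
import random

# Hypothetical helper for illustration; 0.99 is an arbitrary decay factor.
def damped_repeat(period: list[float], num_periods: int, decay: float = 0.99) -> list[float]:
    """Repeat one cycle, making each copy `decay` times the size of the one before.

    Every sample in a copy is scaled by the same factor, so the whole
    vibration shrinks evenly as time goes on.
    """
    out: list[float] = []
    gain = 1.0
    for _ in range(num_periods):
        out.extend(gain * x for x in period)
        gain *= decay
    return out

rng = random.Random(0)
noise_period = [rng.uniform(-1.0, 1.0) for _ in range(100)]
damped_noise = damped_repeat(noise_period, 441)  # one second at an assumed 44,100 samples/s
```

By the last period the vibration is down to roughly 1% of its starting size, which is what makes the sound die away.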
To my ears, the damped sine actually sounds something like a xylophone. We're still pretty far away from a guitar, but try listening to the damped white noise a few more times. We can kind of imagine it sounds like a very artificial guitar. But what do we need to do to get it to sound more natural?
Karplus-Strong
The idea of the Karplus-Strong algorithm for synthesizing string sounds is: what if we made some parts of the sound die away quicker than others? Specifically, what if we made it so that the higher-frequency parts of the sound died away quicker than the lower-frequency parts?
So far, we've been looking at vibrations and talking about them in terms of just a single frequency. You might be thinking, "How can a sound at one frequency have both high-frequency parts and low-frequency parts?" To answer this question, we need to take a brief foray into Fourier transforms.
The Fourier Transform
(For those who are already familiar with the Fourier transform: skip on past this section, and forgive the extreme hand-waviness of this explanation!)
Let's look at our graph of a sinusoidal vibration from before, this time marking the time taken for each cycle of vibration:
If each cycle takes 0.2 seconds, then we'll be able to fit 5 cycles in 1 second, and so the frequency of the vibration is 5 Hz. Let's mark that frequency on a line:
The two views of the vibration - its vibration pattern and its frequency - are just two ways of describing the same thing. If we know what the vibration pattern is, we know where to mark its frequency (since we can measure how long each cycle takes), and if we know what its frequency is (and how big the vibration is), we can reproduce the vibration pattern.
We can do the same for another sinusoid at a different frequency:
This time, with each cycle taking 0.1 seconds, we can fit 10 of them in 1 second, so the frequency is 10 Hz.
Now, what if we were to play them both at the same time? The pattern of vibration will be a mix of both of them:
But we know the original two frequencies of sine vibration must still be in there:
With the vibration pattern mixed up, though, it's not so easy to figure out what frequencies it's made of just from looking at the vibration pattern. If we were given just the vibration pattern, it would be much harder for us to mark its component sine frequencies on a frequency line.
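In code, "playing both at the same time" is simply adding the two vibrations together sample by sample. A sketch, assuming each vibration is stored as a list of samples taken 100 times per second (an arbitrary rate, just fast enough to trace out a 10 Hz sine); the `sine` helper is a hypothetical name:

```python
import math

SAMPLE_RATE = 100  # samples per second (assumed; plenty for 5 Hz and 10 Hz sines)

def sine(frequency_hz: float, duration_s: float, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Sample a unit-size sinusoid of the given frequency (hypothetical helper)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate) for i in range(n)]

five = sine(5, 1.0)  # 5 cycles in one second
ten = sine(10, 1.0)  # 10 cycles in one second

# Playing both at once: the two vibrations simply add, sample by sample.
mixed = [a + b for a, b in zip(five, ten)]
```

The peaks of `mixed` reach about 1.76, higher than either sine alone, and its shape no longer repeats every 0.1 seconds, which is part of why the component frequencies are hard to read off by eye.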
This is where the Fourier transform comes in. The Fourier transform is a mathematical method to find out what frequencies of sine vibration are contained in a vibration, just from looking at the shape of the vibration. It takes us from the first of each pair of graphs, showing how the position of the string varies with time, to the second of each pair, showing the component frequencies.
The details of how the Fourier transform actually works are a little involved, but for our purposes all we need to know is what it allows us to do: go from a description based on time to a description based on frequencies of sinusoid.
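For the curious, here is a naive discrete Fourier transform (the version of the Fourier transform used on sampled sound) in plain Python. It is a sketch meant only to illustrate the idea; it takes time proportional to N² and real code would use an FFT library instead. The one-second window and 100 samples per second are assumptions chosen so that frequency bin k lines up with k Hz.

```python
import cmath
import math

def dft_magnitudes(samples: list[float]) -> list[float]:
    """Naive discrete Fourier transform: how much of each frequency is present.

    If the samples span exactly one second, bin k corresponds to k Hz. Only bins
    0..N//2 are returned; for real-valued input the upper half mirrors these.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Compare the signal against a rotating sinusoid at k cycles per window.
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        # Scale so that a unit-size sine (for 0 < k < N/2) reads as 1.0.
        mags.append(abs(total) * 2 / n)
    return mags

rate = 100  # samples per second over a one-second window (assumed)
mixed = [math.sin(2 * math.pi * 5 * t / rate) + math.sin(2 * math.pi * 10 * t / rate)
         for t in range(rate)]
mags = dft_magnitudes(mixed)
print([k for k, m in enumerate(mags) if m > 0.5])  # [5, 10]
```

Given only the mixed-up vibration pattern, the transform hands back exactly the two frequencies we started with.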
Sinusoids: The Building Blocks of Vibrations
OK, so the Fourier transform for sinusoidal vibrations is pretty easy: if we've got a bunch of sinusoids added together, it lets us pick out the individual frequencies. But what if we want to do the same thing for other types of vibration? Can we still use the Fourier transform even if the component vibrations aren't sinusoidal?
It turns out that we don't have to worry: the sinusoid has some nice mathematical properties that mean we can represent any vibration as a mix of sinusoidal vibrations of different frequencies.
"Surely you're joking, Mr Rahtz!" I hear you say. "What about our square vibration? How can we make something so sharp out of something so smooth?"
Well, it turns out we'd have to add together an infinite number of sinusoidal vibrations at the right frequencies to get a perfect square vibration, but we can start to get a pretty good approximation from just the first half a dozen of them:
For the square vibration, each frequency has to be added in its own precise amount, so we replace the plain frequency line with a graph showing how much of each frequency we need.
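Here is that recipe as a Python sketch. The formula used (only odd multiples of the base frequency, the n-th added in the amount 4/(πn)) is the standard Fourier series of a square vibration swinging between +1 and -1. The 100 Hz base frequency is just an example, picked to match the 100 Hz, 300 Hz and 500 Hz components mentioned later, and `square_approx` is a hypothetical name.

```python
import math

def square_approx(t: float, frequency_hz: float, num_terms: int) -> float:
    """Add up the first `num_terms` sinusoids of a unit square vibration.

    Only odd multiples of the base frequency appear (1f, 3f, 5f, ...), and the
    n-th of them is added in the amount 4 / (pi * n): higher frequencies are
    needed in smaller and smaller amounts.
    """
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1
        total += 4 / (math.pi * n) * math.sin(2 * math.pi * n * frequency_hz * t)
    return total

# How much of each frequency a 100 Hz square vibration needs (first six terms).
for k in range(6):
    n = 2 * k + 1
    print(f"{100 * n} Hz: {4 / (math.pi * n):.3f}")

# A quarter of the way through a cycle, the true square vibration sits at +1.
quarter = 0.25 / 100
for terms in (1, 6, 200):
    print(terms, round(square_approx(quarter, 100, terms), 3))
# prints 1.273, 0.947 and 0.998: closer to 1 as more terms are added
```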
The Fourier transform shows us what mix of sinusoids makes up a vibration, taking us from the graph on the left to the graph on the right. So if we're saying that any vibration can be represented as a mix of sinusoids, we can use the Fourier transform on any vibration to show us its sinusoid frequency building blocks.
Again, the details of why this turns out to be the case are not obvious, but if this has piqued your interest in the Fourier transform and you'd like to learn more, take a look at An Interactive Guide to the Fourier Transform and this article on Nautilus.
A Different Perspective on Noise
Let's go back and look at our white noise vibration, which we thought might have some promise.
The way we were generating our sounds before was by taking a particular pattern of vibration for one period (which, remember, is like one "block" of the vibration we see when tracing it out) and repeating it for the subsequent periods. This time, though, let's take just a single period of white noise and look at it through the lens of the Fourier transform to see what its component sinusoidal frequencies are:
For this frequency graph, instead of individual dots representing individual frequencies of sinusoids, we get a continuous line of component frequencies: instead of, say, 100 Hz, 300 Hz and 500 Hz as we had for the square wave, the frequency components of white noise are spread across a continuous range: 100 Hz, 100.1 Hz, 100.2 Hz, and everything in between and beyond.
What's more, white noise has its components spread across every possible frequency - it has some amount in it of any frequency you can think of.
In fact, if we were to take a large number of different samples of white noise, we would see that, on average, it contains the same amount of each frequency. This is why white noise is called white: in the same way that white light contains the same amount of all the colours in the light spectrum, white noise contains the same amount of all the frequencies in the vibration frequency spectrum. (For the graph above, though, since we only took a single sample of white noise, we see quite a bit of variation in the amount of each frequency present.)
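The claim that white noise contains, on average, the same amount of every frequency can be checked numerically. A sketch, assuming noise samples drawn uniformly from -1 to 1 and a 64-sample period (both arbitrary choices): compute how much power sits in each frequency bin for one period of noise, then average that over many independent periods. The helper names are hypothetical.

```python
import cmath
import math
import random

def dft_power(samples: list[float]) -> list[float]:
    """Power (squared size, divided by N) in each DFT bin from 0 up to N // 2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def average_noise_spectrum(period_length: int, trials: int, seed: int = 0) -> list[float]:
    """Average the power spectrum of many independent single periods of white noise."""
    rng = random.Random(seed)
    totals = [0.0] * (period_length // 2 + 1)
    for _ in range(trials):
        noise = [rng.uniform(-1.0, 1.0) for _ in range(period_length)]
        for k, power in enumerate(dft_power(noise)):
            totals[k] += power
    return [total / trials for total in totals]

single = average_noise_spectrum(64, trials=1)      # one sample of noise: very bumpy
averaged = average_noise_spectrum(64, trials=200)  # many samples: close to flat
print(f"single:   {min(single):.3f} to {max(single):.3f}")
print(f"averaged: {min(averaged):.3f} to {max(averaged):.3f}")
```

For noise drawn uniformly from -1 to 1, every bin of the averaged spectrum should hover around 1/3 (the variance of each sample), and the spread narrows as `trials` grows, while the single-period spectrum varies a lot from bin to bin.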
We can start to see why white noise is interesting to us as a starting point for synthesizing sounds. In the same way that a block of marble contains every possible statue imaginable, we can start with white noise, which contains every possible frequency, and then carve it out to form the kind of sound that we want.
Karplus-Strong
Before the interlude, we said that the idea of the Karplus-Strong algorithm for generating string sounds is that we dampen the higher-frequency parts of a vibration more than the lower-frequency parts. Now that we've seen the Fourier transform, we can be a bit more precise in our description: Karplus-Strong works by dampening the higher-frequency sinusoids more than the lower-frequency ones.
To illustrate how this works, let's go back to our example of white noise which died away with time:
By simply making the size of each period of vibration get smaller and smaller, what we're effectively doing is dampening all component frequencies uniformly:
For the first period (indigo), we just use a sample of white noise. For the second period (blue), we take the first period, but make it a little bit smaller. This has the effect of reducing the amounts of its component frequencies by the same proportion across the board, as can be seen in the graph on the right. We repeat the process for each subsequent period, making each one a little smaller than the last.
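Why does shrinking a whole period shrink every frequency component by the same proportion? Because the Fourier transform is linear: scaling the input by some factor scales every frequency component by that same factor. A quick numerical check in Python, assuming a 32-sample period of noise and a shrink factor of 0.9 (both arbitrary):

```python
import cmath
import math
import random

def dft(samples: list[float]) -> list[complex]:
    """Naive DFT, bins 0..N//2 (enough for real-valued input)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n // 2 + 1)]

rng = random.Random(1)
first_period = [rng.uniform(-1.0, 1.0) for _ in range(32)]
second_period = [0.9 * x for x in first_period]  # same shape, 90% of the size

# Fraction of each frequency component left after shrinking.
ratios = [abs(b) / abs(a) for a, b in zip(dft(first_period), dft(second_period))]
print(min(ratios), max(ratios))  # both 0.9, up to rounding error
```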
For Karplus-Strong, on the other hand, we don't reduce all frequencies by the same amount. Instead, we apply more reduction to higher frequencies, leaving the lower frequencies relatively untouched:
In a very hand-wavy sort of way, this kind of makes sense. When we pluck a guitar string, it starts off by vibrating all over the place, with a wide range of component frequencies. As time moves on, not all component frequencies die away at the same rate. Intuitively, we might expect that the faster vibrations would die away more quickly.
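What does "more reduction for higher frequencies" look like in practice? The original Karplus-Strong algorithm does it by replacing each sample with the average of two neighbouring samples. The sketch below applies that average once to a single 32-sample period of noise, treating the period as wrapping around (an assumption that keeps the maths tidy), and measures how much of each frequency survives. For an N-sample period, averaging neighbours scales frequency bin k by |cos(πk/N)|: slow components pass almost untouched, while the fastest one, k = N/2, is wiped out entirely.

```python
import cmath
import math
import random

def dft(samples: list[float]) -> list[complex]:
    """Naive DFT, bins 0..N//2 (enough for real-valued input)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n // 2 + 1)]

def average_neighbours(period: list[float]) -> list[float]:
    """Average each sample with the one before it; index -1 wraps to the end."""
    return [0.5 * (period[i] + period[i - 1]) for i in range(len(period))]

N = 32
rng = random.Random(2)
first_period = [rng.uniform(-1.0, 1.0) for _ in range(N)]
second_period = average_neighbours(first_period)

for k, (a, b) in enumerate(zip(dft(first_period), dft(second_period))):
    if k % 4 == 0:
        predicted = abs(math.cos(math.pi * k / N))
        print(f"bin {k:2d}: kept {abs(b) / abs(a):.3f} (predicted {predicted:.3f})")
```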
For our demonstration above, the frequency content changes quite suddenly between each period. With an actual guitar, though, the transition is much more gradual, over a much larger number of periods.
This is all we need to generate really quite realistic results. We can tune each string by changing the length of each period. By changing how quickly we reduce the higher frequencies, we can make the sound last longer (sounding more like an electric guitar) or shorter (sounding more like an acoustic guitar).
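Putting the pieces together gives a complete, if minimal, plucked-string generator. This is a Python sketch, not the author's JavaScript implementation: the function name `pluck`, the `smoothing` parameter and its default, the 44,100 Hz sample rate and the mean-removal step are all assumptions made for illustration. The period length in samples is the sample rate divided by the target frequency, rounded to a whole number, so the pitch is only approximately right (the averaging step also adds a small extra delay, lowering it slightly).

```python
import random
from collections import deque

def pluck(frequency_hz: float, duration_s: float, sample_rate: int = 44100,
          smoothing: float = 0.5, seed: int = 0) -> list[float]:
    """Karplus-Strong plucked-string sketch (hypothetical helper).

    A delay line one period long starts full of white noise. Each time a sample
    comes out, it is blended with the next one and fed back in, so every trip
    round the loop removes a little more of the high frequencies than the low.
    smoothing=0.5 is a plain two-point average (the strongest high-frequency
    damping); values between 0 and 0.5 damp the highs more gently, so the note
    rings for longer.
    """
    rng = random.Random(seed)
    period_samples = max(2, round(sample_rate / frequency_hz))  # sets the pitch
    noise = [rng.uniform(-1.0, 1.0) for _ in range(period_samples)]
    mean = sum(noise) / period_samples
    line = deque(x - mean for x in noise)  # assumed extra step: remove any constant offset
    out = []
    for _ in range(int(duration_s * sample_rate)):
        current = line.popleft()
        # Weighted average of this sample and the next one: a gentle low-pass filter.
        line.append((1.0 - smoothing) * current + smoothing * line[0])
        out.append(current)
    return out

a_string = pluck(110.0, 2.0)                # A2 (fifth string), plain two-point average
ringing = pluck(110.0, 2.0, smoothing=0.1)  # same note, highs damped more gently
```

The only tuning knob is the period length, and the only tone knob is how strongly each pass smooths the highs, which mirrors the two controls described above.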
Here's a simple demo of the final result, showing how the amount of each frequency changes with time.
Summary
So, in summary:
- Sounds are produced by things vibrating. Depending on the shape of the vibration, we get a different sound.
- The frequency of a vibration is how many times the object in question moves back and forth each second. The higher the frequency, the higher the pitch of the sound produced.
- The Fourier transform shows that we can decompose any vibration into a mix of component sinusoidal vibrations at different frequencies.
- The Karplus-Strong algorithm works by starting with white noise, and for each subsequent period, dampening the component frequencies - with more dampening on the higher-frequency components than the lower-frequency components.
If you'd like to read more about Karplus-Strong, the original paper by Kevin Karplus and Alex Strong, Digital Synthesis of Plucked-String and Drum Timbres (JSTOR; requires free signup) is also surprisingly accessible.