The thing here is that ancient people discovered that notes whose frequencies are in a ratio of small integers sound good together, e.g. 2:3.
For a long time, people were creating scales trying to have as many nice ratios as possible. This has problems. I’ll let you think about those yourself.
Then some guy figured out that the human ear is not perfect and we can’t really tell whether we hear 2:3 or 2:2.9966, and came up with the idea of using those 12th roots of 2.
Now, try computing 2^(7/12). Also try 2^(4/12). You see?
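For anyone who would rather not do this by hand, here is a minimal Python sketch of the same check. Comparing against 3:2 and 5:4 and converting to cents (1200 times the base-2 log of the ratio) are my additions for illustration, not part of the comment above.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

tempered_fifth = 2 ** (7 / 12)   # seven equal-tempered semitones
tempered_third = 2 ** (4 / 12)   # four equal-tempered semitones

# Compare each tempered interval with the small-integer ratio it approximates.
for name, tempered, just in [
    ("fifth", tempered_fifth, 3 / 2),
    ("major third", tempered_third, 5 / 4),
]:
    error = cents(tempered) - cents(just)
    print(f"{name}: tempered {tempered:.4f} vs just {just:.4f} ({error:+.2f} cents)")
```

The fifth comes out about 2 cents flat of 3:2, and the major third about 14 cents sharp of 5:4, so the fifth is the much closer match.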
You might ask “… but why do small-integer ratios sound good?”. The most plausible explanation I know of is due to William Sethares, whose book Tuning, Timbre, Spectrum, Scale I highly recommend. He reckons (with some evidence) that it goes like this:
If you play pure sine waves together and ask people what sounds nice, there isn’t any strong preference for small-integer ratios. What there is is a dislike of notes that are almost but not quite coincident.
Real musical notes are not pure sine waves, but they can be considered (via Fourier analysis) to be made of pure sine waves, and the frequency-discriminating hardware in our ears does in fact basically do that Fourier analysis.
For most (but not all) ways of generating musical notes, what you actually have is a “fundamental” frequency together with integer multiples of that frequency. The higher multiples have less energy in them. Sometimes you only have odd but not even multiples. The exact distribution of sound energy across the frequencies—the so-called “spectrum” of the sound—is one of the main things that determines what it sounds like to us.
So: suppose two notes sound good together when none of the sine waves making them up clash with one another by being close but not close enough to be indistinguishable to our ears. You might try to make that happen by having none of those frequencies close to one another, but it turns out that you can’t, for real instruments that have lots of frequencies in their spectrum. The other thing you can do is to make the ones that come close together actually coincide, at least closely enough for human ears—and the way you do that is by having the ratio of the two frequencies be in a small-integer ratio.
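The argument so far can be sketched numerically. The following is a rough illustration, not Sethares's actual code: `pair_roughness` is a curve that is zero for coincident sine waves, peaks when they are close but distinct, and fades when they are far apart. Its constants are assumptions (they follow the curve fit usually attributed to Sethares and should be checked against the book), and the six-partial spectrum with a 0.88 rolloff, the amplitude product, and the 261.6 Hz base are likewise made up for the example.

```python
import math

def pair_roughness(f1, f2, a1=1.0, a2=1.0):
    """Roughness of two sine waves: zero when they coincide, largest when
    they are close but distinct, small again when they are far apart."""
    lo, hi = min(f1, f2), max(f1, f2)
    # Assumed constants: the "too close" band widens at higher frequencies.
    s = 0.24 / (0.021 * lo + 19.0)
    x = s * (hi - lo)
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))

def harmonic_tone(fundamental, n_partials=6, rolloff=0.88):
    """(frequency, amplitude) pairs for an idealized harmonic spectrum."""
    return [(fundamental * k, rolloff ** (k - 1)) for k in range(1, n_partials + 1)]

def dissonance(ratio, base=261.6):
    """Total roughness over every pair of partials in two simultaneous tones."""
    partials = harmonic_tone(base) + harmonic_tone(base * ratio)
    return sum(
        pair_roughness(fa, fb, aa, ab)
        for i, (fa, aa) in enumerate(partials)
        for (fb, ab) in partials[i + 1:]
    )

for ratio in (1.45, 3 / 2, 2 ** (7 / 12), 1.55):
    print(f"ratio {ratio:.4f}: dissonance {dissonance(ratio):.3f}")
```

Under these assumptions the total dips at 3:2, where partials that would otherwise clash land on top of each other, and rises again a little to either side. The equal-tempered fifth sits very near the bottom of that dip.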
This account of consonance and dissonance has some interesting consequences. For instance, if you make an instrument whose overtones are not harmonic—i.e., not all simple integer multiples of the fundamental frequency—then the combinations of notes on that instrument that will sound good together will not be the same ones that work for harmonic instruments like violins, flutes, saxophones, and human voices. This typically happens for instruments where the most important resonating object isn’t basically one-dimensional (like a violin string, or the column of air in an organ pipe) -- for instance, a drum or bell. And, indeed, if you listen to gamelan music, which is played on bells and drums, you will notice that it uses a different scale (in fact, two different scales) from the one that’s common in “Western” music, and one that in fact is a better fit for the spectra of those instruments! (So says Sethares, anyway.)
And if you have some nonstandard scale and would like some of the possible chords you can play with it to sound good, you can make it so by constructing an instrument with a suitable spectrum. That’s hard to do with actual physical instruments, but in these glorious days of computational everything it’s pretty easy to do with a synthesized instrument. And lo, Sethares has e.g. constructed “instruments” in which one can play nice-sounding music in 10-tone equal temperament, even though none of its intervals is anything like a nice simple rational number.
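To put rough numbers on how far 10-tone equal temperament sits from the usual simple ratios, here is a small sketch. The choice of four just intervals to compare against is mine, and the helper name `nearest_step_error` is hypothetical.

```python
import math

# Just intervals picked for illustration.
JUST_INTERVALS = {
    "minor third": 6 / 5,
    "major third": 5 / 4,
    "fourth": 4 / 3,
    "fifth": 3 / 2,
}

def nearest_step_error(ratio, divisions):
    """Distance in cents from a ratio to the closest step of an equal
    temperament with `divisions` equal steps per octave."""
    target = 1200 * math.log2(ratio)
    step = 1200 / divisions
    nearest = round(target / step) * step
    return abs(nearest - target)

for name, ratio in JUST_INTERVALS.items():
    e12 = nearest_step_error(ratio, 12)
    e10 = nearest_step_error(ratio, 10)
    print(f"{name:12s} 12-TET off by {e12:5.1f} cents, 10-TET off by {e10:5.1f} cents")
```

With 12 steps the fourth and fifth are within 2 cents of 4:3 and 3:2. With 10 steps the closest candidates are about 18 cents away, and the thirds are further off still.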
Do you feel like a major triad is more consonant than a minor triad? (I do.) Sethares’s theory can kinda explain that: you have the same set of intervals (a major third, a minor third, a perfect fifth) but the major third is “nicer” than the minor third, and in a major triad the less-consonant minor third occurs at higher pitch, which means that fewer of the overtones are present and more of them are up where your ears don’t hear so well.
Sethares’ theory is very nice: we don’t hear “these two frequencies have a simple ratio”, we hear “their overtones align”. But I’m not sure it is the whole story.
If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don’t have overtones. Moreover, if you play 2:3:4:5, your mind will sometimes supply the missing 1; that’s known as the “missing fundamental”. And if you play some sine waves slightly shifted from 1:2:3:4:5, you’ll notice the inharmonicity (at least, I do). So we must have some facility to notice simple ratios, not based on overtone alignment. So our perception of chords probably uses this facility too, not only overtone alignment.
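The arithmetic behind the missing fundamental can be shown in a few lines. This is only the number-theory side of it, with made-up frequencies in whole hertz. It is not a model of hearing, which tolerates slight mistuning in a way an exact greatest common divisor does not.

```python
from functools import reduce
from math import gcd

def implied_fundamental(frequencies_hz):
    """Largest frequency of which every given integer frequency is a whole multiple."""
    return reduce(gcd, frequencies_hz)

print(implied_fundamental([100, 200, 300, 400, 500]))  # 1:2:3:4:5 -> 100
print(implied_fundamental([200, 300, 400, 500]))       # 2:3:4:5, the 1 left out -> 100
print(implied_fundamental([205, 300, 400, 500]))       # one partial shifted -> 5
```

Dropping the lowest component leaves the implied fundamental unchanged. Shifting one component breaks the pattern, and the only common divisor left is far below any audible pitch.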
Just making something explicit that I think I missed for a minute when reading your comment: the point isn’t “Sethares doesn’t explain how our ears/brains determine what’s one note and what’s more, so his theory is incomplete” (his theory isn’t trying to be a theory of that) but “our ears/brains seem to determine what’s one note and what’s more by doing something like looking for simple integer frequency multiples, and if there’s a mechanism for that it seems likely that it’s also involved in determining what combinations of tones sound good to us”. I think there’s something to that. Here are two things that seem like they push the other way:
On the face of it, this indicates machinery for identifying integer ratios, not necessarily rational ones. (Though maybe the missing-fundamental phenomenon suggests otherwise.)
Suppose you hear a violin and a flute playing the same note. You probably will not hear them as a single instrument. I think that whatever magic our ears/brains do to figure out what’s one instrument and what’s several also involves things like the exact times when spectral components appear and disappear, and which spectral components appear to be fluctuating together (in frequency or amplitude or both), and maybe even fitting spectral patterns to those of instruments we’re used to hearing. (I suspect there’s a pile of research on this. I haven’t looked.) The more other things we use for that, the less confident we can be that integer-frequency-ratio identification is part of it.
Interesting experiment which I am too lazy to try: pick two frequencies with a highly irrational ratio, construct their harmonic series, and split each harmonic series into two groups. So we have A1 and A2 (splitting up the spectrum of note A) and B1 and B2 (splitting up the spectrum of note B). Now construct a sound built out of all those components—but make A1 and B1 match closely in details of timing, slight fluctuations in frequency and amplitude, and anything else we can think of, and likewise for A2+B2. Does someone listening to this then hear two clashing tones, each of them nicely harmonic (i.e., A1+A2 versus B1+B2), or two weird inharmonic tones that somehow fit together (i.e., A1+B1 versus A2+B2)? What happens if first of all we play A1+B1, A2+B2, A1+A2, B1+B2 separately? The answer may well be that there isn’t really an answer, alas.
If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don’t have overtones.
The way I would explain this is that when hearing real sounds it is very common that you hear a frequency and its harmonics. Almost all the time, if you hear 1:2:3:4:5 etc., that is because a single note just sounded. So, if you hear a bunch of sine waves in that ratio (e.g. a determined group of people whistling), it sounds like one note.
For those who don’t want to break out a calculator, Wikipedia has it here:
https://en.wikipedia.org/wiki/Equal_temperament#Comparison_with_just_intonation
You can see the perfect fourth and perfect fifth are very close to 4⁄3 and 3⁄2 respectively. This is basically just a coincidence and we use 12 notes per octave because there are these almost nice fractions. A major scale uses the 2212221 pattern because that hits all the best matches with low denominators, skipping 16⁄15 but hitting 9⁄8, for example.
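Here is a short sketch that walks the 2212221 pattern and prints each equal-tempered degree next to a low-denominator fraction. The `JUST` list is one common choice of just-intonation major scale and is my assumption. The Wikipedia table linked above is the authoritative comparison.

```python
from fractions import Fraction

PATTERN = [2, 2, 1, 2, 2, 2, 1]  # semitone steps of the major scale

# Assumed just ratios for the seven degrees above the tonic.
JUST = [Fraction(9, 8), Fraction(5, 4), Fraction(4, 3), Fraction(3, 2),
        Fraction(5, 3), Fraction(15, 8), Fraction(2, 1)]

def major_scale_semitones(pattern=PATTERN):
    """Cumulative semitone offsets above the tonic for each scale degree."""
    offsets, total = [], 0
    for step in pattern:
        total += step
        offsets.append(total)
    return offsets

for semitones, just in zip(major_scale_semitones(), JUST):
    tempered = 2 ** (semitones / 12)
    print(f"{semitones:2d} semitones: {tempered:.4f} vs {just} = {float(just):.4f}")
```

The pattern lands on 2, 4, 5, 7, 9, 11 and 12 semitones. Semitone 1, the 16⁄15 candidate, is the first one skipped.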