Sethares’ theory is very nice: we don’t hear “these two frequencies have a simple ratio”, we hear “their overtones align”. But I’m not sure it is the whole story.
If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don’t have overtones. Moreover, if you play 2:3:4:5, your mind will sometimes supply the missing 1; that’s known as the “missing fundamental”. And if you play some sine waves slightly shifted from 1:2:3:4:5, you’ll notice the inharmonicity (at least, I do). So we must have some facility for noticing simple ratios that isn’t based on overtone alignment, and our perception of chords probably uses this facility too, not only overtone alignment.
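One way to make the missing-fundamental point concrete: although 2:3:4:5 contains no component at the fundamental, the summed waveform still repeats once per fundamental period, while a slightly mistuned set does not. Below is a minimal, stdlib-only Python sketch. The 200 Hz fundamental, the 48 kHz sample rate and the 3% mistuning of the third partial are arbitrary assumptions, and `sine_sum`, `max_period_mismatch` and `write_wav` are hypothetical helpers, not from any library. It only measures waveform periodicity; whether pitch perception actually works that way is the open question here.

```python
import math
import struct
import wave

SAMPLE_RATE = 48000  # Hz (assumed); makes a 200 Hz period exactly 240 samples


def sine_sum(freqs, duration=0.5, sample_rate=SAMPLE_RATE):
    """Equal-amplitude sum of pure sine waves (which have no overtones of their own)."""
    n = int(duration * sample_rate)
    scale = 1.0 / len(freqs)  # keeps every sample within [-1, 1]
    return [
        scale * sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
        for i in range(n)
    ]


def max_period_mismatch(signal, period_samples):
    """Largest |x[i] - x[i + P]|; near zero means the waveform repeats every P samples."""
    return max(
        abs(signal[i] - signal[i + period_samples])
        for i in range(len(signal) - period_samples)
    )


def write_wav(path, signal, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit WAV file so the stimuli can be listened to."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(32767 * s)) for s in signal))


F0 = 200.0  # assumed fundamental, Hz
harmonic = [F0 * k for k in (1, 2, 3, 4, 5)]          # 1:2:3:4:5
missing_fundamental = [F0 * k for k in (2, 3, 4, 5)]  # 2:3:4:5, no 200 Hz component
mistuned = [F0 * k for k in (1, 2, 3, 4, 5)]
mistuned[2] *= 1.03  # third partial 3% sharp (assumed amount)

period = int(SAMPLE_RATE / F0)  # 240 samples: one period of the (possibly absent) fundamental
for name, freqs in [("harmonic", harmonic),
                    ("missing fundamental", missing_fundamental),
                    ("mistuned", mistuned)]:
    mismatch = max_period_mismatch(sine_sum(freqs, duration=0.05), period)
    print(f"{name}: mismatch after one fundamental period = {mismatch:.6f}")
```

The first two mismatches come out at floating-point noise, the mistuned one clearly above it. To listen, something like `write_wav("missing.wav", sine_sum(missing_fundamental))` writes a half-second mono file.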
Just making explicit something I think I missed for a minute when reading your comment: the point isn’t “Sethares doesn’t explain how our ears/brains determine what’s one note and what’s more than one, so his theory is incomplete” (his theory isn’t trying to be a theory of that) but “our ears/brains seem to determine what’s one note and what’s more than one by doing something like looking for simple integer frequency multiples, and if there’s a mechanism for that, it seems likely that it’s also involved in determining which combinations of tones sound good to us”. I think there’s something to that. Here are two things that seem to push the other way:
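First, on the face of it, this indicates machinery for identifying integer ratios (every component a whole-number multiple of one fundamental), not necessarily rational ones (such as the 3:2 between two notes of a chord). Though maybe the missing-fundamental phenomenon suggests otherwise.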
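The link between the two can be made precise: any set of rationally related frequencies is a set of integer multiples of some lower implied fundamental. The sketch below uses exact arithmetic from Python’s standard `fractions` module; `implied_fundamental` and `harmonic_numbers` are hypothetical helper names. Measured relative to the lowest tone, 2:3:4:5 becomes the ratios 1, 3/2, 2, 5/2, which are harmonics 2, 3, 4, 5 of a fundamental at half the lowest frequency, i.e. where the missing fundamental would be.

```python
from fractions import Fraction
from math import gcd


def implied_fundamental(ratios):
    """Largest f0 such that every ratio is an integer multiple of f0.

    Ratios must be rational (ints, Fractions or strings like "3/2").
    """
    fracs = [Fraction(r) for r in ratios]
    # Put everything over a common denominator (the LCM of the denominators)...
    common_den = 1
    for q in fracs:
        common_den = common_den * q.denominator // gcd(common_den, q.denominator)
    # ...then the fundamental is the GCD of the numerators over that denominator.
    g = 0
    for q in fracs:
        g = gcd(g, int(q * common_den))
    return Fraction(g, common_den)


def harmonic_numbers(ratios):
    """Which harmonic of the implied fundamental each ratio is."""
    f0 = implied_fundamental(ratios)
    return [int(Fraction(r) / f0) for r in ratios]


print(implied_fundamental(["1", "3/2"]))              # a fifth: fundamental at 1/2
print(harmonic_numbers(["1", "3/2", "2", "5/2"]))     # the 2:3:4:5 case
print(implied_fundamental(["1", "16/15"]))            # a semitone: fundamental at 1/15
```

So an integer-multiple detector applied to a chord is, in effect, a rational-ratio detector, with more complex ratios implying a much lower (and plausibly harder to detect) fundamental.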
Second, suppose you hear a violin and a flute playing the same note. You probably will not hear them as a single instrument. I think that whatever magic our ears/brains do to figure out what’s one instrument and what’s several also involves things like the exact times when spectral components appear and disappear, which spectral components appear to be fluctuating together (in frequency, amplitude or both), and maybe even fitting spectral patterns to those of instruments we’re used to hearing. (I suspect there’s a pile of research on this. I haven’t looked.) The more other cues we use for that, the less confident we can be that integer-frequency-ratio identification is part of it.
An interesting experiment that I’m too lazy to try: pick two frequencies with a highly irrational ratio, construct their harmonic series, and split each harmonic series into two groups. So we have A1 and A2 (splitting up the spectrum of note A) and B1 and B2 (splitting up the spectrum of note B). Now construct a sound built out of all those components, but make A1 and B1 match closely in details of timing, slight fluctuations in frequency and amplitude, and anything else we can think of, and likewise for A2 and B2. Does someone listening to this then hear two clashing tones, each of them nicely harmonic (i.e., A1+A2 versus B1+B2), or two weird inharmonic tones that somehow fit together (i.e., A1+B1 versus A2+B2)? What happens if we first play A1+B1, A2+B2, A1+A2 and B1+B2 separately? The answer may well be that there isn’t really an answer, alas.
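Here is a sketch of how the stimuli for this experiment might be generated, in stdlib-only Python. Every specific is an assumption: note A at 220 Hz, note B at 220·φ Hz (φ, the golden ratio, is often called the “most irrational” number because it is the hardest to approximate by fractions), six harmonics per note, odd harmonics in group 1 and even ones in group 2, and a shared onset time plus shared vibrato as the only “fluctuating together” cues (amplitude fluctuations are left out). Vibrato scales every frequency in a group by the same factor, so ratios within a group stay fixed. The names `render`, `FATE` and so on are hypothetical; the code only builds the sounds and says nothing about what listeners would hear.

```python
import math

SAMPLE_RATE = 22050  # Hz (assumed; a low rate keeps the pure-Python loop quick)
PHI = (1 + 5 ** 0.5) / 2  # assumed "highly irrational" ratio between notes A and B


def harmonic_series(f0, n_partials=6):
    """(harmonic number, frequency) pairs of an ideal harmonic tone."""
    return [(k, f0 * k) for k in range(1, n_partials + 1)]


def split(series):
    """Assumed split: odd harmonics form group 1, even harmonics group 2."""
    return ([f for k, f in series if k % 2 == 1],
            [f for k, f in series if k % 2 == 0])


A1, A2 = split(harmonic_series(220.0))
B1, B2 = split(harmonic_series(220.0 * PHI))
PARTIALS = {"A1": A1, "A2": A2, "B1": B1, "B2": B2}

# Shared "fate" per cross-note group: X = A1 + B1, Y = A2 + B2 (values assumed).
FATE = {
    "X": {"onset": 0.0, "vib_rate": 5.0, "vib_depth": 0.01},
    "Y": {"onset": 0.3, "vib_rate": 6.3, "vib_depth": 0.01},
}
GROUP_OF = {"A1": "X", "B1": "X", "A2": "Y", "B2": "Y"}


def render(names, duration=1.0, sample_rate=SAMPLE_RATE):
    """Mix the named partial groups, each following its group's onset and vibrato."""
    comps = [(f, FATE[GROUP_OF[name]]) for name in names for f in PARTIALS[name]]
    scale = 1.0 / len(comps)  # keeps every sample within [-1, 1]
    out = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        s = 0.0
        for f, fate in comps:
            tau = t - fate["onset"]
            if tau < 0:
                continue  # this group has not started yet
            w = 2 * math.pi * fate["vib_rate"]
            # Phase of a sine whose instantaneous frequency is
            # f * (1 + depth * sin(w * tau)), integrated from the onset.
            phase = 2 * math.pi * f * (tau + fate["vib_depth"] * (1 - math.cos(w * tau)) / w)
            s += math.sin(phase)
        out.append(scale * s)
    return out


# The full stimulus, plus the four separate "preview" mixes asked about above.
full = render(["A1", "A2", "B1", "B2"])
previews = {combo: render(combo.split("+"))
            for combo in ["A1+B1", "A2+B2", "A1+A2", "B1+B2"]}
```

The sample lists can be written to WAV files (e.g. with Python’s `wave` module) for listening. Other cues could be added the same way, as extra per-group entries in `FATE`.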
If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don’t have overtones.
The way I would explain this is that when hearing real sounds, it is very common to hear a frequency and its harmonics. Almost all the time, if you hear 1:2:3:4:5 etc., that is because a single note just sounded. So if you hear a bunch of sine waves in that ratio (e.g., a determined group of people whistling), it sounds like one note.