I don’t mean to press you on a point, but when you say in reference to musical consensus, “Probably, but I think your example is a little bit too extreme to demonstrate your point”, I think it is important to say whether you believe there is any musical consensus of what is good, or if you believe there is zero consensus. The degree does not matter as to whether the point I’m trying to make is true. Is there any consensus based on how shared human nature interacts with physical sounds as to what is agreed upon as “good”? It seems difficult to argue that any consensus is completely arbitrary.
Also, I’m not saying that there is not infinite gradation of sounds or instrumentation, but there is a limit to the distinctness of sounds or instrumentation. Do you believe there are infinite distinct sounds or instruments possible? There isn’t clear language for what I’m trying to get at, but think about how a violin is more distinct from a tuba than from a cello. Or think of it in terms of being similar to the visual spectrum of light: there are infinite gradations of color, but there isn’t infinite distinctness. There are limits to the range of the visible spectrum of light, with the primary colors being most distinct from each other (but there is an infinite amount of gradation that can be categorized as sub-colors).
The point being that if there is an objective aspect as to what human nature appreciates as a song, and there is a limit to the distinctness of sounds (and other factors like rhythm, song structure etc), then there would be a limit to possible songs and a limit to possible songs that would be considered “good” by the general consensus (which I agree with you is very varied, but it is still non-arbitrary).
I think you are right to bring painting and other forms of art into the discussion. What I’m really trying to do is explain a phenomenon that I’ve observed in multiple forms of art. There is a pattern that appears to be taking place: humans start out with very primitive forms of visual or auditory art, and then develop techniques and understanding to increase in complexity and open up new possibilities in art (as you refer to in your first couple of paragraphs), but then the speed of development of more distinct works seems to slow down at a certain point and eventually decline. I agree that this observation is difficult to judge, but do you agree that there is a limit to distinctness? On a long enough timescale, (for example) wouldn’t all classical music sound like a song that has already come before?
I appreciate the back and forth and your arguments.
Also, I’m not saying that there is not infinite gradation of sounds or instrumentation, but there is a limit to the distinctness of sounds or instrumentation. Do you believe there are infinite distinct sounds or instruments possible?
This whole idea that you need “infinite variation”, or rather arbitrarily large variation, if artistic endeavors are to be worthwhile in the near term is just weird to me. A big enough space is plenty to keep us all busy for the foreseeable future, and this isn’t even accounting for the fact that art is in no small part about gaining a thorough understanding of such creative possibilities, as opposed to developing new ‘creations’ per se. After all, there’s already more music in the world than one could feasibly listen to in a lifetime!
Sure, I didn’t mean to imply that art is just about new creations. There are many other values to art and creativity of course. Also, I agree that we are fortunate to have an abundance of music available. So don’t take what I’m saying as a criticism of creativity or art, or not appreciating the value of them apart from newness. I’m more examining this topic in the interest of understanding human progress and discovery in general.
I agree that this idea is difficult to prove as of now, which is why I’m doing my best to explain my thought process as to what seems evident to me, and I’m appreciating the objections that others are raising. But if we get to the year 2200 and the majority of people still listen to music primarily from 1500-2050 (or whatever), then that does say something about our reality and human progress/discovery. It also is interesting to me that people intuitively view creativity as something open-ended and undefined (at least I did until a few years ago), when perhaps there is something objective and defined and limited about human discovery (which I now believe).
I really like your thread: thank you for writing me back!
I think you have good intuitions about how sound works. I don’t think I can determine whether there’s a consensus on what is good: I’d venture to guess that any audio humans can perceive sounds good to someone. A friend of mine sent me an album that was entirely industrial shrieking.
But I agree with you that there’s a limit to the distinctness—humans can only divide the frequency spectrum a certain number of times before they can’t hear gradation any more, they can only slice the time domain to a certain extent before they can’t hear transitions any more, and they can only slice the loudness domain to a certain extent before they can’t hear the difference between slightly louder and slightly quieter.
We can make basically any human-perceivable sound by sampling at 32 bits, 44.1kHz. Many of those sounds won’t be interesting and they’ll sound the same as other sounds, of course. But if nothing else, that puts an upper limit on how much variation you can have. In ten minutes, at 32 bits and 44.1kHz, you have about 106MB of mono audio data. You could probably express any human-perceivable song in a couple hundred MB, and in practice, using psychoacoustic compression like MP3, it would take a lot less space to do the interesting ones.
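To make that arithmetic concrete, here’s the back-of-the-envelope calculation (assuming uncompressed PCM; the figures are for the ten-minute case above):

```python
SAMPLE_RATE = 44_100    # samples per second
BYTES_PER_SAMPLE = 4    # 32 bits
SECONDS = 10 * 60       # ten minutes

n_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * SECONDS
print(f"{n_bytes / 1e6:.0f} MB mono, {2 * n_bytes / 1e6:.0f} MB stereo")
# -> 106 MB mono, 212 MB stereo
```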
I think that for us to run out of music, the domain of things that sound good has to be pretty small. Humans probably haven’t produced more than a billion pieces of music, but if we pretend all music is monophonic, that there are four possible note lengths, and twelve possible pitches (note: each of these assumptions is too small, based on what we hear in real music), then you only need to string six notes together before you get something that probably nobody has tried.
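A quick sanity check of that claim, under the same toy assumptions (four lengths times twelve pitches gives 48 options per note):

```python
NOTE_OPTIONS = 4 * 12    # four note lengths times twelve pitches
PIECES_EVER = 10 ** 9    # generous bound on how many pieces humans have made

length = 1
while NOTE_OPTIONS ** length <= PIECES_EVER:
    length += 1
print(length, NOTE_OPTIONS ** length)   # -> 6 12230590464
```

Six notes give about 12 billion possibilities, already more than the billion-piece bound.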
What I was really responding to were these ideas that I thought were implicit in what you were saying (but I don’t think you thought they were implicit):
if you try every human-perceptible sound, most of them will sound bad. (we don’t know if they’ll sound bad because there’s a ton of variation in what sounds good)
if you try every human-perceptible sound, most of them won’t be distinguishable. (The search space is so big that it doesn’t matter if 99.99% of them aren’t distinguishable. We don’t know, in general, what makes music ideas distinguishable, so we don’t know how big that is as a portion of the search space. If you think that this comes down to Complex Brain Things, which I imagine most composers do, then figuring out what makes them distinguishable might reduce to SAT; see all the things neural network researchers hate doing)
we are good enough at searching for combinations that we have probably tried all the ones that sound good. (there are so many combinations that exhaustively searching for them would take forever. If the problem reduces to SAT, we can’t do that much better than exhaustively searching them)
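For a sense of what “forever” means here, a rough sketch under the same toy note model (the ten-note melody length and the million-auditions-per-second rate are made-up illustrative numbers):

```python
OPTIONS_PER_NOTE = 48            # the toy model again
MELODY_LENGTH = 10               # still a very short melody (assumed)
AUDITIONS_PER_SECOND = 10 ** 6   # wildly optimistic listening rate (assumed)

total = OPTIONS_PER_NOTE ** MELODY_LENGTH
years = total / AUDITIONS_PER_SECOND / (60 * 60 * 24 * 365)
print(f"{total:.2e} melodies, about {years:,.0f} years to audition them all")
# -> 6.49e+16 melodies, about 2,059 years to audition them all
```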
I think that some of the strategies we use to search for musical ideas without having to solve any NP-complete problems have dried up. Minimalism is one technique we used to generate music ideas for a while, and it was easy enough to execute that a lot of people generated good songs very fast. But it only lasted about a decade before composers in that genre brought in elements of other genres to fight the staleness.
After a couple hundred years, Bach-type chorales have dried up. (even though older kinds of polyphony haven’t) The well of 1950s-style pop chord progressions appears to have dried up, but the orchestration style doesn’t seem to have. (If we think “nothing new under the sun” comes down to Complex Brain Things, then we can’t know for sure—we can just guess by looking around and figuring out if people are having trouble being creative in them.) A lot of conventional classical genres don’t appear to have dried up—new composers release surprising pieces in them all the time. (see e.g. Romantic-style piano. Google even did some really cool work in computer-generating original pieces that sound like that.)
When these search strategies die, a lot of composers are good at coming up with new search strategies for good songs. We don’t know exactly how they do that, but modern pop music contains a lot of variation that’s yet to filter into concert music, and my gut tells me that means the future is pretty bright.
Yes, you are getting into the heart of what I’m trying to examine. This concept began to form for me as I was writing and recording rock songs and trying to create a distinct sound within that genre. New distinct music is largely created intuitively by people borrowing from the past but adding variation (like you said). But songs contain a more specific balance of factors than I think people realize, which makes a song more like a complex puzzle than just a complex combination of attributes. Many factors must sync together correctly, including chord progression, melody, key, rhythm, vocal style, instrumentation, and audio production. But those factors are all limited in their distinctness (limited notes on a scale, limited chords, limited instruments, limited vocal styles). And for a song to work well, all the factors must sync correctly. If you put Elvis’ voice instead of Kurt Cobain’s on Nirvana’s “Smells Like Teen Spirit”, it might be funny but it wouldn’t work as well. Kurt sings more like Paul Westerberg of the Replacements (listen to “Bastards of Young”), which is a specific distinct vocal style that they share (and which works well only for a certain type of style).
So if you have 1000 (to be simple) vocal styles, 1000 chord progressions, 1000 combinations of instruments, etc., it’s not that each specific vocal style could be paired with each chord progression and with each combination of instruments, etc. In fact, it would be only very specific combinations that would work well. So, the songs that work seem to be extremely sparse within the search space.
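One hedged way to model that “puzzle” intuition: treat each pair of factor choices as compatible or not, and see how sparse the fully compatible combinations become. The 5% pairwise compatibility rate below is an invented illustration, not a measured fact:

```python
N = 1000        # styles per factor, as in the example above
P_SYNC = 0.05   # assumed chance that any two specific choices "sync"

# If a working song needs all three pairwise compatibilities (vocal/chords,
# vocal/instrumentation, chords/instrumentation), a random triple works
# with probability P_SYNC ** 3.
combinations = N ** 3
expected_working = combinations * P_SYNC ** 3
print(f"{combinations:.0e} triples, ~{expected_working:,.0f} expected to work")
# -> 1e+09 triples, ~125,000 expected to work; each extra factor that must
#    sync shrinks the working fraction by another factor of P_SYNC or so.
```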
Along the same line, if you look at rock bands, they usually have a period of 5-20 years where they produce their best work. It seems that they run out of good songs (the possible puzzle combinations of factors within their own style). A band like AC/DC recorded their best songs in the 70s and 80s, and then most of what they released after that just sounded like them repeating their sound but with diminishing results. If there really were a lot of untapped songs within a band’s style, it seems like there would be at least a few counterexamples of bands who produce the same high level of quality for 30 or 40 or 50 years, but I’ve never seen that happen.
And once you run out of new distinct factors (voice styles, or production styles, or instrumentation) then it seems like the potential for new distinct songs (as complex puzzles of those factors) also will run out. We have moved through genres over time, including better production in the last 20 years and more electronic aspects etc, but when will the well run dry?
It seems to me that with art (including music), we start with primitive attempts and instruments, then we develop more complex music theory and new instruments, but eventually we run out of the new and our output declines. While I was working on writing rock songs, I was noticing at the same time that bands I liked seemed to have declining quality of output, and there weren’t new bands of equal quality in different styles releasing music to fill the void. The rate of creation of new and quality rock songs and styles seemed to be in decline. I don’t see there being a different method to unlock many other great songs or new styles than the intuitive method and trial-and-error that has been used for centuries; I think the well is just dry.
Anyway, the idea itself is interesting to me because if this concept applies to music then it seems like it would apply to all things involving creativity and discovery. That we can view all knowledge and creation as one thing (The Big Niche) that exists apart from whether possibilities have been created or not, and that it will all eventually be completed at some point in the future.
(see e.g. Romantic-style piano. Google even did some really cool work in computer-generating original pieces that sound like that.)
The best results in computer-generated romantic piano music are actually not from Google but from an unaffiliated researcher—see Composing music with recurrent neural networks. It’s true that Google Brain has released some generated piano sounds from their WaveNet, but those (1) are not actual piano sounds; rather, they’re the network’s “dream” of what a piano might sound like (a real piano could never sound like that); and (2) have comparatively little long-term structure, a natural outcome of working on raw audio data as opposed to notated music.
These results are actually the reason I don’t buy that “strategies we use to search for musical ideas without having to solve any NP-complete problems have dried up”. Neural net inference doesn’t involve any NP-hard search, and this network creates rather innovative music simply by virtue of successful generalization, enabled by some very good (I’d even say outstanding) features of the model design.
That music doesn’t sound “rather innovative” to me. It sounds stereotyped, boring, and inept. (For the avoidance of doubt, it is also very impressive to make a computer generate music that isn’t a lot more stereotyped, boring and inept than that, and I would not be astonished to see this approach yielding markedly better music in the future.) It seems to me like it falls a long way short of, e.g., the music of “Emily Howell”.
That music doesn’t sound “rather innovative” to me.
Hmm. What sort of music are you most familiar with/like the most? The system does have some very significant shortcomings, which may account for why you find the output boring—however, I think it also has quite a few strong points. It’s just hard to point them out here, since I’m not sure what your musical background is, or how much you know already about the formal/scholarly theory of music.
What sort of music are you most familiar with/like the most?
Western art music (i.e., the sort commonly described as “classical”).
how much you know already about the formal/scholarly theory of music.
I know a diminished seventh from a minor third and a ritornello from a ritenuto, I can tell you which bit of a classical sonata-form movement is the exposition, and I have a tolerable understanding of the relationship between timbre and tuning systems. But I couldn’t enumerate the different species of counterpoint, express any useful opinions about the relative merits of two serialist works, or write you a convincing musical analysis of “Yesterday”. If that (either explicitly or via what I didn’t think of mentioning) doesn’t tell you what you want to know, feel free to ask more specific questions.
Western art music (i.e., the sort commonly described as “classical”).
Well, first of all, note that the music that system generates is entirely derived from the model’s understanding of a dataset/repertoire of “Western art music”. (Do you have any specific preferences about style/period too? That would be useful to know!)
For a start, note that you should not expect the system to capture any structure beyond the phrase level—since it’s trained from snippets which are a mere 8 bars long, the average history seen in training is just four bars. Within that limited scope, however, the model reaches quite impressive results.

Next, one should know that every note in the pieces is generated by the exact same rules: the original model has no notion of “bass” or “lead”, nor does it generate ‘chords’ and then voice them in a later step. It’s entirely contrapuntal, albeit in the “keyboard” sort of counterpoint which does not organize the music as a fixed ensemble of ‘voices’ or ‘lines’.

Somewhat more interestingly, the network architecture implies nothing whatsoever about such notions as “diatonicism” or “tonality”: every hint of these things you hear in the music is a result of what the model has learned. Moreover, there’s basically no pre-existing notion that pieces should “stay within the same key” except when they’re “modulating” towards some other area: what the system does is freely driven by what the music itself has been implying about e.g. “key” and “scale degrees”, as best judged by the model. If the previous key no longer fits at any given point, this can cue a ‘modulation’. In a way, this means that the model is actually generating “freely atonal” music along “tonal” lines!
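For readers curious what “every note generated by the exact same rules” means mechanically, here’s a minimal sketch of the kind of autoregressive sampling loop such note-level models use. The `model` object and its event vocabulary are placeholders, not the actual system under discussion:

```python
import random

def sample_piece(model, max_events=512):
    """Generate a piece one note-event at a time, every event by the same rule.

    `model.next_distribution(history)` stands in for whatever the trained
    network computes: a dict mapping each possible next event (a pitch and
    duration, a rest, or "END") to its probability given the music so far.
    """
    history = []
    for _ in range(max_events):
        probs = model.next_distribution(history)
        events = list(probs)
        event = random.choices(events, weights=[probs[e] for e in events])[0]
        if event == "END":
            break
        history.append(event)  # the sampled event becomes context for the next
    return history
```

There is no separate harmony pass and no hard-coded key: any “tonality” in the output has to emerge from the learned distribution, which is exactly the point above.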
In my opinion, the most impressive parts are the transitions from one “musical idea” to the “next”, which would surely be described as “admirably flowing” and “lyrical” if similarly-clean transitions were found in a real piece of music. The same goes for the free “modulations” and changes in “key”: the underlying ‘force’ that made the system modulate can often be heard musically, and this also makes for a sense of lyricism combined with remote harmonic relationships that’s quite reminiscent of “harmonically-innovative” music from, say, the late 19th c. (Note that, by and large, this late-romantic music was not in the dataset! It’s simply an ‘emergent’ feature of what the model is doing.)
Something that I had not heard in previous music is this model’s eclecticism in style (“classical” vs. “romantic”, with a pinch of “impressionist” added at the last minute) and texture. Even more interesting than the clean transitions involving these elements, there is quite a bit of “creative” generalization arising from the fact that all of these styles were encompassed in the same model. So, we sometimes hear some more-or-less ‘classical’ elements thrown in a very ‘romantic’ spot, or vice versa, or music that sounds intermediate between the two.

Finally, the very fact that this model is improvising classical music is worth noting per se. We know that improvisation has historically been a major part of the art-music tradition, and the output of such a system can at least give us some hint of what sort of ‘improvisation’ can even be musically feasible within that tradition, even after the practice itself has disappeared.
I think you misunderstood me, or maybe I wasn’t clear. I meant “of the strategies which we used to search for musical ideas, none of them involved solving NP-complete problems, and some of them have dried up.” I think what neural nets do to learn about music is pretty close to what humans do—once a learning tool finds a local minimum, it keeps attacking that local minimum until it refines it into something neat. I think a lot of strategies to produce music work like that.
I definitely don’t think most humans intentionally sit down and try to solve NP-complete problems when they write music, and I don’t think humans should do that either.
Actually, what this network does is a lot closer to pure improvisation than the process of successive refinement you’re describing here. Optimization (i.e., search for a local minimum) is used in the training stage, where the network uses a trial-and-error strategy to fit a model of “how music goes”. Once the model is fitted, however, generating new pieces is a process of linear prediction, based on what the model has ‘written’ so far—this is actually among the most significant limitations of these models; they’re great for pure inspiration, but they’ll never reach a local optimum or successfully negotiate broader constraints other than by pure chance. That’s why I find it significant that what they come up with is nonetheless innovative and compelling. (There are of course neural-network-based systems that can do a lot more—such as AlphaGo—but I don’t know of anyone using these to make new music.)
Oh, absolutely! It’s misleading for me to talk about it like this because there are a couple of different workflows:
train for a while to understand existing data, then optimize for a long time to try to impress the activation layer that knows the most about what the data means. (AlphaGo’s evaluation network, Deep Dream) Under this process you spend a long time optimizing for one thing (the network’s ability to recognize) and then a long time optimizing for another thing (how much the network likes your current input)
train a neural network to minimize a loss function based on another neural network’s evaluation, then sample its output. (DCGAN) Under this process you spend a long time optimizing for one thing (the neural network’s loss function) but a short time sampling another thing. (outputs from the neural net)
train a neural network to approximate existing data and then just sample its output. (seq2seq, char-rnn, PixelRNN, WaveNet, AlphaGo’s policy network) Under this process you spend a long time optimizing for one thing (the loss function again) but a short time sampling another thing. (outputs from the neural net)
It’s kind of an important distinction because, as with humans, neural networks that can improvise in linear time can be sampled really cheaply (in predictable time!), while neural networks that need you to run an optimization task at sampling time are expensive to sample even after you’ve trained them.
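To make the cost contrast concrete, here’s a schematic sketch of the first and third workflows, with toy stand-ins for the trained networks (every function here is invented for illustration):

```python
import random

LEARNING_RATE = 0.1

# --- toy stand-ins so the sketch runs; real systems use trained networks ---
def activation_score_gradient(x):
    # gradient of a pretend "how much the network likes x" score, -(x - 3)**2
    return -2.0 * (x - 3.0)

def predict_next(history):
    # pretend forward pass of a sequence model: a chromatic toy rule
    return (history[-1] + 1) % 12 if history else random.randrange(12)

# Workflow 1 (Deep Dream, AlphaGo's evaluation net): every sample is itself
# an optimization run, so sampling stays expensive after training.
def sample_by_input_optimization(steps=1000):
    x = random.uniform(-10.0, 10.0)
    for _ in range(steps):                      # inner loop per sample
        x += LEARNING_RATE * activation_score_gradient(x)
    return x

# Workflow 3 (char-rnn, WaveNet, the policy net): all the optimization
# happened at training time; sampling is one cheap pass per step.
def sample_by_prediction(length=16):
    history = []
    for _ in range(length):                     # no optimization, just predict
        history.append(predict_next(history))
    return history

print(sample_by_input_optimization())   # converges near 3.0 after the loop
print(sample_by_prediction())           # generated in linear time
```

The training loops are omitted in both cases; the point is only that the first workflow pays for an optimization loop on every sample, while the third pays once up front.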