(see e.g. Romantic-style piano. Google even did some really cool work in computer-generating original pieces that sound like that.)
The best results in computer-generated romantic piano music are actually not from Google but from an unaffiliated researcher—see Composing music with recurrent neural networks. It’s true that Google Brain has released some generated piano sounds from their WaveNet, but those (1) are not actual piano sounds; rather, they’re the network’s “dream” of what a piano might sound like, which no real piano could produce; and (2) have comparatively little long-term structure, a natural consequence of working on raw audio as opposed to notated music.
These results are actually the reason I don’t buy that “strategies we use to search for musical ideas without having to solve any NP-complete problems have dried up”. Neural net inference doesn’t involve any NP-complete search, and this network creates rather innovative music simply by virtue of successful generalization, enabled by some very good (I’d even say outstanding) features of the model design.
That music doesn’t sound “rather innovative” to me. It sounds stereotyped, boring, and inept. (For the avoidance of doubt, it is also very impressive to make a computer generate music that isn’t a lot more stereotyped, boring and inept than that, and I would not be astonished to see this approach yielding markedly better music in the future.) It seems to me like it falls a long way short of, e.g., the music of “Emily Howell”.
That music doesn’t sound “rather innovative” to me.
Hmm. What sort of music are you most familiar with/like the most? The system does have some very significant shortcomings, which may account for why you find the output boring—however, I think it also has quite a few strong points. It’s just hard to point them out here, since I’m not sure what your musical background is, or how much you know already about the formal/scholarly theory of music.
What sort of music are you most familiar with/like the most?
Western art music (i.e., the sort commonly described as “classical”).
how much you know already about the formal/scholarly theory of music.
I know a diminished seventh from a minor third and a ritornello from a ritenuto, I can tell you which bit of a classical sonata-form movement is the exposition, and I have a tolerable understanding of the relationship between timbre and tuning systems. But I couldn’t enumerate the different species of counterpoint, express any useful opinions about the relative merits of two serialist works, or write you a convincing musical analysis of “Yesterday”. If that (either explicitly or via what I didn’t think of mentioning) doesn’t tell you what you want to know, feel free to ask more specific questions.
Western art music (i.e., the sort commonly described as “classical”).
Well, first of all, note that the music this system generates is entirely derived from the model’s understanding of a dataset/repertoire of “Western art music”. (Do you have any specific preferences about style/period, too? That would be useful to know!)
For a start, note that you should not expect the system to capture any structure beyond the phrase level—since it’s trained from snippets which are a mere 8 bars long, the average history seen in training is just four bars. Within that limited scope, however, the model reaches quite impressive results.
Next, one should know that every note in the pieces is generated by the exact same rules: the original model has no notion of “bass” or “lead”, nor does it generate ‘chords’ and then voice them in a later step. It’s entirely contrapuntal, albeit in the “keyboard” sort of counterpoint which does not organize the music as a fixed ensemble of ‘voices’ or ‘lines’.
Somewhat more interestingly, the network architecture implies nothing whatsoever about such notions as “diatonicism” or “tonality”: every hint of these things you hear in the music is a result of what the model has learned. Moreover, there’s basically no pre-existing notion that pieces should “stay within the same key” except when they’re “modulating” towards some other area: what the system does is freely driven by what the music itself has been implying about e.g. “key” and “scale degrees”, as best judged by the model. If the previous key no longer fits at any given point, this can cue a ‘modulation’. In a way, this means that the model is actually generating “freely atonal” music along “tonal” lines!
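To make the “nothing built in” point concrete, here is a minimal toy sketch of that kind of note-by-note generation. It is emphatically not the architecture from the linked post (which is more elaborate and handles simultaneous notes); the `NextNoteModel` class and its weights are hypothetical placeholders, and the only place key-like behaviour could come from is whatever a trained version of it had learned:

```python
# Toy sketch: note-by-note sampling from a learned next-note distribution.
# Nothing below encodes "key", "scale", or "chord"; any tonal behaviour in the
# output could only come from whatever the (hypothetical) trained weights learned.
import torch
import torch.nn as nn

N_PITCHES = 128  # MIDI pitch range

class NextNoteModel(nn.Module):
    """Hypothetical stand-in for the trained network, not the model from the linked post."""
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(N_PITCHES, hidden, batch_first=True)
        self.out = nn.Linear(hidden, N_PITCHES)

    def forward(self, x, state=None):
        h, state = self.rnn(x, state)
        return self.out(h), state

def sample_piece(model, length=64, start_pitch=60):
    """Generate `length` notes one at a time, feeding each choice back in as context."""
    pitches = [start_pitch]
    state = None
    for _ in range(length - 1):
        x = torch.zeros(1, 1, N_PITCHES)
        x[0, 0, pitches[-1]] = 1.0                         # one-hot encode the previous note
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1], dim=-1)       # learned distribution over the next pitch
        pitches.append(torch.multinomial(probs, 1).item()) # sample; no key/scale rules anywhere
    return pitches

model = NextNoteModel()   # untrained here; imagine its weights fit to the classical corpus
print(sample_piece(model, length=16))
```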
In my opinion, the most impressive parts are the transitions from one “musical idea” to the “next”, which would surely be described as “admirably flowing” and “lyrical” if similarly clean transitions were found in a real piece of music. The same goes for the free “modulations” and changes in “key”: the underlying ‘force’ that made the system modulate can often be heard musically, and this also makes for a sense of lyricism combined with remote harmonic relationships that’s quite reminiscent of “harmonically innovative” music from, say, the late 19th century. (Note that, by and large, this late-romantic music was not in the dataset! It’s simply an ‘emergent’ feature of what the model is doing.)
Something that I had not heard in previous music is this model’s eclecticism in style (“classical” vs. “romantic”, with a pinch of “impressionist” added at the last minute) and texture. Even more interesting than the clean transitions between these elements is the considerable amount of “creative” generalization arising from the fact that all of these styles were encompassed in the same model. So, we sometimes hear more-or-less ‘classical’ elements thrown into a very ‘romantic’ spot, or vice versa, or music that sounds intermediate between the two.
Finally, the very fact that this model is improvising classical music is worth noting per se. We know that improvisation has historically been a major part of the art-music tradition, and the output of such a system can at least give us some hint of what sort of ‘improvisation’ can even be musically feasible within that tradition, even after the practice itself has disappeared.
Oh, thanks for the link!
I think you misunderstood me, or maybe I wasn’t clear. I meant “of the strategies which we used to search for musical ideas, none of them involved solving NP-complete problems, and some of them have dried up.” I think what neural nets do to learn about music is pretty close to what humans do—once a learning tool finds a local minimum, it keeps attacking that local minimum until it refines it into something neat. I think a lot of strategies to produce music work like that.
I definitely don’t think most humans intentionally sit down and try to solve NP-complete problems when they write music, and I don’t think humans should do that either.
Actually, what this network does is a lot closer to pure improvisation than to the process of successive refinement you’re describing here. Optimization (i.e. search for a local minimum) is used in the training stage, where the network uses a trial-and-error strategy to fit a model of “how music goes”. Once the model is fitted, however, generating new pieces is a process of linear prediction, based on what the model has ‘written’ so far—this is actually among the most significant limitations of these models; they’re great for pure inspiration, but they’ll never reach a local optimum or successfully negotiate broader constraints other than by pure chance. That’s why I find it significant that what they come up with is nonetheless innovative and compelling. (There are of course neural-network-based systems that can do a lot more—such as AlphaGo—but I don’t know of anyone using these to make new music.)
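As a rough illustration of that split (and only that; the real system’s training procedure is more involved), here is what the two phases might look like for the hypothetical `NextNoteModel` sketched earlier: all of the optimization lives in the training loop, and generation never touches an optimizer at all.

```python
# All the optimization ("attacking a local minimum") happens here, at training time:
# fit "how music goes" by minimizing next-note prediction error over the corpus.
import torch
import torch.nn as nn

def train(model, corpus, epochs=10, lr=1e-3):
    """`corpus`: list of (inputs, targets) pairs; inputs (1, T, N_PITCHES) one-hot, targets (T,) pitch ids."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, targets in corpus:
            logits, _ = model(x)
            loss = loss_fn(logits[0], targets)   # how badly did we predict the next note?
            opt.zero_grad()
            loss.backward()                      # trial and error: nudge the weights downhill
            opt.step()

# Generation, by contrast, never calls backward() or an optimizer: it is just the
# sample_piece() loop above, one forward pass per note, which is why the model
# cannot go back and revise earlier notes to satisfy some larger-scale constraint.
```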
Oh, absolutely! It’s misleading for me to talk about it like this because there are a couple of different workflows:
train for a while to understand existing data, then optimize for a long time to try to impress the activation layer that knows the most about what the data means (AlphaGo’s evaluation network, Deep Dream). Under this process you spend a long time optimizing for one thing (the network’s ability to recognize) and then a long time optimizing for another thing (how much the network likes your current input).
train a neural network to minimize a loss function based on another neural network’s evaluation, then sample its output (DCGAN). Under this process you spend a long time optimizing for one thing (the neural network’s loss function) but a short time sampling another thing (outputs from the neural net).
train a neural network to approximate existing data and then just sample its output (seq2seq, char-rnn, PixelRNN, WaveNet, AlphaGo’s policy network). Under this process you spend a long time optimizing for one thing (the loss function again) but a short time sampling another thing (outputs from the neural net).
It’s kind of an important distinction because, as with humans, neural networks that can improvise in linear time can be sampled really cheaply (in a fixed, predictable amount of time per piece!), while neural networks that need you to run an optimization at sampling time are expensive to sample even though you’ve already trained them.
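To make that cost asymmetry concrete, here is a hedged sketch of the Deep-Dream-style workflow next to plain sampling, again reusing the hypothetical `NextNoteModel` from earlier as a stand-in for the trained network (the scoring function is a toy placeholder, not anything these systems actually use):

```python
# Deep-Dream-style generation: every new piece requires its own optimization run,
# climbing the gradient of some score the trained network assigns to its input.
import torch

def dream_piece(model, length=64, steps=500, lr=0.1):
    """Expensive route: optimize an input until the network 'likes' it."""
    x = torch.randn(1, length, N_PITCHES, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):                        # hundreds of forward+backward passes per piece
        logits, _ = model(torch.softmax(x, dim=-1))
        score = logits.max(dim=-1).values.mean()  # toy placeholder for "how much the net likes x"
        opt.zero_grad()
        (-score).backward()                       # gradient ascent on the input, not the weights
        opt.step()
    return x.argmax(dim=-1)[0].tolist()

# The plain-sampling workflow is just sample_piece() from earlier: one forward
# pass per note and no inner optimization loop, which is why it is so much cheaper.
```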