I’m quite interested in how many of the methods employed in this AI can be applied to more general strategic problems.
A friend of mine who did quite a bit of work in machine composition was of the opinion that tools for handling strategy tasks like Go would also apply strongly to many design tasks, like composing good music.
Sure, you can model music composition as an RL task. The AI composes a song, then predicts how much a human will like it. It then tries to produce songs that are more and more likely to be liked.
Another interesting thing AlphaGo did was start by predicting what moves a human would make, then switch to reinforcement learning. So for a music AI, you would start with one that can predict the next note in a song. Then you switch to RL and adjust its predictions so that it is more likely to produce songs humans like, and less likely to produce ones we don’t like.
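As a rough illustration of that two-phase recipe (and nothing more), here is a minimal sketch in PyTorch. The note vocabulary, NoteModel, reward_fn, and all sizes are stand-ins invented for the example; AlphaGo’s actual networks and training are far more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 128  # invented note vocabulary (e.g. MIDI-like pitch tokens)

class NoteModel(nn.Module):
    """Tiny next-note predictor: embedding -> LSTM -> logits over notes."""
    def __init__(self, vocab=VOCAB, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, seq):
        h, _ = self.lstm(self.embed(seq))
        return self.head(h)

model = NoteModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def supervised_step(batch):
    """Phase 1: maximize the likelihood of the next note in human songs."""
    logits = model(batch[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def reinforce_step(reward_fn, length=64):
    """Phase 2: REINFORCE -- nudge the policy toward songs reward_fn likes."""
    seq = torch.zeros(1, 1, dtype=torch.long)  # assumed start token, id 0
    log_probs = []
    for _ in range(length):
        dist = torch.distributions.Categorical(logits=model(seq)[:, -1])
        note = dist.sample()
        log_probs.append(dist.log_prob(note))
        seq = torch.cat([seq, note.unsqueeze(1)], dim=1)
    reward = reward_fn(seq)  # hypothetical scalar: predicted human enjoyment
    loss = -reward * torch.stack(log_probs).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return reward
```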
However, automated composition is something a lot of people have experimented with before. So far, nothing works really well.
One difference is that you can’t get feedback as fast when dealing with human judgement rather than win/lose in a game (where AlphaGo can play millions of games against itself).
Yes, it would require a lot of human input.
However, the AI could learn to predict what humans like and then use that as its judge, trying to produce songs that it predicts humans will like. Then, when it tests them on actual humans, it can see whether its predictions were right and improve them (a rough sketch of such a judge follows below).
This is also a domain with vast amounts of unsupervised data available. We’ve created millions of songs, which it can learn from. Out of the space of all possible sounds, we’ve decided that this tiny subset is pleasing to listen to. There’s a lot of information in that.
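To make that learned judge concrete: a minimal sketch, assuming songs have already been reduced to fixed-length feature vectors and listener ratings arrive as scalars. TasteModel and every size here are invented for illustration.

```python
import torch
import torch.nn as nn

class TasteModel(nn.Module):
    """Predicts how much a listener will like a song from its features."""
    def __init__(self, n_features=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, song_features):
        return self.net(song_features).squeeze(-1)  # predicted rating

judge = TasteModel()
opt = torch.optim.Adam(judge.parameters(), lr=1e-3)

def update_judge(song_features, human_ratings):
    """When real listeners rate a song, correct the judge's predictions."""
    loss = nn.functional.mse_loss(judge(song_features), human_ratings)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The composer would optimize against judge(...) between rounds of real feedback, and update_judge keeps the proxy anchored to actual human taste.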
You can get fast feedback by reusing existing databases if your RL agent can do off-policy learning. (You can consider this what the supervised pre-learning phase is ‘really’ doing.) Your agent doesn’t have to take an action before it can learn from it. Consider experience replay buffers: you could imagine a song-writing RL agent with a huge experience replay buffer made just of fragments of songs you grabbed online (say, from the Touhou megatorrent with its 50k tracks).
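A minimal sketch of such a pre-seeded buffer; the fragment length, capacity, and sampling scheme are arbitrary choices for the example.

```python
import random
from collections import deque

class SongReplayBuffer:
    """Replay buffer seeded with fragments of existing songs, so an
    off-policy learner can train before it ever composes anything itself."""
    def __init__(self, capacity=500_000):
        self.buffer = deque(maxlen=capacity)

    def add_song(self, notes, fragment_len=32):
        # Slice a downloaded song into overlapping training fragments.
        step = fragment_len // 2
        for i in range(0, max(1, len(notes) - fragment_len), step):
            self.buffer.append(tuple(notes[i:i + fragment_len]))

    def sample(self, batch_size=64):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Seed from a corpus scraped offline, then train off-policy on samples:
# buf = SongReplayBuffer()
# for song in corpus:
#     buf.add_song(song)
# batch = buf.sample()
```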
Emily Howell?
I was thinking more like these examples:
https://ericye16.com/music-rnn/
http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
https://www.youtube.com/watch?v=0VTI1BBLydE
https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/
I think what Vaniver means is: it seems that Emily Howell works pretty damn well, contrary to your claim that nothing does. (And, so far as I understand, by means very different from any sort of neural network.)
I know the conversation here has run its course, but I just wanted to add: whether Emily Howell “works really well” as an automated system is probably up for debate. It seems to require quite a bit of input from Cope himself to come up with sensible, interesting music. For example, one of the most popular pieces from Emily Howell is this fugue: https://www.youtube.com/watch?v=jLR-_c_uCwI. We really don’t know how much influence Cope had in creating this piece, because the process of composition was not transparent at all.
I think DeepMind focused on building this engine because they believe the methods they develop while doing so could potentially be transferred to other tasks.
I think the basic method could be applied to a more general engine like that of Zillions of Games. And having an engine that plays any kind of strategy game well would be astonishing.
AlphaGo combines convolutional neural networks, supervised learning, self-generated supervised learning (self-play), and a mixed strategy of Monte Carlo rollouts and value-function estimation.
All of these techniques are well suited to Go because it is a spatial game with a very well-defined objective function.
While I do see CNNs and supervised learning as well worth trying for music, it is much more difficult to come up with something resembling the third step in AlphaGo: generating millions of random ‘games’ (symphonies), each with its own label (good music/bad music), to train an ‘intuitive’ network.
Generative adversarial networks give you a good objective if you want to take a purely supervised approach.
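For concreteness, a minimal GAN training step. Since GANs are awkward on discrete note sequences, this sketch pretends music arrives as continuous piano-roll frames; every size and name here is a placeholder, not a recipe from any real music system.

```python
import torch
import torch.nn as nn

LATENT, FRAME = 64, 128  # placeholder sizes: noise dim, piano-roll frame dim

# Generator maps noise to a 'piano-roll' frame; discriminator scores realism.
G = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                  nn.Linear(256, FRAME), nn.Tanh())
D = nn.Sequential(nn.Linear(FRAME, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_frames):
    batch = real_frames.size(0)
    fake = G(torch.randn(batch, LATENT))

    # Discriminator: push real frames toward 1, generated frames toward 0.
    d_loss = (bce(D(real_frames), torch.ones(batch, 1)) +
              bce(D(fake.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator into scoring its output as real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

The appeal is that the discriminator plays the role of the missing good-music/bad-music labeler: it learns the boundary from data instead of requiring millions of hand-labeled symphonies.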
A Spotify-like service could be used to label the quality.
Alternatively, it would also be nice to have music trained for specific goals, like helping people concentrate while they work or reducing stress.
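If one did harvest labels from a streaming service as suggested above, even something as crude as play-through rates could serve. A toy example, with an invented event schema and an arbitrary threshold:

```python
def label_from_listens(events):
    """Turn raw listening events for one song into a coarse quality label.
    Each event is a (seconds_played, track_length) pair -- an invented
    schema standing in for whatever a real streaming service logs."""
    if not events:
        return None  # no data, no label
    completion = sum(played / total for played, total in events) / len(events)
    return 1 if completion >= 0.8 else 0  # 'good' if mostly played through

# e.g. label_from_listens([(180, 200), (45, 200)]) -> 0 (often skipped early)
```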