This is amazing. So it’s the exact same agents performing well on all of these different tasks, not just the same general algorithm retrained on lots of examples. In which case, have they found a generally useful way around the catastrophic forgetting problem? I guess the whole training procedure, amount of compute + experience, and architecture, taken together, just solves catastrophic forgetting—at least for a far wider range of tasks than I’ve seen so far.
Could you use this technique to e.g. train the same agent to do well on chess and go?
I also notice, from the little animated GIFs in the blog post, that they gave each agent a little death-ray projector to manipulate objects, and that the agents look a lot like Daleks.
Unless I misunderstand your question, this is something they already did with MuZero.
Didn’t they train a separate MuZero agent for each game? E.g., the page you link to only talks about being able to learn without pre-existing knowledge.
Actually, I think you’re right. I always thought that MuZero was one and the same system for every game, but the Nature paper describes it as an architecture that can be applied to learn different games. I’d like confirmation from someone who has actually studied it more closely, but it looks like MuZero indeed isn’t the same system for each game.
Yep, they’re different. It’s just an architecture. Among other things, Chess and Go have different input/action spaces, so the same trained network can’t be used on both without some way to handle this.
This paper uses an egocentric input, which allows many different types of tasks to use the same architecture. That would be the equivalent of learning Chess/Go based on pictures of the board.
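To make the input/action-space point concrete, here’s a minimal sketch (all shapes and names are illustrative assumptions, not taken from either paper): each board game needs its own observation and policy-head shapes, whereas an egocentric agent gets one fixed observation and action space across every task, so one set of weights fits them all.

```python
# Hypothetical spaces, for illustration only (not the papers' actual numbers).
# Board games need per-game input/action shapes, so weights trained on one
# game cannot be reused directly on another.
GAME_SPACES = {
    "go":    {"obs_shape": (19, 19, 17), "n_actions": 19 * 19 + 1},  # moves + pass
    "chess": {"obs_shape": (8, 8, 119),  "n_actions": 4672},         # AlphaZero-style move encoding
}

# An egocentric agent sees a first-person camera image and uses one fixed
# action set, so a single network can be shared across every task.
EGOCENTRIC_SPACE = {"obs_shape": (72, 96, 3), "n_actions": 10}

def weights_shareable(space_a, space_b):
    """Weights transfer directly only if both input and output shapes match."""
    return (space_a["obs_shape"] == space_b["obs_shape"]
            and space_a["n_actions"] == space_b["n_actions"])

print(weights_shareable(GAME_SPACES["go"], GAME_SPACES["chess"]))  # prints False
print(weights_shareable(EGOCENTRIC_SPACE, EGOCENTRIC_SPACE))       # prints True
```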
I think if anything’s allowed it to learn more diverse tasks, it’s the attention layers added at the recurrent step (though I haven’t actually read beyond the blog post, so I don’t know what I’m talking about). In which case it seems like it’s a question of how much data and compute you want to throw at the problem. But I’ll edit this after I read the paper and am no longer just making crazy talk.
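For what it’s worth, the generic version of “attention over a recurrent memory” is easy to sketch. This is plain single-head dot-product attention of the current hidden state over a buffer of past states, purely illustrative and not the paper’s actual architecture (all dimensions and weight matrices here are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, memory, w_k, w_v):
    """Single-head dot-product attention of one query over past hidden states."""
    keys = memory @ w_k                          # (T, d)
    values = memory @ w_v                        # (T, d)
    scores = keys @ query / np.sqrt(query.size)  # (T,) similarity to each past state
    weights = softmax(scores)                    # (T,) attention distribution
    return weights @ values                      # (d,) context fed back into the recurrent state

rng = np.random.default_rng(0)
d = 8
memory = rng.normal(size=(5, d))   # five past hidden states
query = rng.normal(size=d)         # current recurrent state acts as the query
w_k = rng.normal(size=(d, d))
w_v = rng.normal(size=(d, d))
context = attend(query, memory, w_k, w_v)
print(context.shape)  # prints (8,)
```

The point is just that such a layer lets the recurrent core pull in whichever past observations are relevant to the current task, rather than squeezing everything through one fixed-size state.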