This is amazing. So it’s the exact same agents performing well on all of these different tasks, not just the same general algorithm retrained on lots of examples. In which case, have they found a generally useful way around the catastrophic forgetting problem? I guess the whole training procedure, amount of compute + experience, and architecture, taken together, just solves catastrophic forgetting—at least for a far wider range of tasks than I’ve seen so far.
Could you use this technique to e.g. train the same agent to do well on chess and go?
I also notice, from the little animated GIFs in the blog post, that they gave each agent a little death-ray projector to manipulate objects, and that the agents look a lot like Daleks.
Unless I misunderstand your question, this is something they already did with MuZero.
Didn’t they train a separate MuZero agent for each game? E.g., the page you link to only talks about being able to learn without pre-existing knowledge.
Actually, I think you’re right. I always thought that MuZero was one and the same system for every game, but the Nature paper describes it as an architecture that can be applied to learn different games. I’d like confirmation from someone who has actually studied it more closely, but it looks like MuZero indeed isn’t the same system for each game.
Yep, they’re different. It’s just an architecture. Among other things, Chess and Go have different input/action spaces, so the same trained network can’t be used on both without some way to handle this.
This paper uses an egocentric input, which allows many different types of tasks to use the same architecture. That would be the equivalent of learning Chess/Go based on pictures of the board.
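To make the input/action-space point concrete, here’s a minimal sketch (all shapes and names are illustrative assumptions, not taken from either paper): each board game needs its own observation and policy-head shapes, whereas an egocentric agent gets one fixed observation and action space across every task, so one set of weights fits them all.

```python
# Hypothetical spaces, for illustration only (not the papers' actual numbers).
# Board games need per-game input/action shapes, so weights trained on one
# game cannot be reused directly on another.
GAME_SPACES = {
    "go":    {"obs_shape": (19, 19, 17), "n_actions": 19 * 19 + 1},  # moves + pass
    "chess": {"obs_shape": (8, 8, 119),  "n_actions": 4672},         # AlphaZero-style move encoding
}

# An egocentric agent sees a first-person camera image and uses one fixed
# action set, so a single network can be shared across every task.
EGOCENTRIC_SPACE = {"obs_shape": (72, 96, 3), "n_actions": 10}

def weights_shareable(space_a, space_b):
    """Weights transfer directly only if both input and output shapes match."""
    return (space_a["obs_shape"] == space_b["obs_shape"]
            and space_a["n_actions"] == space_b["n_actions"])

print(weights_shareable(GAME_SPACES["go"], GAME_SPACES["chess"]))  # prints False
print(weights_shareable(EGOCENTRIC_SPACE, EGOCENTRIC_SPACE))       # prints True
```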
I think if anything’s allowed it to learn more diverse tasks, it’s the attention layers added at the recurrent step (though I haven’t actually read beyond the blog post, so I don’t know what I’m talking about). In which case it seems like it’s a question of how much data and compute you want to throw at the problem. But I’ll edit this after I read the paper and am no longer just making crazy talk.
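For what it’s worth, the generic version of “attention over a recurrent memory” is easy to sketch. This is plain single-head dot-product attention of the current hidden state over a buffer of past states, purely illustrative and not the paper’s actual architecture (all dimensions and weight matrices here are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, memory, w_k, w_v):
    """Single-head dot-product attention of one query over past hidden states."""
    keys = memory @ w_k                          # (T, d)
    values = memory @ w_v                        # (T, d)
    scores = keys @ query / np.sqrt(query.size)  # (T,) similarity to each past state
    weights = softmax(scores)                    # (T,) attention distribution
    return weights @ values                      # (d,) context fed back into the recurrent state

rng = np.random.default_rng(0)
d = 8
memory = rng.normal(size=(5, d))   # five past hidden states
query = rng.normal(size=d)         # current recurrent state acts as the query
w_k = rng.normal(size=(d, d))
w_v = rng.normal(size=(d, d))
context = attend(query, memory, w_k, w_v)
print(context.shape)  # prints (8,)
```

The point is just that such a layer lets the recurrent core pull in whichever past observations are relevant to the current task, rather than squeezing everything through one fixed-size state.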