Can EfficientZero beat Montezuma’s Revenge?
Not out of the box, but it also isn't designed for exploration at all. Exploration in MuZero is an obvious but largely (ahem) unexplored topic. Such is research: only a few people in the world can do research with MuZero on meaningful problems like ALE (the Arcade Learning Environment), and not everything will happen at once. I think the model-based nature of MuZero means that a lot of past approaches (like training an ensemble of MuZeros and targeting parts of the game tree where the models disagree most on their predictions) ought to port into it pretty easily. We'll see if that's enough to match Go-Explore.
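To make the ensemble-disagreement idea concrete, here is a minimal sketch, not anything from MuZero or EfficientZero. It assumes a hypothetical toy "model" that maps a latent state and an action to a scalar reward prediction (real MuZero models also predict the next latent state, policy and value). Disagreement is measured as the population variance of the ensemble's predictions, and it is added as an optimism bonus to the mean prediction. `pick_action` is a one-step greedy stand-in for the tree search, and `beta` is an assumed weighting knob.

```python
from statistics import fmean, pvariance
from typing import Callable, List, Sequence

# Hypothetical stand-in for one ensemble member: (latent_state, action) -> predicted reward.
# Not a MuZero/EfficientZero API; real models also predict next state, policy and value.
RewardModel = Callable[[Sequence[float], int], float]


def linear_model(weights_per_action: Sequence[Sequence[float]]) -> RewardModel:
    """Toy model: predicted reward = dot(weights_per_action[action], state)."""
    def predict(state: Sequence[float], action: int) -> float:
        return sum(w * s for w, s in zip(weights_per_action[action], state))
    return predict


def disagreement(ensemble: List[RewardModel], state: Sequence[float], action: int) -> float:
    """Population variance of the ensemble's predictions; 0.0 when every model agrees."""
    return pvariance([model(state, action) for model in ensemble])


def exploratory_value(ensemble: List[RewardModel], state: Sequence[float],
                      action: int, beta: float = 1.0) -> float:
    """Mean prediction plus beta * disagreement (an optimism-style exploration bonus)."""
    preds = [model(state, action) for model in ensemble]
    return fmean(preds) + beta * pvariance(preds)


def pick_action(ensemble: List[RewardModel], state: Sequence[float],
                actions: Sequence[int], beta: float = 1.0) -> int:
    """Greedy one-step stand-in for tree search: steer toward where the models disagree."""
    return max(actions, key=lambda a: exploratory_value(ensemble, state, a, beta))


if __name__ == "__main__":
    # Two toy models that agree on action 0 but disagree on action 1.
    ensemble = [linear_model([[1.0, 0.0], [1.0, 0.0]]),
                linear_model([[1.0, 0.0], [-1.0, 0.0]])]
    state = [1.0, 0.0]
    print(pick_action(ensemble, state, [0, 1], beta=0.0))  # exploit: mean reward only
    print(pick_action(ensemble, state, [0, 1], beta=2.0))  # explore: disagreement dominates
```

With `beta=0` the agent just follows the mean predicted reward; raising `beta` pushes it toward the action the models disagree on, which is the "target where the models disagree" behavior in miniature. In an actual MuZero-style agent the bonus would presumably be applied inside the search at every expanded node rather than greedily at the root.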