To solve this problem, MuZero adds a new training target for the neural network
This should probably be edited: MuZero → EfficientZero.
Anyhow, great post. I enjoyed the EfficientZero paper and would recommend it to any other interested dilettantes, and this post did a good job putting things in the context of previous work.
The temporal dynamics used in the model seem to be really simple/cheap. Do you think they’re just getting away with simple dynamics because Atari is simple, or do you think that even for hard problems we will find representations such that their dynamics are simple?