Update: I originally posted this question over here, then realized this post existed and that I should probably just ask here instead. But people had already started answering my question-post, so I'm declaring that the canonical place to answer the question.
Can someone give a rough explanation of how this compares to the recent DeepMind Atari-playing AI?
https://www.lesswrong.com/posts/mTGrrX8SZJ2tQDuqz/deepmind-generally-capable-agents-emerge-from-open-ended?commentId=bosARaWtGfR836shY#bosARaWtGfR836shY
And, for that matter, how do both of them compare to the older DeepMind paper?
https://deepmind.com/research/publications/2019/playing-atari-deep-reinforcement-learning
Are they accomplishing qualitatively different things, or the same thing but better?