Great post! I think you might want to emphasize just how crucial ReAnalyse is for data-efficiency (default MuZero is quite sample-inefficient), and how the reanalyse ratio can be tuned easily for any data budget using a log-linear scaling law. You can also interpret the off-policy correction as running ReAnalyse twice, so my TL;DR of EfficientZero would be “MuZero ReAnalyse + SPR”.
Regarding contrastive vs SPR, I don’t think you would find a performance boost from a contrastive loss compared to SPR, on Atari at least. We ran an ablation for this in the SPR paper (Table 6, appendix). I suspect the reason contrastive works (slightly) better on Procgen is the procedural diversity there, which makes negative examples much more informative.
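For readers who haven't internalized the difference: the key structural distinction is that SPR's loss has no negatives at all, while a contrastive (InfoNCE-style) loss is only as good as the negatives it sees. A minimal numpy sketch of the two (function names and the temperature value are mine, purely illustrative, not the papers' implementations):

```python
import numpy as np

def spr_loss(pred, targ):
    # SPR-style objective: negative cosine similarity between predicted
    # and target latents. No negative examples are involved at all.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    targ = targ / np.linalg.norm(targ, axis=-1, keepdims=True)
    return -np.mean(np.sum(pred * targ, axis=-1))

def infonce_loss(pred, targ, temperature=0.1):
    # Contrastive (InfoNCE-style) objective: every other item in the batch
    # serves as a negative, so the signal depends on how informative those
    # negatives are -- the point about Procgen's procedural diversity.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    targ = targ / np.linalg.norm(targ, axis=-1, keepdims=True)
    logits = pred @ targ.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal
```

If all the negatives in a batch look alike (as in a single Atari level), the contrastive term adds little beyond the positive pair, which is one way to read the ablation result.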
Definitely agree about moving to multi-task testbeds as the next frontier in RL. I also suspect we will see more non-tabula-rasa RL methods, ones that start off with general-purpose pre-trained models or representations and then only do a tiny amount of fine-tuning on the actual RL task.
Agreed, I added an extra paragraph emphasizing ReAnalyse. And thanks a ton for pointing out that ablation, I had totally missed it.