Looking back, I remain pretty happy with most of this: I think it's still clear, accessible, and about as truthful as possible without getting too technical.
I do want to grade my conclusions / predictions, though.
(1). I predicted that this work would quickly be exceeded in sample efficiency. This was wrong; it's been a bit over a year and EfficientZero is still SOTA on Atari100k. My 3-to-24-month timeframe hasn't run out, but I said that I expected "at least a 25% gain" towards the start of that window, which hasn't happened.
(2). There has been a shift to multitask domains, or to multi-benchmark papers. This wasn't too hard of a prediction, but I think it was correct. (Although of course good evidence for such a shift would require a comprehensive lit review.)
To sample two: DreamerV3 is a very recently released model-based DeepMind algorithm. It does very well at Atari100k (it gets a better mean score than everything but EfficientZero), but it also does well at DMLab plus four other benchmarks, and it even crafts a Minecraft diamond. The paper emphasizes the robustness of the algorithm, and is right to do so: once you get human-level sample efficiency on Atari100k, you really want to make sure you aren't just overfitting to that!
And of course the infamous Gato is a multitask agent across a host of different domains, although its ultimate impact remains unclear at the moment.
(3). And finally, the last conclusion, that there is still a lot of space for big gains in RL performance even without field-overturning new insights, is inevitably subjective. But I think the evidence still supports it.