Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game-specific knowledge.
For context, the original Atari-playing deep Q-network outperformed professional game testers on 47% of games, but used hundreds of hours of play to train.
This milestone resembles the “Atari fifty” task in the 2016 Expert Survey in AI. Previously, Katja Grace posted that the original Atari task had been achieved early. Experts estimated the Atari fifty task would take 5 years with 50% probability (so, by 2021), though under a different question framing they gave a 25% chance that it would take at least 20 years.
The survey doesn’t seem to define what ‘human novice’ performance is. But EfficientZero’s performance curve looks pretty linear in Figure 7 over the 220k frames, finishing at ~1.9x human game-tester performance after 2 hours of play (6x the allotted 20 minutes). So presumably at 20 minutes EfficientZero is at roughly 0.3x the 2-hour game-tester benchmark (1.9x × 1/6)? That doesn’t strike me as an improbable level of performance for a novice, so it’s possible the challenge has been met. If not, it seems likely that we’re pretty close to it.
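For the arithmetic, here’s a minimal back-of-envelope sketch of that extrapolation. It assumes, as above, that score scales roughly linearly with training play time; the 1.9x and 2-hour figures are the ones read off Figure 7, not anything more precise.

```python
# Back-of-envelope extrapolation (assumes score grows roughly linearly with play time).
full_budget_hours = 2.0      # EfficientZero's reported training play time
allotted_hours = 20 / 60     # the milestone's 20-minute budget
score_at_full_budget = 1.9   # ~1.9x professional game-tester performance after 2 hours

# Linear interpolation back to the 20-minute mark.
estimated_score = score_at_full_budget * (allotted_hours / full_budget_hours)
print(f"Estimated performance at 20 min: ~{estimated_score:.2f}x game-tester level")
# -> roughly 0.32x, i.e. the ~0.3x figure used above
```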