Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game-specific knowledge.
For context, the original Atari-playing deep Q-network outperformed professional game testers on 47% of games, but used hundreds of hours of play to train.
This milestone resembles the “Atari fifty” task in the 2016 Expert Survey in AI. Previously, Katja Grace posted that the original Atari task had been achieved early. Experts estimated the Atari fifty task would take 5 years with 50% probability (so, by 2021), though under a different question framing they gave a 25% chance that it would take at least 20 years.
The survey doesn’t seem to define what ‘human novice’ performance is. But EfficientZero’s performance curve looks pretty linear in Figure 7 over the 220k frames, finishing at ~1.9x human game-tester performance after 2 hours of play (6x the allotted 20 minutes). So presumably at 20 minutes EfficientZero is at roughly 0.3x the 2-hour game-tester benchmark (1.9x × 1/6)? That doesn’t strike me as an improbable level of performance for a novice, so it’s possible the challenge has been met. If not, it seems likely that we’re pretty close to it.
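For the arithmetic, here’s a minimal back-of-envelope sketch of that extrapolation. It assumes, as above, that score scales roughly linearly with training play time; the 1.9x and 2-hour figures are the ones read off Figure 7, not anything more precise.

```python
# Back-of-envelope extrapolation (assumes score grows roughly linearly with play time).
full_budget_hours = 2.0      # EfficientZero's reported training play time
allotted_hours = 20 / 60     # the milestone's 20-minute budget
score_at_full_budget = 1.9   # ~1.9x professional game-tester performance after 2 hours

# Linear interpolation back to the 20-minute mark.
estimated_score = score_at_full_budget * (allotted_hours / full_budget_hours)
print(f"Estimated performance at 20 min: ~{estimated_score:.2f}x game-tester level")
# -> roughly 0.32x, i.e. the ~0.3x figure used above
```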