They train for 220k steps for each agent and mention that 100k steps takes 7 hours on 4 GPUs (no mention of which GPUs, but maybe the RTX 3090 would be a good guess?)
Holy cow, am I reading that right? RTX3090 costs, like, $2000. So they were able to train this whole thing for about one day’s worth of effort using equipment that cost less than $10K in total? That means there’s loads of room to scale this up… It means that they could (say) train a version of this architecture with 1000x more parameters and 100x more training data for about $10M and 100 days. Right?
You’re missing a factor for the number of agents trained (one for each atari game), so in fact this should correspond to about one month of training for the whole game library. More if you want to run each game with multiple random seeds to get good statistics, as you would if you’re publishing a paper. But yeah, for a single task like protein folding or some other crucial RL task that only runs once, this could easily be scaled up a lot with GPT-3 scale money.
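If it helps, the back-of-envelope arithmetic above can be sanity-checked in a few lines. Everything here is the thread's own guesswork, not from the paper: the $2,000-per-GPU price and the 4×RTX 3090 machine are guesses, and 57 is just the size of the full Atari suite (the paper may use a subset):

```python
# Sanity check of the thread's training-cost estimates.
# All specific numbers below are guesses from the discussion, not the paper.
steps_per_agent = 220_000   # training steps per agent (from the paper)
hours_per_100k = 7          # hours per 100k steps on one 4-GPU machine
gpu_price = 2_000           # guessed RTX 3090 price, USD
gpus = 4

hours_per_agent = steps_per_agent / 100_000 * hours_per_100k  # ~15.4 h, i.e. under a day
hardware_cost = gpus * gpu_price                              # ~$8,000, i.e. under $10K

# One agent per game: the full 57-game Atari suite, run sequentially on one machine.
games = 57
days_all_games = games * hours_per_agent / 24                 # roughly "one month"

# Naive scaling: 1000x parameters on 100x data needs ~100,000x compute;
# ~1,250x more hardware ($10M / $8K) run 100x longer gives ~125,000x.
compute_needed = 1000 * 100
compute_available = (10_000_000 / hardware_cost) * 100

print(f"{hours_per_agent:.1f} h per agent, ~{days_all_games:.0f} days for all {games} games")
print(f"compute needed: {compute_needed:,}x, available: {compute_available:,.0f}x")
```

So the $10M / 100-days guess does come out in the right ballpark, with maybe 25% headroom, under these (very rough) assumptions.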
Ah right, thanks!
How well do you think it would generalize? Like, say we made it 1000x bigger and trained it on 100x more training data, but instead of 1 game for 100x longer it was 100 games? Would it be able to do all the games? Would it be better or worse than models specialized to particular games, with the same size, architecture, and amount of training data?