You’re missing a factor for the number of agents trained (one for each Atari game), so in fact this should correspond to about one month of training for the whole game library. More if you want to run each game with multiple random seeds to get good statistics, as you would if you were publishing a paper. But yeah, for a single task like protein folding or some other crucial RL task that only runs once, this could easily be scaled up a lot with GPT-3-scale money.
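To make that arithmetic concrete, here's a back-of-envelope sketch (the game count, per-game training time, and seed count are illustrative assumptions, not figures from this thread):

```python
# Rough cost scaling for training one agent per Atari game.
# All constants below are illustrative assumptions.
N_GAMES = 57           # Atari-57 benchmark suite (assumed)
HOURS_PER_GAME = 12.5  # assumed wall-clock training time per agent
N_SEEDS = 5            # assumed seeds per game for decent statistics

one_seed_days = N_GAMES * HOURS_PER_GAME / 24
multi_seed_days = one_seed_days * N_SEEDS

print(f"One agent per game: ~{one_seed_days:.0f} days")              # ~30 days
print(f"With {N_SEEDS} seeds per game: ~{multi_seed_days:.0f} days")  # ~148 days
```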
Ah right, thanks!
How well do you think it would generalize? Like, say we made it 1000x bigger and trained it on 100x more data, but instead of one game for 100x longer, it was 100 different games. Would it be able to play all the games? Would it be better or worse than models specialized to particular games, given the same size, architecture, and amount of training data?