It is a different net for each game. That is why they compare with DQN, not Agent57.
Training an Atari agent for 100k steps takes only 4 GPUs for 7 hours.
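For scale (assuming the standard Atari 100k setup, i.e. frame-skip 4 and 60 fps game frames): 100,000 agent steps × 4 frames/step = 400,000 frames, and 400,000 / 60 fps ≈ 6,700 s ≈ 1.9 hours of real-time gameplay.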
The entire architecture is described in Appendix A.1, Models and Hyper-parameters.
Yes.
This algorithm is more sample-efficient than humans, meaning it learns a given game from less game experience than a human needs. That is definitely a huge breakthrough.
Do you have a source for Agent57 using the same network weights for all games?
I don’t think it does, and reskimming the paper I don’t see any claim that it does (using a single network seems to have been largely neglected since PopArt). Prabhu might be thinking of how it uses a single fixed network architecture & set of hyperparameters across all games (which, while it shows generality, doesn’t give any transfer learning or anything).