Do you have a source for Agent57 using the same network weights for all games?
I don’t think it does, and on re-skimming the paper I don’t see any claim that it does (training a single network across all games seems to have been largely neglected since PopArt). Prabhu might be thinking of how it uses a single fixed network architecture & set of hyperparameters across all games (which, while showing generality, doesn’t give any transfer learning or anything).