I don’t think it does, and reskimming the paper I don’t see any claim that it does (using a single network seems to have been largely neglected since PopArt). Prabhu might be thinking of how it uses a single fixed network architecture & set of hyperparameters across all games (which, while showing generality, doesn’t give any transfer learning or anything).