axioman comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

axioman 4 Nov 2021 23:45 UTC
3 points
Do you have a source for Agent57 using the same network weights for all games?
- gwern 5 Nov 2021 1:19 UTC
  4 points
  I don’t think it does, and on re-skimming the paper I don’t see any claim that it does (using a single network for all games seems to have been largely neglected since PopArt). Prabhu might be thinking of how it uses a single fixed network architecture & set of hyperparameters across all games (which, while showing generality, doesn’t give any transfer learning or anything).