gwern comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern 5 Nov 2021 1:19 UTC
4 points
I don’t think it does, and reskimming the paper I don’t see any claim that it does (using a single network across games seems to have been largely neglected since PopArt). Prabhu might be thinking of how it uses a single fixed network architecture & set of hyperparameters across all games (which, while showing generality, doesn’t give any transfer learning or anything).
