Vanessa Kosoy comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Vanessa Kosoy 4 Nov 2021 11:47 UTC
LW: 5 AF: 1
No, they are training all the networks together. The original MuZero didn’t have $L_{similarity}$; it learned the dynamics only via the prediction terms (reward, value, and policy) unrolled through the dynamics network.
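To make the comparison concrete, here is a schematic of the two objectives. The notation is my own shorthand and the weights $\lambda_i$ are placeholders, so treat it as a sketch rather than a verbatim quote of either paper. In MuZero, the dynamics network only receives gradient through the prediction heads applied to its unrolled latent states:

$$
L_{\text{MuZero}} = \sum_{k=0}^{K} \Big[\, l^{r}\big(u_{t+k}, r_t^{k}\big) + l^{v}\big(z_{t+k}, v_t^{k}\big) + l^{p}\big(\pi_{t+k}, p_t^{k}\big) \Big] + c\,\lVert\theta\rVert^2
$$

EfficientZero adds a self-supervised consistency term on top of these, and all terms are minimized jointly over the same parameters $\theta$:

$$
L_{\text{EfficientZero}} \approx \sum_{k} \Big[\, l^{r} + \lambda_1\, l^{p} + \lambda_2\, l^{v} + \lambda_3\, L_{similarity}\big(\hat{s}_{t+k+1},\, s_{t+k+1}\big) \Big] + c\,\lVert\theta\rVert^2
$$

Here $\hat{s}_{t+k+1}$ is the latent state predicted by the dynamics network and $s_{t+k+1}$ is the encoding of the actually observed next frame. As I understand it, the similarity term is SimSiam-style, with a stop-gradient on the target branch, and EfficientZero's reward term is really a value-prefix prediction; I have left those details out of the formula above.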