maximkazhenkov comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

maximkazhenkov 4 Nov 2021 10:53 UTC
Oh I see. Did I misunderstand point 1 from Razied, then, or was it mistaken? I thought $H$ and $G$ were trained separately with $L_{similarity}$.
- Vanessa Kosoy 4 Nov 2021 11:47 UTC
  No, they are training all the networks together. The original MuZero didn’t have $L_{similarity}$; it learned the dynamics only via the reward-prediction terms.
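  As a rough sketch of what "training all the networks together" means, assume (an assumption, following MuZero's $h$/$g$ naming) that $H$ is the representation network and $G$ the dynamics network. Write $\mathcal{L}_{\text{MuZero}}$ for the original MuZero loss terms, and let $\lambda$ (a hypothetical weighting coefficient) and $\eta$ (a learning rate) be placeholders. Both terms go into one objective that is minimized over all parameters at once. This is an illustration, not the paper's exact formulation:

  $$s_t = H_{\theta_H}(o_t), \qquad \hat{s}_{t+1} = G_{\theta_G}(s_t, a_t)$$

  $$\mathcal{L}(\theta) = \mathcal{L}_{\text{MuZero}}(\theta) + \lambda\, L_{similarity}\big(\hat{s}_{t+1},\, H_{\theta_H}(o_{t+1})\big), \qquad \theta = (\theta_H, \theta_G, \dots)$$

  $$\theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}(\theta)$$

  Because $\hat{s}_{t+1}$ depends on both $\theta_H$ and $\theta_G$, gradients of $L_{similarity}$ reach both networks in the same update rather than in a separate training stage. Setting $\lambda = 0$ recovers the original MuZero setup, where the dynamics are shaped only by $\mathcal{L}_{\text{MuZero}}$.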