gwern comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern 3 Nov 2021 14:13 UTC
8 points
Even defining what is a ‘featureless room’ in full generality is difficult. After all, the literal pixel array will be different at most timesteps (and even if ALE games are discrete enough for that to not be true, there are plenty of environments with continuous state variables that never repeat exactly). That describes the opening room of Montezuma’s Revenge: you have to go in a long loop around the room, timing a jump over a monster that will kill you, before you get near the key which will give you the first reward after hundreds of timesteps. Go-Explore can solve MR and doesn’t suffer from the noisy TV problem because it does in fact do basically breadth+depth exploration (iterative widening), but it also relies on a human-written hack for deciding what states/nodes are novel or different from each other and potentially worth using as a starting point for exploration.
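To make the novelty problem concrete, here is a minimal sketch (not Go-Explore's actual code) of the kind of hand-chosen state abstraction at issue: block-average a grayscale frame down to a coarse grid, quantize the intensities, and treat the result as a hashable "cell". The function names `cell_key` and `is_novel`, the 11x8 grid, and the 8 intensity levels are all illustrative assumptions; the point is only that raw frames almost never match exactly, while a hand-picked coarsening decides which differences count.

```python
def cell_key(frame, out_h=8, out_w=11, levels=8):
    """Map a grayscale frame (list of rows of ints in 0..255) to a coarse key.

    Hypothetical helper: grid size and number of levels are arbitrary,
    human-chosen knobs. Assumes the frame is at least out_h x out_w.
    """
    h, w = len(frame), len(frame[0])
    key = []
    for i in range(out_h):
        r0, r1 = i * h // out_h, (i + 1) * h // out_h
        for j in range(out_w):
            c0, c1 = j * w // out_w, (j + 1) * w // out_w
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(block) / len(block)
            # Quantize the block mean into one of `levels` buckets.
            key.append(int(mean * levels / 256))
    return tuple(key)


def is_novel(frame, archive):
    """Return True if the frame's cell is not yet in the archive, then record it."""
    key = cell_key(frame)
    novel = key not in archive
    archive.add(key)
    return novel


# Three toy 16x22 frames: a base frame, one differing by a single pixel value,
# and one with a bright 2x2 patch (standing in for a meaningfully new state).
frame_a = [[100] * 22 for _ in range(16)]
frame_b = [row[:] for row in frame_a]
frame_b[0][0] = 101
frame_c = [row[:] for row in frame_a]
for r in range(2):
    for c in range(2):
        frame_c[r][c] = 255

raw_frames_equal = frame_a == frame_b                      # raw pixels differ
same_cell = cell_key(frame_a) == cell_key(frame_b)         # coarse cells match
different_cell = cell_key(frame_a) != cell_key(frame_c)    # bright patch is "new"
```

Whether the one-pixel change should count as the same state and the bright patch as a new one is settled entirely by the chosen grid and quantization, which is the sense in which the novelty criterion is hand-written rather than learned.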
