axioman comments on EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

axioman 4 Nov 2021 23:10 UTC
33 points
I guess I should update my paper on trends in sample efficiency soon / check whether recent developments are on trend (please message me if you are interested in doing this). This improvement does not seem to be extremely off-trend, but is definitely a bit more than I would have expected this year. Also, note that this result does NOT use the full suite of Atari games, but rather a subset of easier ones.
- gwern 4 Nov 2021 23:20 UTC
  7 points
  Parent
  It would probably be more on-trend if you filtered out the agents not intended to be sample-efficient (it’s a bit odd to plot both data-efficient Rainbow and regular Rainbow), since including them is going to make temporal trends weird (what if the data-efficient Rainbow had been published first and only later did someone just try to set SOTA by running as long as possible?), and if you covered more agents (you cite SimPLe & CURL, but don’t plot them? and seem to omit DrQ & SPR entirely).
  
  I also feel like there ought to be a better way to graph that to permit some eyeball extrapolation. For starters, maybe swap the x/y-axis? It’s weird to try to interpret it as ‘going up and to the left’.
  - axioman 4 Nov 2021 23:37 UTC
    7 points
    Parent
    A lot of the omissions you mention are due to inconsistent benchmarks (like the switch from the full Atari suite to Atari 100k with fewer and easier games) and me trying to keep results comparable.
    
    This particular plot only has each year’s SOTA, as it would get too crowded with a higher temporal resolution (I used it for the comment, as it was the only one including smaller-sample results on Atari 100k and related benchmarks). I agree that it is not optimal for eyeballing trends.
    
    I also agree that temporal trends can be problematic as people did not initially optimize for sample efficiency (I’m pretty sure I mention this in the paper); it might be useful to do a similar analysis for the recent Atari 100k results (but I felt that there was not enough temporal variation yet when I wrote the paper last year, as sample efficiency only seems to have started receiving more interest in late 2019).
  - Ankesh Anand 7 Nov 2021 20:10 UTC
    1 point
    Parent
    They do seem to cover SPR (an earlier version of SPR was called MPR). @flodorner If you do decide to update the plot, maybe you could update the label as well?
