It would probably be more on-trend if you filtered out the ones not intended to be sample-efficient (it’s a bit odd to plot both data-efficient Rainbow and regular Rainbow) which is going to make temporal trends weird (what if the data-efficient Rainbow had been published first and only later did someone just try to set SOTA by running as long as possible?) and covered more agents (you cite SimPLe & CURL, but don’t plot them? and seem to omit DrQ & SPR entirely).
I also feel like there ought to be a better way to graph that to permit some eyeball extrapolation. For starts, maybe swap the x/y-axis? It’s weird to try to interpret it as ‘going up and to the left’.
A lot of the omissions you mention are due to inconsistent benchmarks (like the switch from the full Atari suite to Atari 100k with fewer and easier games) and me trying to keep results comparable.
This particular plot only has each year’s SOTA, as it would get too crowded with a higher temporal resolution (I used it for the comment, as it was the only one including smaller-sample results on Atari 100k and related benchmarks). I agree that it is not optimal for eyeballing trends.
I also agree that temporal trends can be problematic as people did not initially optimize for sample efficiency (I’m pretty sure I mention this in the paper); it might be useful to do a similar analysis for the recent Atari 100k results (but I felt that there was not enough temporal variation yet when I wrote the paper last year as sample efficiency seems to only have started receiving more interest starting in late 2019).
They do seem to cover SPR (an earlier version of SPR was called MPR). @flodorner If you do decide to update the plot, maybe you could update the label as well?
It would probably be more on-trend if you filtered out the ones not intended to be sample-efficient (it’s a bit odd to plot both data-efficient Rainbow and regular Rainbow) which is going to make temporal trends weird (what if the data-efficient Rainbow had been published first and only later did someone just try to set SOTA by running as long as possible?) and covered more agents (you cite SimPLe & CURL, but don’t plot them? and seem to omit DrQ & SPR entirely).
I also feel like there ought to be a better way to graph that to permit some eyeball extrapolation. For starts, maybe swap the x/y-axis? It’s weird to try to interpret it as ‘going up and to the left’.
A lot of the omissions you mention are due to inconsistent benchmarks (like the switch from the full Atari suite to Atari 100k with fewer and easier games) and me trying to keep results comparable.
This particular plot only has each year’s SOTA, as it would get too crowded with a higher temporal resolution (I used it for the comment, as it was the only one including smaller-sample results on Atari 100k and related benchmarks). I agree that it is not optimal for eyeballing trends.
I also agree that temporal trends can be problematic as people did not initially optimize for sample efficiency (I’m pretty sure I mention this in the paper); it might be useful to do a similar analysis for the recent Atari 100k results (but I felt that there was not enough temporal variation yet when I wrote the paper last year as sample efficiency seems to only have started receiving more interest starting in late 2019).
They do seem to cover SPR (an earlier version of SPR was called MPR). @flodorner If you do decide to update the plot, maybe you could update the label as well?