If you care about human demonstrations, it seems like Atari-HEAD and the CrowdPlay Atari dataset both do exactly this? And while there hasn't been much work in this area, a quick Google search turned up two papers that analyze IRL variants on Atari-HEAD: https://arxiv.org/abs/1908.02511v2 and https://arxiv.org/abs/2004.00981v2.
My guess is the reason there hasn't been much recent work in this area is that there just aren't many people who find value learning from demonstrations interesting (instead, people have moved to pairwise comparisons of trajectories or to language feedback). In addition, as LMs have become more capable, most of the existing value learning researchers have moved on from video games to working on LMs.
Every single algorithmic IRL paper on video games does this, at least with Deep RL demonstrators. (Here are four examples: https://arxiv.org/abs/1810.10593, https://proceedings.mlr.press/v97/brown19a.html, https://arxiv.org/abs/1902.07742, https://arxiv.org/abs/2002.09089.)