(But maybe these questions aren’t very important if the main point here isn’t offering RLSP as a concrete technique for people to use but more that “state of the world tells us a lot about what humans care about”.)
Yeah, I think that’s basically my position.
But to try to give an answer anyway, I suspect that the benefits of having a lot of data via large-scale IRL will make it significantly outperform RLSP, even if you could get a longer time horizon on RLSP. There might be weird effects where the RLSP reward is less Goodhart-able (since it tends to prioritize keeping the state the same) that make the RLSP reward better to maximize, even though it captures fewer aspects of “what humans care about”. On the other hand, RLSP is much more fragile: slight errors in the dynamics, features, or action space will lead to big errors in the inferred reward. I would guess this is less true of large-scale IRL, so in practice I’d expect large-scale IRL would still be better. But both would be bad.