This seems like really great work, nice job! I’d be excited to see more empirical work around inner alignment.
One of the things I really like about this work is the cute videos that clearly demonstrate ‘this agent is doing dumb stuff because its objective is non-robust’. Have you considered putting shorter clips of some of the best bits on YouTube, or making GIFs? (E.g., a 5-10 second clip of the CoinRun agent during training, followed by a 5-10 second clip of the same agent at test time.) It seemed that one of the major strengths of the CoastRunners clip was how easily shareable and funny it was, and I could imagine this research getting more exposure if the highlights were easier to share. I found the Google Drive pretty hard to navigate.