Oliver Sourbut comments on Goodhart’s Law in Reinforcement Learning

Oliver Sourbut 16 Oct 2023 7:55 UTC
6 points
4
As I’ve previously said in person, this is one of the best technical articulations of Goodhart I’ve encountered. Really glad you made it into a paper!

It’s worth mentioning that I think the ‘angle between reward functions’ in the occupancy space should relate somewhat neatly to the ‘distances between reward functions’ thread of research (‘reward function theory’?) including stuff Joar has worked on, and my upcoming paper on delegation games.
- Joar Skalse 16 Oct 2023 9:03 UTC
  5 points
  0
  > including stuff Joar has worked on
  That is right! See this paper.
  - Oliver Sourbut 17 Oct 2023 10:14 UTC
    1 point
    0
    Thanks, I've amended the link in my comment to point to this updated version.