Pattern comments on When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors

Pattern 20 Dec 2019 22:06 UTC
1 point
It’s not clear how well the claim that ‘increasing switching increases proximity to the ideal reward function’ generalizes beyond this problem. (And we probably don’t want the robot to run forever.)