OliverHayman comments on Goodhart’s Law in Reinforcement Learning

OliverHayman 16 Oct 2023 22:54 UTC
So more concretely, this is work towards some sort of RLHF training regime that “provably” avoids Goodharting. The main issue is that a lot of the numbers we’re using are quite hard to approximate.