OliverHayman comments on Goodhart’s Law in Reinforcement Learning

OliverHayman 16 Oct 2023 23:31 UTC
We use a uniform distribution because it follows naturally from the math, but an intuitive explanation might be the following. The choice seems weird because most realistic distributions will only sample a small number of states/actions, whereas the uniform distribution more or less encodes the assumption that the reward functions are similar across most states/actions. In that sense, it encodes something about generalization.
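A toy sketch of that intuition, under loudly stated assumptions: a tabular setting, and a reward distance defined as a distribution-weighted root-mean-square difference over state-action pairs. The `weighted_distance` helper, the 10 state-action pairs, and the reward values are all hypothetical illustrations, not the construction from the post. A proxy reward that matches the true reward only on the pairs a narrow "realistic" distribution samples looks identical under that distribution, but far away under the uniform one.

```python
import numpy as np

def weighted_distance(r_true, r_proxy, dist):
    """Hypothetical helper: RMS reward difference weighted by a distribution over state-action pairs."""
    r_true = np.asarray(r_true, dtype=float)
    r_proxy = np.asarray(r_proxy, dtype=float)
    dist = np.asarray(dist, dtype=float)
    dist = dist / dist.sum()  # normalize weights to a probability distribution
    return float(np.sqrt(np.sum(dist * (r_true - r_proxy) ** 2)))

n = 10  # hypothetical number of state-action pairs
r_true = np.arange(n, dtype=float)  # hypothetical true reward per pair

# Proxy agrees with the true reward on the first 2 pairs and is off by 5 elsewhere.
r_proxy = r_true.copy()
r_proxy[2:] += 5.0

# "Realistic" distribution: all mass on the 2 pairs that actually get sampled.
narrow = np.zeros(n)
narrow[:2] = 0.5
# Uniform distribution: every pair counts equally.
uniform = np.full(n, 1.0 / n)

print(weighted_distance(r_true, r_proxy, narrow))   # → 0.0
print(weighted_distance(r_true, r_proxy, uniform))  # ≈ 4.472 (sqrt(20))
```

Under the narrow distribution, a small distance says nothing about the unsampled pairs; under the uniform distribution, a small distance forces the two rewards to be close on most pairs, which is the sense in which the uniform choice encodes an assumption about generalization.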