leogao comments on Scaling Laws for Reward Model Overoptimization

leogao 26 Oct 2022 1:47 UTC
2 points
We mostly used KL divergence because it is what prior work uses to measure the amount of optimization done with RL. KL also has the property that it puts a hard constraint on how much the logits can change. There also isn’t any particular need for it to be a distance function (i.e. a true metric).
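To make the "not a distance function" point concrete, here is a minimal sketch using two toy discrete distributions. The values of `p` and `q` and the helper `kl_divergence` are made up for illustration and are not taken from the paper. It shows that KL is asymmetric, so it fails one of the requirements of a metric.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy distributions, chosen only to illustrate asymmetry.
p = [0.9, 0.1]
q = [0.5, 0.5]

kl_pq = kl_divergence(p, q)  # KL(p || q)
kl_qp = kl_divergence(q, p)  # KL(q || p)

print(f"KL(p||q) = {kl_pq:.4f}")
print(f"KL(q||p) = {kl_qp:.4f}")
```

The two directions give different values, which is fine if all you want is a measure of how far the policy has moved from its starting point.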

I think there is at least some subset of Regressional Goodhart (where the proxy is the gold reward plus noise drawn from a distribution satisfying the assumptions of Appendix A) that can be cleanly separated out, but I agree that the rest is a bit blurry.
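As a rough illustration of the "proxy = gold + noise" case, here is a toy best-of-n simulation. Everything in it is an assumption for illustration: the gold score and the noise are independent standard Gaussians, which may or may not match the exact assumptions of Appendix A, and `best_of_n_gap` is a hypothetical helper, not code from the paper.

```python
import random

def best_of_n_gap(n=16, trials=2000, noise_std=1.0, seed=0):
    """Pick the best of n samples by proxy score and report average scores.

    Toy assumption: gold ~ N(0, 1), proxy = gold + N(0, noise_std^2),
    with the noise independent of the gold score.
    Returns (mean proxy score, mean gold score) of the selected samples.
    """
    rng = random.Random(seed)
    total_proxy = 0.0
    total_gold = 0.0
    for _ in range(trials):
        samples = []
        for _ in range(n):
            gold = rng.gauss(0.0, 1.0)
            proxy = gold + rng.gauss(0.0, noise_std)
            samples.append((proxy, gold))
        # Select on the proxy only, as an optimizer that cannot see gold would.
        best_proxy, best_gold = max(samples, key=lambda s: s[0])
        total_proxy += best_proxy
        total_gold += best_gold
    return total_proxy / trials, total_gold / trials

mean_proxy, mean_gold = best_of_n_gap()
print(f"mean proxy score of selected sample: {mean_proxy:.3f}")
print(f"mean gold score of selected sample:  {mean_gold:.3f}")
```

In this setup the selected samples still have above-average gold score, but it falls short of what the proxy claims, because selecting on the proxy also selects for favorable noise.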