We mostly used KL divergence just because this is what prior works use to measure the amount of optimization done with RL. KL also has the property that it puts a hard constraint on how much the logits can change. There also isn’t any particular need for it to be a distance function.
I think there is at least some subset of Regressional Goodhart (where the proxy is the gold + some noise distribution satisfying the assumptions of appendix A) that can be cleanly separated out, but I agree that the rest is a bit blurry.
We mostly used KL divergence just because this is what prior works use to measure the amount of optimization done with RL. KL also has the property that it puts a hard constraint on how much the logits can change. There also isn’t any particular need for it to be a distance function.
I think there is at least some subset of Regressional Goodhart (where the proxy is the gold + some noise distribution satisfying the assumptions of appendix A) that can be cleanly separated out, but I agree that the rest is a bit blurry.