Thomas Kwa comments on Catastrophic Goodhart in RL with KL penalty

Thomas Kwa 4 Aug 2024 2:35 UTC
2 points
I think that paper and this one are complementary. Regularizing on the state-action distribution fixes the problems that come from regularizing only the action distribution, but if the regularizer is still a KL divergence, you still get the problems described in this paper. The latest version on arXiv mentions this briefly.
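To make the KL point concrete, here is a toy sketch; it is not taken from either paper. It assumes a two-outcome stand-in for a power-law tail: under the reference distribution an "extreme" outcome with proxy reward `t` has probability `t**-2`, and a hypothetical policy raises that probability to `1 / log(t)**2`. The function names and these specific rates are illustrative assumptions. Nothing in the calculation depends on whether the outcomes are actions or state-action pairs, only on the penalty being a KL divergence.

```python
import math

def kl_bernoulli(q, p):
    """KL(Bernoulli(q) || Bernoulli(p)) in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def tilted_policy(t):
    """Toy family (an assumption, not the paper's construction).

    The reference reaches the extreme outcome (proxy reward t) with
    probability t**-2; the policy reaches it with probability
    1 / log(t)**2. All other outcomes give proxy reward 0.
    """
    p = t ** -2.0                  # reference probability of the extreme outcome
    q = 1.0 / math.log(t) ** 2     # policy probability of the extreme outcome
    kl = kl_bernoulli(q, p)        # KL penalty paid by the policy
    proxy = q * t                  # expected proxy reward of the policy
    return kl, proxy

# Push the extreme outcome further out into the tail.
results = [tilted_policy(t) for t in (1e3, 1e6, 1e12, 1e24)]
for kl, proxy in results:
    print(f"KL = {kl:.4f} nats, expected proxy reward = {proxy:.3e}")
```

Along this family the KL penalty shrinks while the expected proxy reward grows without bound, which is the shape of failure a KL regularizer cannot rule out when the tail is heavy enough. The sketch does not model true utility at all.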