I think TurnTrout’s “Reward is not the optimization target” addresses this, but I’m not entirely sure (feel free to correct me).
Thank you! That post then led me to https://www.lesswrong.com/posts/3RdvPS5LawYxLuHLH/hackable-rewards-as-a-safety-valve, which appears to be talking about exactly the same thing.