One plausible answer is that it does in fact reward hack/optimize the reward, because reward hacking/reward optimization has happened before empirically, so there are reasonable grounds to raise the hypothesis to plausibility:
https://x.com/moyix/status/1885069457912996128
One plausible answer is that it does in fact reward hack/optimize the reward, because reward hacking/reward optimization has happened before empirically, so there are reasonable grounds to raise the hypothesis to plausibility:
https://x.com/moyix/status/1885069457912996128