Noosphere89 answers Seriously, what goes wrong with “reward the agent when it makes you smile”?

Noosphere89 10 Feb 2025 22:07 UTC
One plausible answer is that the agent does in fact reward hack, i.e. it optimizes the reward signal itself. Reward hacking has been observed empirically before, so there are reasonable grounds to treat the hypothesis as plausible:
https://x.com/moyix/status/1885069457912996128