One possible answer is that humans do optimize for reward, but they avoid degenerate reward-maximizing solutions because of negative feedback loops akin to the hedonic treadmill.
The underlying principle is that the initial dose of reward is not the problem; the problem is the feedback loop in which reward hacking yields more and more reward.
The post "Hedonic Loops and Taming RL" shows how the brain does this.
https://www.lesswrong.com/posts/3mwfyLpnYqhqvprbb/hedonic-loops-and-taming-rl
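
To make the negative-feedback idea concrete, here is a toy sketch of hedonic adaptation. It is my own illustration, not the model from the linked post: the function name `experienced_rewards`, the `adaptation_rate` parameter, and the exponential-moving-average baseline are all assumptions chosen for simplicity. The felt reward is the raw reward minus a baseline that drifts toward recent reward levels, so a constant "hacked" reward stream pays off less and less.

```python
def experienced_rewards(raw_rewards, adaptation_rate=0.2):
    """Return the reward felt at each step after subtracting an adapting baseline.

    Toy model (an assumption, not taken from the linked post): the baseline is
    an exponential moving average of past raw rewards.
    """
    baseline = 0.0
    felt = []
    for r in raw_rewards:
        # What is felt is the surprise relative to what the agent is used to.
        felt.append(r - baseline)
        # Negative feedback: the baseline drifts toward the recent reward level.
        baseline += adaptation_rate * (r - baseline)
    return felt


# A reward hack that delivers the same large raw reward on every step.
hacked_stream = [10.0] * 50
felt = experienced_rewards(hacked_stream)

print("first felt reward:", felt[0])
print("last felt reward:", felt[-1])
print("total felt:", sum(felt), "vs. total raw:", sum(hacked_stream))
```

In this sketch the first dose is felt in full, but the total felt reward from a constant stream stays bounded by roughly `raw_reward / adaptation_rate`, while the raw total keeps growing linearly. That is the sense in which the loop, not the initial dose, is what gets tamed.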