I feel like there is some vocabulary confusion in the genesis of this post.
“Reward” is hard-coded into the agents. The dinosaurs of Jurassic Park (spoiler alert) were genetically engineered to lack iodine. So the trainers could use iodine as a reward to incentivize other behaviors, because by definition the dinos valued iodine as a terminal value.
In humans, serotonin and dopamine binding to the appropriate brain receptors are DNA-coded terminal values that inherently train us to pursue certain behaviors (e.g. food, sex).
An AI is, by definition, going to take whatever actions maximize its Reward system. That’s what having a Reward system means.
I think the terminological confusion is with you: what you’re describing is closer to what some RL algorithms call a value function.
Does a chess-playing RL agent make whichever move maximises reward? Not unless it has converged to the optimal policy, which in practice it hasn’t. The reward signal of +1 for a win, 0 for a draw and −1 for a loss is, in a sense, hard-coded into the agent, but not in the sense that it’s the metric the agent uses to select actions. Instead the chess-playing agent uses its value function, which is an estimate of the reward the agent will get in the future, but is not the same thing.
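As a rough sketch of that distinction (my own toy illustration, not anything from the post; the position representation, the apply_move helper and the tabular value store are all made up for the example), value-based action selection looks something like this:

```python
# Toy sketch: a value-based agent selects moves with its learned value
# estimates; the +1/0/-1 reward is only seen at the end of the game and
# only shapes the value function during training.

value_estimates = {}  # learned value per position, roughly "expected future reward"

def estimated_value(position):
    # Unseen positions default to 0.0; a real chess agent would use a
    # function approximator (e.g. a neural network) rather than a table.
    return value_estimates.get(position, 0.0)

def select_move(position, legal_moves, apply_move):
    # Action selection consults the *value function*, not the reward signal.
    return max(legal_moves, key=lambda m: estimated_value(apply_move(position, m)))

def td_update(position, next_position, reward, alpha=0.1):
    # Training is where the reward enters: a TD(0) update nudges the value
    # estimate toward reward plus the value of the next position.
    target = reward + estimated_value(next_position)
    value_estimates[position] += alpha * (target - estimated_value(position))
```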
The iodinosaurs example perhaps obscures the point, since the iodinos seem inner-aligned: they probably do terminally value (the feeling of) getting iodine, and they are unlikely to instead optimise a proxy. In this case the value function used to select actions is very similar to the reward function, but in general it needn’t be. Consider, for example, an agent that has previously been rewarded for getting raspberries and now has the choice between a raspberry and a blueberry. Even if it knows the blueberry will get it higher reward, it might not care: it values raspberries, and it selects its actions based on what it values.
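Spelled out as a toy example (again my own sketch, with invented fruits and numbers), the raspberry agent’s decision only consults the values it has learned, never the reward on offer:

```python
# Toy illustration of a learned value function diverging from the reward function.

true_reward = {"raspberry": 1.0, "blueberry": 2.0}  # the blueberry actually pays more
learned_value = {"raspberry": 1.0}                  # training only ever involved raspberries

def choose(options):
    # The agent picks whatever it has learned to value; the reward it
    # would actually receive never enters this computation.
    return max(options, key=lambda o: learned_value.get(o, 0.0))

print(choose(["raspberry", "blueberry"]))  # -> "raspberry", despite the lower reward
```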