Update: Changed

RL agents which don’t think about reward before getting reward, will not become reward optimizers, because there will be no reward-oriented computations for credit assignment to reinforce.

to

While it’s possible to have activations on “pizza consumption predicted to be rewarding” and “execute motor-subroutine-#51241” and then have credit assignment hook these up into a new motivational circuit, this is only one possible direction of value formation in the agent. Seemingly, the most direct way for an agent to become more of a reward optimizer is to already make decisions motivated by reward, and then have credit assignment further generalize that decision-making.
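
As a toy illustration of the credit-assignment point (not from the original post), here is a minimal REINFORCE-style update for a tabular softmax policy. The update strengthens whatever decision actually preceded the reward; nothing in it requires the agent to have computed anything about reward while acting. All names, sizes, and the reward rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 3
logits = np.zeros((n_states, n_actions))  # tabular policy parameters

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reinforce_update(state, action, reward, lr=0.1):
    """Credit assignment: push up log-prob of the action that preceded reward."""
    probs = softmax(logits[state])
    grad_logp = -probs
    grad_logp[action] += 1.0          # d log pi(a|s) / d logits = one-hot - probs
    logits[state] += lr * reward * grad_logp

# One step: the agent acts (without any reward-prediction computation),
# then reward arrives and the update reinforces the decision that was made.
state = 2
probs = softmax(logits[state])
action = rng.choice(n_actions, p=probs)
reward = 1.0 if action == 0 else 0.0   # illustrative: "pizza was tasty"
reinforce_update(state, action, reward)

# If reward was 1, the taken action is now more probable in this state;
# the update only generalized the circuit that actually produced the action.
print(softmax(logits[state]))
```

The point of the sketch is narrow: the gradient term is built from the action-selection computation that actually ran, so what gets reinforced is that decision-making, not a reward-oriented computation that was never active.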