TurnTrout comments on "Reward is not the optimization target"

TurnTrout 16 Nov 2022 3:45 UTC
Edit 11/15/22: The original version of this post talked about how reward reinforces antecedent computations in policy gradient approaches. This is not true in general. I edited the post to instead talk about how reward is used to upweight certain kinds of actions in certain kinds of situations, and therefore reward chisels cognitive grooves into agents.
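The "upweight certain actions in certain situations" framing can be made concrete with a toy policy gradient step. What follows is a minimal sketch under stated assumptions, not the post's own method: a tabular softmax policy with one logit vector per situation, and a single vanilla REINFORCE update with no baseline. The names `policy`, `state_a`, `state_b` and `reinforce_update` are hypothetical. Reward scales the gradient of the log-probability of the action actually taken, in the situation where it was taken. That action's probability rises there, while the other situation's policy is left untouched.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, reward, lr=0.1):
    """One vanilla REINFORCE step (no baseline) for a single state's softmax policy.

    The gradient of log pi(action) with respect to logit j is
    1[j == action] - pi(j), and the step is scaled by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if j == action else 0.0) - p)
        for j, (logit, p) in enumerate(zip(logits, probs))
    ]


# Hypothetical tabular policy: one logit vector per situation, two actions each.
policy = {"state_a": [0.0, 0.0], "state_b": [0.0, 0.0]}

# Action 1 was taken in state_a and received reward 1.0.
policy["state_a"] = reinforce_update(policy["state_a"], action=1, reward=1.0)

print(softmax(policy["state_a"]))  # action 1 is now more likely in state_a
print(policy["state_b"])           # state_b is unchanged
```

In this toy setting the reward only adjusts the action probabilities in the situation where the reward was received. Real policy gradient methods add baselines, function approximation and batching, and these can change how updates generalize across situations.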