TurnTrout comments on Reward is not the optimization target

TurnTrout 16 Nov 2022 3:47 UTC
LW: 2 AF: 2
0
AF
As an example, at the level of informal discussion in this post I’m not sure why you aren’t surprised that GPT-3 ever thinks about the meaning of words rather than simply thinking about statistical associations between words (after all if it isn’t yet thinking about the meaning of words, how would gradient descent find the behavior of starting to think about meanings of words?).
I’ve updated the post to clarify. I think focus on “antecedent computation reinforcement” (while often probably ~accurate) was imprecise/wrong for reasons like this. I now instead emphasize that the math of policy gradient approaches means that reward chisels cognitive circuits into networks.