Nora Belrose comments on Seriously, what goes wrong with “reward the agent when it makes you smile”?

Nora Belrose 15 Aug 2022 14:27 UTC
3 points
This is actually a pretty good argument, and it has caused me to update more strongly toward the view that we should be optimizing only the thought process of chain-of-thought language models, not the outcomes that they produce.