Sam Ringer comments on Models Don’t “Get Reward”

Sam Ringer 7 Jan 2023 11:49 UTC
1 point
0
Thanks for the feedback!

From reading your linked comment, I think we agree about selection arguments. In the post, when I mention “selection pressure towards a model”, I generally mean “such a model would score highly on the reward metric” as opposed to “SGD is likely to reach such a model”. I believe the former is correct and the later is very much an open-question.
To second what I think is your general point, a lot of the language used around selection can be confusing because it confounds “such a solution would do well under some metric” with “your optimization process is likely to produce such a solution”. The wolves with snipers example illustrates this pretty clearly. I’m definitely open to ideas for better language to distinguish the two cases!