Maybe “Humans iteratively designing useful systems and fixing problems provide a robust feedback signal for capabilities, but not for alignment”?
(Also, I now realize I left this out of the original comment because I assumed it was obvious, but to be explicit: basically any feedback signal on a reasonably complex or difficult task will select for capabilities. That's just instrumental convergence.)