Nice—thanks for this comment—how would the argument be summarised as a nice heading to go on this list? Maybe “Capabilities can be optimised using feedback but alignment cannot” (and feedback is cheap, and optimisation eventually produces generality)?
Maybe “Humans iteratively designing useful systems and fixing problems provide a robust feedback signal for capabilities, but not for alignment”?
(Also, I now realize that I left this out of the original comment because I assumed it was obvious, but to be explicit: basically any feedback signal on a reasonably-complex/difficult task will select for capabilities. That’s just instrumental convergence.)