Impression before reading LW post comments & MIRI comments: this strikes me as a valuable “fourth area” of core research that we could start growing now. I’m uncertain about the technical fruits of the research itself (I expect it to be somewhere between ‘slightly positive’ and ‘moderate-high positive’), but it seems like we could indeed scale such research into its own healthy (& prestigious!) subfield in ML. This could diversify the alignment research portfolio in a way that scales sublinearly with long-termist research input: in the long run, we wouldn’t need everyone involved to be ‘core’ alignment researchers.
I have a few notes of unease that I haven’t yet sat down to figure out, so I may reply to this comment with more thoughts.