I mostly think that AI doing research will accelerate both risk and alignment, so we’re aiming for it to be roughly a wash.
Yeah, I don’t understand why it would be a wash, when destructive capabilities are easier than alignment (humans already figured out nukes, but not alignment) and alignment is expected to get harder for more advanced AI. Even setting aside straight misalignment risk, handing superhuman AI to the current civilization doesn’t sound like a stability improvement. So without a specific plan to stop everyone from misusing AI, it still sounds safer to solve alignment before anyone builds AI that comes close to being dangerous.