Do you think that the scalable oversight/iterative alignment proposal that we discussed can get us to the necessary amount of niceness to make humans survive with AGI?
My answer is basically yes.
I was only addressing the question “If we basically failed at alignment, or didn’t align the AI at all, but had a very small amount of niceness, would that lead to good outcomes?”