I think corrigibility is the ability to change a value/goal system. That the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Yes, there are dangers other than a high probability of killing almost every one. I didn’t say there arent. But it’s motte and baileying to fall back to “what about these lesser risks”.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many-many times
Yes, and that’s the specific argument I am addressing,not AI risk in general.
Except that if it’s many many times smarter, it’s ASI, not AGI.
I think corrigibility is the ability to change a value/goal system. That the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected or (b) we cause new, less noticeable problems, while correcting AGI.
So, I think there’s not two assumptions “alignment/interpretability is not solved + AGI is incorrigible”, but only one — “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
Yes, and that’s the specific argument I am addressing,not AI risk in general.
Except that if it’s many many times smarter, it’s ASI, not AGI.
I disagree that “doom” and “AGI going ASI very fast” are certain (> 90%) too.
I think corrigibility is the ability to change a value/goal system. That the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Yes, there are dangers other than a high probability of killing almost every one. I didn’t say there arent. But it’s motte and baileying to fall back to “what about these lesser risks”.
Yes, and that’s the specific argument I am addressing,not AI risk in general.
Except that if it’s many many times smarter, it’s ASI, not AGI.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected or (b) we cause new, less noticeable problems, while correcting AGI.
So, I think there’s not two assumptions “alignment/interpretability is not solved + AGI is incorrigible”, but only one — “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
I disagree that “doom” and “AGI going ASI very fast” are certain (> 90%) too.