I think corrigibility is “the AGI doesn’t try to kill everyone and doesn’t try to prevent or manipulate its own modification”. Therefore, in some global sense, such an AGI is aligned at every point in time, even if it causes a local disaster.
Over 90%, as I said.
Then I agree; thank you for re-explaining your opinion. But I think other probabilities count as high too.
To me, the ingredients of danger (though not enough for “> 90%”) are these:
1st. AGI can be built without Alignment/Interpretability being solved. If that’s true, building AGI slowly or being able to fix visible problems may not matter that much.
2nd and 3rd. AGI can have planning ability. AGI can come up with a goal whose pursuit would kill everyone.
2nd (alternative). AIs and AGIs can kill most humans without really intending to, by destabilizing the world or amplifying already existing risks.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times smarter than humanity, the way humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Yes, there are dangers other than a high probability of killing almost everyone. I didn’t say there aren’t. But it’s motte-and-baileying to fall back to “what about these lesser risks”.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times smarter than humanity, the way humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected, or (b) we cause new, less noticeable problems while correcting it.
So, I think there aren’t two assumptions (“alignment/interpretability is not solved” + “AGI is incorrigible”), but only one: “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I also disagree that “doom” and “AGI becoming ASI very fast” are certain (> 90%).