3. Will continue to exist regardless of how well you criticize any one part of it.
Depending on what you mean by “any one part of it”, I think 3 is false. E.g., a sufficiently good critique of “AGI won’t just have human-friendly values by default” would cause MIRI to throw a party and close up shop.
Huh, roll to disbelieve on ‘sufficient to close up shop’? I don’t think this is my only crux for AI being really dangerous.
Even if sufficiently advanced AGI reliably converges to human-friendly values in a very strong sense (i.e. two rival humans trying to build AGIs for war, or many humans with many AGIs embarking on complex economic goals, will somehow always figure out the best things for humans even if it means disobeying orders by stupid humans)...
...there’s still a separate case to be made that multipolar, narrow, non-fully-superhuman AIs won’t kill us before the AGI sovereign fixes everything.
I think a more likely thing we’d want to stick around to do in that world is ‘try to accelerate humanity to AGI ASAP’. “Sufficiently advanced AGI converges to human-friendly values” is weaker than “AGI will just have human-friendly values by default”.