Is there something like a pie chart of outcomes from AGI?
I am trying to get a better understanding of the realistic scenarios and their likelihoods. I understand that the likelihoods are very disagreed upon.
My current opinion looks a bit like this:
30%: Human extinction
    10%: Fast human extinction
    20%: Slower human extinction
30%: Alignment with good outcomes
20%: Alignment with at best mediocre outcomes
20%: Unaligned AGI, but at least some humans are still alive
    12%: We are instrumentally worth not killing
    6%: The AI wireheads us
    2%: S-risk from the AI having the production of suffering as one of its terminal goals
I decided to break down the unaligned AGI scenarios a step further.
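For concreteness, here is one way the top-level split above could be drawn as an actual pie chart. This is a minimal sketch assuming Python with matplotlib (neither is specified anywhere in this post), with the outcome labels shortened for display; the numbers are just my rough estimates from the list above.

```python
# Minimal sketch (assumed: Python + matplotlib) rendering the top-level
# outcome estimates above as a pie chart and checking they are exhaustive.
import matplotlib.pyplot as plt

outcomes = {
    "Human extinction": 30,
    "Alignment, good outcomes": 30,
    "Alignment, mediocre outcomes": 20,
    "Unaligned AGI, some humans survive": 20,
}

# The top-level categories are meant to cover all outcomes, so they should sum to 100%.
assert sum(outcomes.values()) == 100

plt.pie(outcomes.values(), labels=outcomes.keys(), autopct="%d%%", startangle=90)
plt.title("Rough personal estimate of AGI outcomes")
plt.axis("equal")  # keep the pie circular
plt.show()
```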
If there are any resources specifically aimed at refining my understanding of the possible outcomes and their likelihoods, please point me to them. Additionally, if you have any other relevant comments I’d be glad to hear them.
I think the scenario of “aligned AI that then builds a stronger, ruinously misaligned AI” deserves a special mention. I was briefly unusually hopeful last fall, after concluding that LLMs have a reasonable chance of loose NotKillEveryone-level alignment, but then realized that they also have a reasonable chance of starting out as autonomous AGIs at merely near-human level (in rationality/coordination), in which case they are liable to build ruinous misaligned AGIs for exactly the same reasons humans are currently rushing ahead, or under human instruction to do so, just faster. I’m still more hopeful than a year ago, but not by much, and most of my P(doom) is in this scenario.
I worry that a lot of good takes on alignment optimism are about the alignment of the first AGIs and don’t take this possibility into account at all. An aligned superintelligence won’t sort everything else out if it’s not a superintelligence yet, or if it’s still under human control (in a sense that’s distinct from alignment).