I think the scenario of “aligned AI that then builds a stronger, ruinously misaligned AI” deserves a special mention. I was briefly unusually hopeful last fall, after concluding that LLMs have a reasonable chance of loose NotKillEveryone-level alignment. But then I realized they also have a reasonable chance of starting out as autonomous AGIs at merely near-human level (in rationality/coordination), in which case they are liable to build ruinous misaligned AGIs for exactly the same reasons humans are currently rushing ahead, or to do so under human instruction, just faster. I’m still more hopeful than a year ago, but not by much, and most of my P(doom) is in this scenario.
I worry that a lot of good takes on alignment optimism are about alignment of the first AGIs and don’t take this possibility into account at all. An aligned superintelligence won’t sort everything else out if it isn’t a superintelligence yet, or if it’s still under human control (in a sense that’s distinct from alignment).