I wasn’t expecting the development endgame to look much different, though it’s a bit early to tell. At least it’s LLMs and not Atari-playing RL agents. I’m also much less certain now about the inevitability of boundary-norm-ignoring optimizers, in a world that isn’t too dog-eat-dog at the top. This makes precise value targeting less crucial for mere survival, though most of the Future is still lost without it.
So the news is good. I’m personally down to a 70% probability of extinction, mostly from the first AGIs failing to prevent the world from being destroyed by their research output, since it isn’t looking like they’ll be superintelligent out of the box. I’m no longer expecting the first AGIs to intentionally destroy the world, unless users are allowed to explicitly and successfully wish for it to be destroyed, which bizarrely seems like a significant portion of the risk.