This essay discusses the possibility of making a good successor AI in lieu of an aligned AI. An aligned AI is one that tries to help us get what we want, whereas a good successor AI is one that we would be happy to see take over the world, even if it doesn’t try to help us.
I think it would clearly be better to get an aligned AI if we could (because if it turns out that building a successor AI would be better, the aligned AI could just help us do that). But if getting an aligned AI turns out to be hard for some reason (such as mesa-alignment problems being fundamentally intractable), we might instead try to ensure that our successor is a good one.