"unlike other technologies, an AI disaster might not wait around for you to come clean it up"
I think this piece is extremely important, and I would have put it in a more central place. The whole "instrumental goal preservation" argument makes AI risk very different from the knife/electricity/car analogies. It means that you only get one shot, and can't rely on iterative engineering. Without that piece, the argument effectively (though not exactly) considers only low-stakes alignment.
In fact, I think if we get rid of this piece of the alignment problem, basically all of the difficulty goes away. If you can always try again after something goes wrong, then if a solution exists you will always find it eventually.
This piece seems like much of what makes the difference between “AI could potentially cause harm” and “AI could potentially be the most important problem in the world”. And I think even the most bullish techno-optimist probably won’t deny the former claim if you press them on it.
Might follow this up with a post?