a very slight misalignment would be disastrous. That seems possible, per Eliezer’s Rocket Example, but is far from certain.
Just a minor nitpick: I don’t think the point of the Rocket Alignment Metaphor was that slight misalignment is catastrophic. I think the more apt interpretation is that apparent alignment does not equal actual alignment, and that you need to do a lot of work before you can even talk meaningfully about aligning an AI at all. Relevant quote from the essay:
It’s not that current rocket ideas are almost right, and we just need to solve one or two more problems to make them work. The conceptual distance that separates anyone from solving the rocket alignment problem is much greater than that.
Right now everyone is confused about rocket trajectories, and we’re trying to become less confused. That’s what we need to do next, not run out and advise rocket engineers to build their rockets the way that our current math papers are talking about. Not until we stop being confused about extremely basic questions like why the Earth doesn’t fall into the Sun.
Fully agree—I was using the example to make a far less fundamental point.