Separate from my other comment, I want to question your assumption that we must worry about an AI takeoff that leaves the resulting system exponentially better than humans at everything, such that a very slight misalignment would be disastrous. That seems possible, per Eliezer’s Rocket Example, but is far from certain.
It seems more likely that there are fundamental limits on intelligence (for a given architecture, at least), and while it is unlikely that those limits happen to sit at or near human intelligence, it seems plausible that the first superhuman AI system still plateaus far short of the effectively unlimited optimization power the hard-takeoff picture assumes. If so, we only need to mitigate well, rather than perfectly align the AI to our goals.
I don’t think I have anything unique to add to this discussion. Basically I defer to Eliezer and Nick (Bostrom) for the written arguments, since they are largely the ones whose arguments led me to strongly believe we live in a world with hard takeoff via recursive self-improvement. That takeoff leads to a “singularity” in the sense that we pass some threshold of intelligence/capabilities beyond which we cannot meaningfully reason about or control what happens after the fact, though we may be able to influence how it happens in ways that don’t cut off the possibility of outcomes we would be happy with.
a very slight misalignment would be disastrous. That seems possible, per Eliezer’s Rocket Example, but is far from certain.
Just a minor nitpick: I don’t think the point of the Rocket Alignment Metaphor was that slight misalignment is catastrophic. The more apt interpretation is that apparent alignment does not equal actual alignment, and that you need to do a lot of work before you get to the point where you can talk meaningfully about aligning an AI at all. Relevant quote from the essay:
It’s not that current rocket ideas are almost right, and we just need to solve one or two more problems to make them work. The conceptual distance that separates anyone from solving the rocket alignment problem is much greater than that.
Right now everyone is confused about rocket trajectories, and we’re trying to become less confused. That’s what we need to do next, not run out and advise rocket engineers to build their rockets the way that our current math papers are talking about. Not until we stop being confused about extremely basic questions like why the Earth doesn’t fall into the Sun.
Fully agree—I was using the example to make a far less fundamental point.