ALFONSO: So the first few rockets might not hit the Moon, I get that. Might fall back down and hurt someone even. But that’s how technology advances in general, isn’t it? The first planes crashed often, and it took a long time for all the details to be worked out both theoretically and experimentally, each helping the other. Wouldn’t Moon Rocketry follow the standard pattern?
BETH: We are very much worried that this is not what would happen at all! Moon Rocketry is special! The very first rocket that goes very high but misses the Moon will have enough power to destroy both the Earth and the Moon. And the Sun. And there is nothing we can do about it once it’s fired. So we’d better get the first rocket as right as we possibly can; the stakes are just too high.
ALFONSO: This sounds alarmist and super far-fetched! But humor me: explain what reasons you have for your suspicion.
BETH: Well, we suspect that to get really high up, a rocket will need to keep gaining power: the higher up it goes, the more power it needs. And if it misses the Moon, the fallout from its engines will be so bad that it may end up causing worldwide destruction.
ALFONSO: I think your analogy between propulsion and information is getting a bit strained.
I mean, analogies don’t have to be similar in all respects to be useful explanations, just in the few respects that you’re using the analogy for. OP isn’t arguing that AI alignment is important because rocket alignment is important; it’s only using the analogy to describe the type of work that it thinks needs to be done to align AGI, which I’m guessing has been difficult to describe before this post. Arguments that AGI needs to be built right the first time have been discussed elsewhere, and you’re right that this post doesn’t make that argument.
(On this side-topic of whether AGI needs to be built precisely right the first time, and counter to your point that we-always-get-stuff-wrong-a-bunch-at-first-and-that’s-fine, I liked Max Tegmark’s story of how we’re building technologies that leave increasingly less affordance for error: fire, nukes, AGI. For some of these, a few mistakes meant small damage, then big damage, and in principle we may hit tech where the initial mistakes are existential in nature. I think there are some sane arguments that make AGI seem like a plausible instance of this.

For discussion of the AI details I’d point elsewhere, to things like Gwern on “Why Tool AIs Want to be Agent AIs”, Paul Christiano discussing arguments for fast-takeoff speeds, the paper Intelligence Explosion Microeconomics, and of course Bostrom’s book.)
My point, clearly not well expressed, is that the main reason why AI alignment has to be figured out in advance is not even mentioned in the OP’s dialogue:
We think the most important thing to do next is to advance our understanding of rocket trajectories until we have a better, deeper understanding of what we’ve started calling the “rocket alignment problem”. There are other safety problems, but this rocket alignment problem will probably take the most total time to work on, so it’s the most urgent.
… why? So what if this problem remains after the other problems are solved and the rockets are flying every which way? I have tried to answer that, since Eliezer hasn’t in this post, despite this being the main impetus of MIRI’s work.
I feel like the post is trying to convince the reader that AI alignment needs to be solved AT ALL. You can worry about arguing when it needs to be solved after the other person is convinced there is a problem to solve in the first place.
I agree with Ben, and also, humanity successfully sent a spaceship to the Moon’s surface on the second attempt and successfully sent people (higher stakes) to the Moon’s surface on the first attempt. This shows that difficult technological problems can be solved without extensive trial and error. (Obviously some trial and error on easier problems was done to get to the point of landing on the Moon, and no doubt the same will be true of AGI. But there is hope that the actual AGI can be constructed without trial and error, or at least without the sort of trial and error where error is potentially catastrophic.)
The trouble with this comparison is that the rocket used was a system of welded and bolted-together parts. The functions and rules of each subsystem remained the same throughout the flight, and thus it was possible to model. With self-improving AI, it would be as if we used the rocket exhaust from the Saturn V to melt metal used in other parts of the rocket during the flight to the Moon.
I can see a way to do self-improving AI: separate modular subsystems, each evaluated by some connection, direct or indirect, to the real world. In that case, while each subsystem may be a “black box” that is ever-evolving, its function basically remains the same. For example, you might have a box that re-renders scenes from a camera without shadows. There’s feedback and ways it can get better at its job, and there’s a meta-system that can gut the architecture of that box and replace it with a new internal way of doing the task. But all the while, the box is still just subtracting shadows; it never does anything else.
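To make that architecture concrete, here is a minimal Python sketch of the idea, not anyone’s actual system; the names (FixedFunctionModule, remove_shadows, the scoring callback) are hypothetical illustrations. The module’s job is frozen at its interface, and the meta-step may swap the implementation behind it only when external feedback scores a candidate higher.

```python
from typing import Callable, List

# Placeholder "image" type: a flat list of pixel intensities.
Image = List[float]
ShadowRemover = Callable[[Image], Image]


def naive_shadow_remover(image: Image) -> Image:
    """Initial implementation: a do-nothing starting point."""
    return list(image)


class FixedFunctionModule:
    """A 'black box' whose internals may be replaced, but whose job never changes."""

    def __init__(self, implementation: ShadowRemover):
        self._impl = implementation

    def remove_shadows(self, image: Image) -> Image:
        # The only thing this module ever does, regardless of its internals.
        return self._impl(image)

    def consider_replacement(self, candidate: ShadowRemover,
                             score: Callable[[ShadowRemover], float]) -> None:
        # The meta-system's move: adopt a new internal implementation only if
        # external feedback (e.g. comparison against shadow-free reference
        # scenes) says the candidate does the same fixed job better.
        if score(candidate) > score(self._impl):
            self._impl = candidate
```

However the meta-system rewrites the internals, callers only ever see remove_shadows, which is the fixed-function property the comment above is describing.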
I don’t think we need to explicitly plan for the later stages. If we have a sufficiently advanced AI that we know is aligned and capable of intelligently self-modifying without becoming unaligned, we can probably put more confidence in the seed AI’s ability to construct the final stages than in our ability to shape the seed AI to better construct the final stages.
Edit: that’s insufficient. What I mean is that once you make the seed AI I described, any change you make to it that’s explicitly for the purpose of guiding its takeoff will be practically useless and possibly harmful, given the AI’s advantage. I think we may reach a point where we can trust the seed AI to do the job better than we can trust ourselves to do it.