I agree with the first bit! I think a universe of relativistic grabby AGI spheres-of-control is at least slightly cool.
It’s possible that the actual post-unaligned-singularity world doesn’t look like that, though. Like, “space empires” is a concept that reflects a lot of what we care about. Maybe large-scale instrumentally efficient behavior in our world is really boring, and doesn’t end up keeping any of the cool bits of our hard sci-fi’s speculative mature technology.
The chance of a Mars rover nearly but not quite reaching Mars is large. It was optimized in the direction of Mars, and one little mistake can cause a near miss.
For AI: if humans try to make FAI but make some fairly small mistake, we could well get a near miss. Of course, if we make lots of big mistakes, we don’t get that.
I don’t think the Mars mission analogy goes through given the risk of deceptive alignment.
Optimizing in the direction of a Mars landing when there’s a risk of a deceptive agent just means everything looks golden and Mars-oriented up until you deploy the model … and then the deployed model starts pursuing the unrelated goal it’s been waiting to pursue all along. The deployed model’s goal is allowed to be completely detached from Mars, as long as the model with that unrelated goal was good at playing along once it was instantiated in training.
Well, if we screw up that badly with deceptive misalignment, that corresponds to crashing on the launchpad.
It is reasonably likely that humans will use some technique intended to minimize deceptive misalignment, or that gradient descent will shape the AI’s goals into something close to what we want before the AI is smart enough to be deceptive.