The AI’s primary goal is specifically to have a reduced impact, as opposed to AI boxing, where the AI’s goals are artificially restricted by the box. So long as the AI is good at goal preservation, it can self-improve while continuing to have a goal of reduced impact. Increasing intelligence/power makes reduced impact more effective but makes AI boxing less effective.
Yes, it’s the equivalent of making the AI design its own box: the smarter the AI is, the better-designed the box.
But it doesn’t matter how well-designed your box is. It’s probably not impossible to design a box an arbitrarily smart entity can’t escape (don’t quote me on that). But they will escape anyway, because you put a channel in for them to give you stuff (advice, nanotech plans, whatever) and they will escape out that channel. The channel out, the human sitting at a terminal somewhere, is the weak point, not the durability of the box itself.
But they will escape anyway, because you put a channel in for them to give you stuff (advice, nanotech plans, whatever) and they will escape out that channel. The channel out, the human sitting at a terminal somewhere, is the weak point, not the durability of the box itself.
I almost completely agree; don’t make the channel too large, otherwise the AI has too much choice over the future. In the simplest and safest case, however, we could start up a reduced-impact AI whose goal is to alter the world only through a single binary choice (a 1-bit channel). Since its goal is not to alter the world more than a 1-bit choice could allow, it would be virtually impossible for the AI to escape through that channel. For instance, the AI would try to prevent a human from releasing it, since that would change the world by far more than 1 bit. That is not a terribly useful AI, but perhaps we could have it approve or reject critical plans for space exploration, nanotechnology, the reduced-impact safety of other AIs, etc. (The last is probably the most dangerous: if it agrees with the new AI’s goals but not with its reduced-impact nature, of course it will approve it.) Each time it was used, it would be given one more bit with which to influence the future, so we would still have to be very careful that it didn’t escape over a long history of use. I don’t have a good strategy for preventing that. Perhaps we only ask it for unique answers to NP-hard problems, so that we can verify the solutions are not an attempt to use tiny parts of the channel to form an aggregate escape message, and make each question small enough that we can stop using it once it appears to be abusing the channel, before it is actually able to escape. I don’t know the shortest bit sequence that an AI could use to escape, so I have no upper bound for a safe channel size.
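To make the "small, verifiable questions" idea concrete, here is a rough Python sketch of the gate I have in mind. Everything in it is my own illustration, not an existing system: the names `VerifiedChannel`, `ask`, and `subset_sum_verifier` are hypothetical, the per-question size cap is an arbitrary assumption (as I said, I have no real bound), and subset-sum just stands in for a question whose answer we can check ourselves. An answer only leaves the channel if it is small enough and passes our check; the first failure shuts the channel down for good.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Bits = Tuple[int, ...]


@dataclass
class VerifiedChannel:
    """Hypothetical gate between the AI and the outside world."""
    max_bits_per_question: int  # assumed cap; no known-safe value exists
    bits_released: int = 0      # total bits ever passed to the outside
    halted: bool = False

    def ask(self, oracle: Callable[[], Bits],
            verify: Callable[[Bits], bool]) -> Optional[Bits]:
        """Release the oracle's answer only if it is small and verifies."""
        if self.halted:
            return None
        answer = oracle()
        too_big = len(answer) > self.max_bits_per_question
        not_bits = any(b not in (0, 1) for b in answer)
        if too_big or not_bits or not verify(answer):
            # Treat any bad answer as attempted abuse: discard it, stop for good.
            self.halted = True
            return None
        self.bits_released += len(answer)
        return answer


def subset_sum_verifier(weights: Sequence[int],
                        target: int) -> Callable[[Bits], bool]:
    """Checker for a toy question: which weights sum to the target?"""
    def verify(bits: Bits) -> bool:
        return (len(bits) == len(weights)
                and sum(w * b for w, b in zip(weights, bits)) == target)
    return verify


# Toy question with exactly one valid answer: only 5 + 9 gives 14.
channel = VerifiedChannel(max_bits_per_question=3)
check = subset_sum_verifier((3, 5, 9), 14)
good = channel.ask(lambda: (0, 1, 1), check)  # verifies, so it is released
bad = channel.ask(lambda: (1, 1, 0), check)   # 3 + 5 != 14, channel halts
```

If the question really has a unique answer, a verified answer gives the AI no free choice at all, so the only wiggle room is in wrong answers, and the first one of those closes the channel. The sketch does nothing about the harder problem of us choosing questions that actually have unique answers.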