How is this elaborate proposal superior to bog-standard AI boxing?
The AI’s primary goal is specifically to have a reduced impact, as opposed to AI boxing, where it is the AI’s actions, not its goals, that are artificially restricted by the box. So long as the AI is good at goal-preservation, it can self-improve while continuing to have a goal of reduced impact. Increasing intelligence/power makes reduced impact more effective but makes AI boxing less effective.
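To make that structural difference concrete, here is a toy sketch (my own illustration, not anything from the proposal itself): in boxing, the restriction sits outside the agent’s goal as an action filter; in reduced impact, the restriction is a penalty term inside the goal the agent is optimizing. All of the function names, action labels, and numbers below are made up.

```python
ALLOWED_ACTIONS = {"answer_question"}  # the "box": a whitelist imposed from outside

def task_value(action: str) -> float:
    """Hypothetical task payoff as the AI itself sees it."""
    return {"answer_question": 1.0, "seize_infrastructure": 100.0}.get(action, 0.0)

def impact_penalty(action: str) -> float:
    """Hypothetical measure of how much the action changes the outside world."""
    return {"answer_question": 0.001, "seize_infrastructure": 50.0}.get(action, 0.0)

def boxed_agent_score(action: str) -> float:
    # Boxing: the agent's goal is untouched; the box merely filters actions.
    # A smarter agent is more able (and still motivated) to route around the filter.
    return task_value(action) if action in ALLOWED_ACTIONS else float("-inf")

def reduced_impact_score(action: str, penalty_weight: float = 10.0) -> float:
    # Reduced impact: low impact is itself part of what the agent optimizes,
    # so greater intelligence goes into enforcing the restriction, not evading it.
    return task_value(action) - penalty_weight * impact_penalty(action)

for a in ("answer_question", "seize_infrastructure"):
    print(a, boxed_agent_score(a), reduced_impact_score(a))
```

Under the reduced-impact scoring, the high-impact action scores badly by the agent’s own lights; under boxing, it scores badly only because of the external filter, which is exactly the part a smarter agent works to defeat.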
Why bother writing an elaborate function into the AI telling it not to affect the outside world except through a channel, when you could simply give it only one channel? Either it can escape through the specified channel or it can’t. Whether that channel is connected to an airgapped computer running the AI or to an unlocked cell containing a sophisticated robot running the AI seems immaterial.
Because of side-channels: airgapped computers are still tightly coupled to the rest of the world through sound waves and electromagnetic emissions. Because of potential new laws of physics that an AI might discover and use to trivially produce nanotechnology or the like. And because humans are a vulnerable part of AI boxing (“Let me out and I’ll make you king of the galaxy”) while they are not a vulnerable part of reduced impact.
Yes, it’s the equivalent of making the AI design its own box: the smarter the AI is, the better-designed the box.
But it doesn’t matter how well-designed your box is. It’s probably not impossible to design a box that an arbitrarily smart entity can’t escape (don’t quote me on that). But it will escape anyway, because you put in a channel for it to give you stuff (advice, nanotech plans, whatever) and it will escape out that channel. The channel out, the human sitting at a terminal somewhere, is the weak point, not the durability of the box itself.
I almost completely agree; don’t make the channel too large, or the AI has too much choice over the future. In the simplest and safest case, however, we could start up a reduced-impact AI whose goal is to alter the world only through a single binary choice (a 1-bit channel). Since its goal is not to alter the world by more than a 1-bit choice allows, it would be virtually impossible for the AI to escape through that channel. For instance, the AI would try to prevent a human from releasing it, since release would change the world by far more than 1 bit.

That is not a terribly useful AI, but perhaps we could have it approve or reject critical plans: for space exploration, for nanotechnology, or for the reduced-impact safety of other AIs (this last is probably the most dangerous use; if it agrees with the new AI’s goals but not with its reduced-impact nature, it will of course approve it). Each time it was used, one more bit of influence over the future would be given to it, so we would still have to be very careful that it didn’t escape over a long history of use. I don’t have a good strategy for preventing that. Perhaps we only ask it for unique answers to NP-complete problems, so that we can verify the solutions are not an attempt to use tiny parts of the channel to assemble an aggregate escape message, and make each question small enough that we can stop using it once it appears to be abusing the channel, before it can actually escape. I don’t know the shortest bit-sequence an AI could use to escape, so I have no upper bound for a safe channel size.
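For the “long history of use” worry, one mechanical thing we could do is wrap the channel in a hard lifetime bit budget and retire the oracle when it is spent. A rough sketch under that assumption follows; the class name, the `ask_oracle` stand-in, and the budget number are all hypothetical, not part of the proposal.

```python
class BitBudgetOracle:
    """Wraps a 1-bit approve/reject channel and enforces a lifetime bit budget."""

    def __init__(self, ask_oracle, lifetime_bit_budget: int):
        self._ask = ask_oracle                 # stand-in for the AI's 1-bit channel
        self._bits_remaining = lifetime_bit_budget

    def approve(self, plan: str) -> bool:
        if self._bits_remaining <= 0:
            raise RuntimeError("bit budget exhausted; retire this oracle")
        self._bits_remaining -= 1              # every answer leaks exactly one more bit
        return bool(self._ask(plan))

# Usage with a dummy stand-in (the real channel would be the reduced-impact AI's
# single binary output):
oracle = BitBudgetOracle(ask_oracle=lambda plan: "untested nanotech" not in plan,
                         lifetime_bit_budget=16)
print(oracle.approve("space exploration plan v3"))  # True
```

The budget here is a guess, not a derived bound; as noted above, we have no upper bound on how few bits an escape message might need, so the real safety work is choosing that number conservatively and actually stopping when it runs out.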