Comment thread: concerns with “the box”

Just obvious and mundane concerns:

You might want to make clearer that “As long as the door is closed, information cannot leave the room” isn’t an assumption but a requirement of the setup. That is, you’re not assuming, based on your description, that opening the door is the only means for an operator to get information out; you’re assuming that every other means of information escape has been systematically accounted for and ruled out (under the assumption that the operator has been compromised by the AI).

It would probably help to put the AI in a cryptographic box as well (so its only output is text). See https://www.lesswrong.com/posts/2Wf3R4NZ77CLczLL2/cryptographic-boxes-for-unfriendly-ai
I definitely do not trust a door with a kill switch to keep a superintelligent AI in a box.
I didn’t go into the details of the physical design for a really secure box in the paper, because it’s not really a topic in computer science, but this is a good excuse to write it up.
The operator is inside a glass box. Outside the glass box is a vacuum, and then a concrete box. Outside the concrete box is another vacuum and another concrete box with a Faraday cage around it. There are (airtight) doors through each layer. There is a pressure sensor in both the inner and outer vacuum layers. The computer lives in the inner wall of the inner concrete box; the screen is visible from the inside of the glass box. If the pressure increases in either vacuum layer, the computer’s memory is erased, and the whole thing is blown up.

There is a laser shining through the glass box (Laser 1). If the operator blocks the path of Laser 1, this corresponds to pressing the button which ends the episode and opens the door: the pressure sensor → dynamite pathway is deactivated, the vacuum layers are filled with air, and the doors are unlocked.

The operator “types” observations and rewards by blocking other lasers that pass through the box. For now, just imagine a few dozen lasers in a row that the operator can reach up and block in order to type. (With a bit of creativity, and requiring that two lasers be blocked at once to “press a key,” this could probably be made somewhat like a qwerty setup.) The lasers and the laser detectors are all on the inner wall of the inner concrete box.
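To make the interlock rules concrete, here is a minimal sketch of the logic described above. It is only illustrative: the sensor and actuator hooks and the pressure threshold are hypothetical placeholders, not anything specified in the paper.

```python
# Minimal sketch of the box's interlock logic as described above.
# All sensor/actuator hooks and thresholds are hypothetical placeholders.

VACUUM_PRESSURE_LIMIT = 1e-6  # assumed trip threshold for "pressure increased"

def interlock_step(inner_pressure, outer_pressure, laser1_blocked, episode_active):
    """Decide what the box should do on one polling step."""
    if not episode_active:
        return "IDLE"
    # Any pressure rise in either vacuum layer means matter moved where it
    # shouldn't: erase the computer's memory and blow the whole thing up.
    if inner_pressure > VACUUM_PRESSURE_LIMIT or outer_pressure > VACUUM_PRESSURE_LIMIT:
        return "ERASE_MEMORY_AND_DETONATE"
    # Blocking Laser 1 is the end-of-episode button: disarm the
    # pressure -> dynamite pathway, refill the vacuum layers, unlock the doors.
    if laser1_blocked:
        return "END_EPISODE_DISARM_REFILL_UNLOCK"
    return "CONTINUE"
```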
So inside the box is only a human body and some clothes. Let’s assume: 1) the only ways for information to move in space are a) physically moving matter, b) light, and c) gravitational waves; and 2) it is impossible for a human being to convert a nuclear reaction into useful work with only their body and clothes.

Physically moving packets of matter cannot leave the inside of the glass box without releasing air into the vacuum and causing an explosion. Light cannot leave the Faraday cage. For sound waves (through an imperfect vacuum) and gravitational waves, we can use the amount of usable energy stored by the person in the room to bound how much noise and randomly moving matter we need to be producing outside to bury the signal in the noise. Even if all the human’s energy went toward producing a signal, we can create enough noise that the signal will not be detectable. Furthermore, an analysis of the near-perfect vacuum can provide an upper bound on the efficiency with which a human can convert energy into an informative matter wave, which should drive down the amount of noise we need to produce to almost nothing. We can also put a decibel meter in the inner wall and blow the whole thing up if it gets too high.

I haven’t done the math, but I bet the amount of matter we need to be randomly moving in order to hide signals from gravitational waves originating within the box really is nothing: I bet ambient temperature takes care of that.
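As a rough illustration of the kind of bound meant here, one can treat the vibrational path out of the box as a noisy channel and bound the leak with the Shannon capacity C = B log2(1 + S/N). The sketch below does this with entirely made-up numbers for the operator’s power budget, the attenuation through the near-vacuum, the masking noise, the bandwidth, and the episode length; none of these values come from the paper.

```python
import math

def bits_leaked(signal_w, coupling, noise_w, bandwidth_hz, seconds):
    """Upper bound on bits that can cross the channel during one episode,
    via the Shannon capacity C = B * log2(1 + S/N)."""
    effective_signal = signal_w * coupling          # power surviving the vacuum
    snr = effective_signal / noise_w
    capacity_bits_per_s = bandwidth_hz * math.log1p(snr) / math.log(2)
    return capacity_bits_per_s * seconds

# Made-up numbers: the operator can put ~100 W into vibrations, the
# near-vacuum attenuates that by a factor of 1e-12, we inject 1 W of
# masking noise over a 10 kHz band, and an episode lasts one hour.
print(bits_leaked(signal_w=100.0, coupling=1e-12, noise_w=1.0,
                  bandwidth_hz=1e4, seconds=3600.0))
# ~0.005 bits for these placeholder numbers
```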
What if there are other ways information can be transmitted through space? If we ever learn of these, we shut down BoMAI (or modify the setup in light of them). Until then, if such a discovery is made by BoMAI, we can’t exactly detect those sorts of signals from the outside, because we don’t know they exist.
This is pretty thorough from an engineering standpoint. I still would not at all trust such a box to hold a superintelligent AI. The unknown unknowns remain too big a factor. Just because an unknown information channel does not have any humans listening to it directly, does not mean that it can’t be used to push around matter, or couple to some other channel, or otherwise leave an opening.
Also, the human and their clothes are definitely not the only potentially-useful tools in the box: there’s also the computer, the computer’s power supply, whatever’s cooling the room, whatever’s keeping the air fresh, whatever’s used for input, and whatever’s used for output. If any of those things were outside the room, then they would offer a side channel for information to escape. They can be buried in concrete, but they’ll still need to be within the vacuums and whatnot.
In general, trying to outsmart a superintelligent AI is a Bad Idea, and building an AI-proof box definitely falls in that category. If an AI is not safe with arbitrarily large amounts of power, then it is not safe, full stop.
The computer and everything else is in the inner concrete wall (separated from the glass box by a vacuum), as is the power supply. Nothing is cooling the room, except maybe some ice on the floor. I think you could allow an oxygen tank in the room too.
Yes, the computer is using power, but a computer can’t move matter except inside itself. This can generate sound and light, but the second vacuum chamber and Faraday cage will block that (and the rate at which power is drawn can be capped, which also gives a reasonable bound on how much noise to generate).
whatever’s used for input, and whatever’s used for output.
For input, the human is just blocking lasers. For output, they’re looking through the inner vacuum at the screen on the inner wall of the inner concrete box.
They can be buried in concrete, but they’ll still need to be within the vacuums and whatnot.
Yes, that’s what the outer vacuum is for.
trying to outsmart a superintelligent AI is a Bad Idea
If I can construct a proof that Y is physically impossible, then I feel fine depending on the fact that an AI won’t be able to figure out how to do Y. It doesn’t feel to me like “trying to outsmart an AI.”
Just because an unknown information channel does not have any humans listening to it directly, does not mean that it can’t be used to push around matter, or couple to some other channel
Yes, you’re right. My afterthought was hasty. I still think it is unlikely that other forms of information transmission are physically possible, and quite unlikely that they could be generated by a human or a computer in isolation.
Sidenote: I think we’ve all generated a lot of our intuitions here from the AI Box experiments. In a sense, the AI “in the box” is not really boxed. There’s an information channel directly to the outside world!
We cannot “prove” that something is physically impossible, only that it is impossible under some model of physics. Normally that distinction would be entirely irrelevant, but when dealing with a superintelligent AI, it’s quite likely to understand the physics better than we do. For all we know, it may turn out that Alcubierre drives are possible, and if so then the AI could definitely break out that way and would have an incentive to do so.
I agree that the AI is not really boxed here; it’s the “myopia” that makes the difference. But one of two things should generally be true:
1. The AI doesn’t want to get out of the box, in which case the box doesn’t need to be secure in the first place.
2. The AI cannot get out of the box, in which case the AI doesn’t need to be safe (but also won’t be very useful).
This case seems like the former, so long as hacking the human is easier than getting out of the box. But that means we don’t need to make the box perfect anyway.
In a sense, the AI “in the box” is not really boxed
I meant the “AI Box” scenario where it is printing results to a screen in the outside world. I do think BoMAI is truly boxed.
We cannot “prove” that something is physically impossible, only that it is impossible under some model of physics.
Right, that’s more or less what I mean to do. We can assign probabilities to statements like “it is physically impossible (under the true models of physics) for a human or a computer in isolation with an energy budget of x joules and y joules/second to transmit information in any way other than via a), b), or c) from above.” This seems extremely likely to me for reasonable values of x and y, so it’s still useful to have a “proof” even if it must be predicated on such a physical assumption.