I just thought of a question. If there is a boxed AI that has access to the internet, but only through Get requests, it might still communicate with the outside world through network traffic patterns. I’m reading a book right now where the AI overloads pages on dictionary websites to recruit programmers under the guise of it being a technical recruiting challenge.
My question: should we raise awareness of this escape avenue so that if, in the year 2030, a mid level web dev gets a mysterious message through web traffic, they know enough to be suspicious?
Then another basic question? Why have we given up? I know that an ASI will almost definitely be uncontainable. But that does not mean that it can’t be hindered significantly given an asymmetric enough playing field.
Stockfish would beat me 100 times in a row, even playing without a queen. But take away its rooks as well, and I can usually beat it. Easy avenues to escaping the box might be the difference between having a fire alarm and not having one.
So here’s my memory of the history of this discourse. Please know that I haven’t been a highly engaged member of this community for very long and this is all from memory/is just my perspective.
About 10 or 15 years ago, people debated AI boxing. Eliezer Yudkowsky was of course highly prominent in this debate, and his position has always been that AGI is uncontainable. He also believes in a very fast take off, so he does not think that we will have much experience with weak AGIs before ruin happens. To prove his point, he engaged in a secret role playing game played with others on LW, where he took on the role of an AI and another player took on the role of a human trying to keep the AI boxed. The game was played twice, for betting money (which I believe was donated to charity). EY won both times by somehow persuading the other player to let him out of the box. EY insisted that the actual dialog of the game be kept secret.
After that, the community mostly stopped talking about boxing, and no one pushed for the position that AI labs like OpenAI or DeepMind should keep their AIs in a box. It’s just not something anyone is advocating for. You’re certainly welcome to reopen this and make a push for boxing, if you think it has some merit. Check out the AI Boxing tag on LW to learn more. (In my view, this community is too dismissive of partial solutions, and we should be more open to the Swiss Cheese Model since we don’t have the ultimate solution to the problem.)
The consequence is that the big AI labs, who do at least pay lip service to AI safety, are under no community pressure to do any kind of containment. Recently, OpenAI experimented with an AI that had complete web access. Other groups have experimented with AIs that access the bash shell. If AI boxing was a safety approach that the community advocated for, then we would be pressuring labs to stop this kind of research or at least be open about their security measures (i.e. always running in a VM, can only make GET requests, here’s how we know that it can’t leak out, etc)
I don’t think this is an accurate summary of historical discourse. I think there is broad consensus that boxing is very far from something that can contain AI for long given continuous progress on capabilities, but I don’t think there is consensus on whether it’s a good idea to do anyways in the meantime, in the hope of buying a few months or maybe even years if you do it well, and takeoff dynamics are less sharp. I personally think it seems pretty reasonably to prevent AIs from accessing the internet, as is argued for in the recent discussion around Web-GPT.
Thank you for the response. I do think there should be at least some emphasis on boxing. I mean, hell. If we just give AIs unrestricted access to the web, they don’t even need to be general to wreak havoc. That’s how you end up with a smart virus, if not worse.
I just thought of a question. If there is a boxed AI that has access to the internet, but only through Get requests, it might still communicate with the outside world through network traffic patterns. I’m reading a book right now where the AI overloads pages on dictionary websites to recruit programmers under the guise of it being a technical recruiting challenge.
My question: should we raise awareness of this escape avenue so that if, in the year 2030, a mid level web dev gets a mysterious message through web traffic, they know enough to be suspicious?
More likely, the AI just finds a website with a non-compliant GET request, or a GET request with a SQL injection vulnerability.
So in your opinion, is an AI with access to GET requests essentially already out of the box?
I think most people have given up on the idea of containing the AI, and now we’re just trying to figure out how to align the AI directly.
Then another basic question? Why have we given up? I know that an ASI will almost definitely be uncontainable. But that does not mean that it can’t be hindered significantly given an asymmetric enough playing field.
Stockfish would beat me 100 times in a row, even playing without a queen. But take away its rooks as well, and I can usually beat it. Easy avenues to escaping the box might be the difference between having a fire alarm and not having one.
So here’s my memory of the history of this discourse. Please know that I haven’t been a highly engaged member of this community for very long and this is all from memory/is just my perspective.
About 10 or 15 years ago, people debated AI boxing. Eliezer Yudkowsky was of course highly prominent in this debate, and his position has always been that AGI is uncontainable. He also believes in a very fast take off, so he does not think that we will have much experience with weak AGIs before ruin happens. To prove his point, he engaged in a secret role playing game played with others on LW, where he took on the role of an AI and another player took on the role of a human trying to keep the AI boxed. The game was played twice, for betting money (which I believe was donated to charity). EY won both times by somehow persuading the other player to let him out of the box. EY insisted that the actual dialog of the game be kept secret.
After that, the community mostly stopped talking about boxing, and no one pushed for the position that AI labs like OpenAI or DeepMind should keep their AIs in a box. It’s just not something anyone is advocating for. You’re certainly welcome to reopen this and make a push for boxing, if you think it has some merit. Check out the AI Boxing tag on LW to learn more. (In my view, this community is too dismissive of partial solutions, and we should be more open to the Swiss Cheese Model since we don’t have the ultimate solution to the problem.)
The consequence is that the big AI labs, who do at least pay lip service to AI safety, are under no community pressure to do any kind of containment. Recently, OpenAI experimented with an AI that had complete web access. Other groups have experimented with AIs that access the bash shell. If AI boxing was a safety approach that the community advocated for, then we would be pressuring labs to stop this kind of research or at least be open about their security measures (i.e. always running in a VM, can only make GET requests, here’s how we know that it can’t leak out, etc)
I don’t think this is an accurate summary of historical discourse. I think there is broad consensus that boxing is very far from something that can contain AI for long given continuous progress on capabilities, but I don’t think there is consensus on whether it’s a good idea to do anyways in the meantime, in the hope of buying a few months or maybe even years if you do it well, and takeoff dynamics are less sharp. I personally think it seems pretty reasonably to prevent AIs from accessing the internet, as is argued for in the recent discussion around Web-GPT.
Thank you for the response. I do think there should be at least some emphasis on boxing. I mean, hell. If we just give AIs unrestricted access to the web, they don’t even need to be general to wreak havoc. That’s how you end up with a smart virus, if not worse.