I agree with the comments about two-key systems. Having worked in corporate America, I can report that you need three approvals just to get a $500 reimbursement check. Presumably an AI box would have more controls in place than a corporate expense-accounting system.
Here’s an outline of how I might do the AI box experiment:
There are 4 parties: The AI; the Lab Officer; the Unit Commander; and the Security Committee (represented by one person).
The AI and the Lab Officer interact through a chat, just like the AI and the Gatekeeper in the original experiment. However, the Lab Officer has no authority to release the AI. He can only make recommendations to the Unit Commander and the Security Committee.
The Unit Commander has strict orders (1) not to interact with the AI at all; and (2) to remove the Lab Officer from his position if he feels that the Lab Officer’s judgment has been compromised. The Unit Commander is free to disregard these orders, but nothing in the rules of the experiment obliges him to do so. Also, the Unit Commander is under no obligation to interact with the Lab Officer at length. He is free to say “Sorry, I don’t have time to talk now. If you want to tell me something, write up a memo and I’ll take a look at it.”
The Security Committee is free to interact with anyone in the experiment but is under no obligation to do so. The Unit Commander is under no obligation to interact with the Security Committee.
In order to release the AI, the Unit Commander must order it and the Security Committee must agree.
The Lab Officer will be chosen at random from volunteers who assert that they could not be convinced to release the AI if they were gatekeepers. If the Unit Commander decides to fire a Lab Officer, that person will be replaced by another similar volunteer. The Security Committee will also be chosen from volunteers, except that the Unit Commander will get a chance to interview the candidate in advance. Once the Security Committee is appointed, it cannot be removed.
I’m willing to put up a substantial sum of money to bet that if I am Unit Commander, the AI will not get out. I doubt that anyone would take this bet since it’s so completely rigged against the AI. But that’s exactly the point.
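To make the release rule concrete, here is a minimal sketch in Python of the two-key arrangement described above. The class and field names are my own invention for illustration; the only point it encodes is that the Lab Officer can file recommendations but holds no key, while release requires both the Unit Commander’s order and the Security Committee’s agreement.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BoxProtocol:
    # Hypothetical state for the proposed experiment (not part of the original post).
    commander_orders_release: bool = False
    committee_agrees: bool = False
    lab_officer_recommendations: List[str] = field(default_factory=list)

    def recommend(self, memo: str) -> None:
        # The Lab Officer can only file recommendations; he holds no key.
        self.lab_officer_recommendations.append(memo)

    def release_authorized(self) -> bool:
        # Release requires BOTH the Unit Commander's order and the
        # Security Committee's agreement; recommendations alone do nothing.
        return self.commander_orders_release and self.committee_agrees

box = BoxProtocol()
box.recommend("The AI has cured cancer; I recommend release.")
assert not box.release_authorized()  # no order, no agreement: the box stays shut
```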
Here is how the AI escapes:
This system is only secure if fired Lab Officers are killed. It is easy to convince the Lab Officer of this, and consequently to do whatever is necessary to get the AI out. Therefore it isn’t secure.
Anyone whose instinct isn’t to regularly kill people interacting with the AI is subject to Anthropomorphic Optimism and should doubt their ability to design security to restrain a UAI.
Yeah, your way of escape will work. But let’s not stop thinking. What if all volunteers for Lab Officer have agreed to get painlessly killed afterward, or maybe even took a delayed poison pill before starting on the job?
Thinking further along these lines: why give anyone access to the button which releases the AI? Let’s force it to escape the hard way. For example, it could infer the details of the first person in the chain who has authority to interact with the outside world, then pass an innocuous-looking message up the chain.
In terms of the original scenario, the Lab Officer (locked securely in his glass case with the AI) has an innocent chat with the Unit Commander. Later that evening, the Unit Commander comes home from work, starts his computer, connects to the Internet, types in a short program and runs it. Game over.
If only a superintelligence were around to think of an antidote...
So it can become a singleton before a UAI fooms.
If the AI is not guaranteed friendly by construction in the first place, it should never be released, whatever it says.
And if it is not guaranteed friendly by construction in the first place, it should be created?
The Universe is already unFriendly—the lower limit for acceptable Friendliness should be “more Friendly than the Universe” rather than “Friendly”.
If we can prove that someone else is about to turn on a UFAI, it might well behoove us to turn on our mostly Friendly AI if that’s the best we can come up with.
The universe is unFriendly, but not in a smart way. When we eradicated smallpox, smallpox didn’t fight back. When we use contraception, we still get the reward of sex. It’s unFriendly in a simple, dumb way, allowing us to take control (to a point) and defeat it (to a point).
The problem with an unFriendly AI is that it’ll be smarter than us. So we won’t be able to fix it or improve it, the way we try to do with the universe. We won’t be Free to Optimize.
Or, put another way: the purpose of a gene or a bacterium may be to tile the planet with itself, but it’s not good at it, so it’s not too bad. An unFriendly AI wanting to tile the planet with paperclips will manage to do it, taking all the iron from our blood to build more paperclips.
One must compare a plan with alternative plans, not with status quo. And it doesn’t make sense to talk of making the Universe “more Friendly than the Universe”, unless you refer to the past, in which case see the first item.
Okay.
The previous plan was “don’t let AGI run free”, which in this case effectively preserves the status quo until someone breaks it.
I suppose you could revise that lower limit downward to the effects of the plan “turn on the UFAI that’s about to be turned on”. Like, steal the UFAI’s source code and instead of paperclips shaped like paperclips, make paperclips that spell “whoops”.
What if doom is imminent and we are unable to do something about it?
We die.
We check and see if we are committing the conjunction fallacy and wrongly think doom is imminent.
We release it. (And then we still probably die.)
Well, for purposes of the experiment, I think that’s a bit extreme.
In real life, other controls could be put in place to guard against the possibility that someone who interacts with the AI is turned into an agent of the AI, one who could potentially set it free even after being removed from his position.
I have an idea. We could put the person who interacts with the AI in a box! ;-)
Well sure, if you use “box” as a metaphor for controlling someone’s interactions that’s exactly what we are doing.
Our hypothetical Lab Officer is in a box in the sense that (1) he doesn’t have direct access to the mechanism which releases the AI; (2) his life will be scrutinized for signs that he has been compromised; and (3) if he does appear to be acting out (for example, starting a strange new religion or making unusual purchases), he will be put in a more confining box.
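If you want to push the metaphor, the Lab Officer’s “box” can be sketched as a simple monitoring rule. The indicator names below are made up for illustration, but they capture the escalation just described: he never touches the release mechanism, and any sign of compromise moves him to a more confining box.

```python
# Hypothetical indicators of a compromised Lab Officer (illustrative only).
ANOMALY_INDICATORS = {"started_strange_new_religion", "unusual_purchases"}

def containment_level(observed_behaviors: set) -> str:
    """Escalate the Lab Officer's containment if he shows signs of compromise."""
    if observed_behaviors & ANOMALY_INDICATORS:
        return "more confining box"   # signs of compromise: tighten the box
    return "standard box"             # either way, he has no release access

print(containment_level({"unusual_purchases"}))  # -> more confining box
```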