I don’t think it’s supposed to be a physical box?

That’s a good question. I checked the protocols at http://yudkowsky.net/singularity/aibox

The box appears to be originally defined as:

“Sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers.”
However, it also mentions:
The AI can only win by convincing the Gatekeeper to really, voluntarily let it out. Tricking the Gatekeeper into typing the phrase “You are out” in response to some other question does not count. Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose—for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI “input-only access” to an Internet connection which can send arbitrary HTTP GET commands—the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
That means a Gatekeeper could have said “I repair your mechanical problem / give you a lightbulb / save humanity, but you’re still ‘in the box,’” and I couldn’t argue, since the Gatekeeper by default also arbitrates all rule disputes. Now, the protocol also says this:
Thoughts on possible future variants: … The AI can also win free by tricking the Gatekeeper(s) into accepting a Trojan Horse gift; a third-party judge will listen to the chat and decide if this occurs. … If doing anything this complicated, I would suggest setting aside a lot more time. (I don’t have that much time—if you want to test one of these variants you’re on your own.)
If I were running this case under that variant, a third-party judge could have said “You allowed Michaelos’s depowered AI fragments to escape the box; you lost” or “Sorry Michaelos, but being outside the box when you have no electrical power is not a win condition.” I didn’t really worry about defining all the rules, because I primarily wanted to get a feel for the situation in general.
But no one even let the depowered fragments out. I had some very cautious gatekeepers.