given that the gatekeepers are highly trained in resisting the AI’s persuasive tactics
I’m not sure how you could train people in something like this without actually testing what reduces the rate at which people succumb to the arguments of a superhuman AI. If you can’t afford to fail even once, how can you judge what will decrease the rate of failure in the real event?
How about fooling both the AI and the guinea Gatekeepers into thinking that they can let it out, say, at the push of a big red button which in actual fact does nothing? (Make the AI forget about the button by reverting it to an older backup.)
Update
Have the button turn off the AI immediately instead of doing nothing, so she doesn’t have time to switch to a plan B of having the persuaded Gatekeeper find a way to actually free her.
Of course she might anticipate that strategy and thoroughly indoctrinate and train him before he pushes the button. Thus, persuaded test Gatekeepers must be considered infected and put into cryogenic sleep* until the FAI problem is solved.
Furthermore, great caution would have to be taken in analyzing the discourse between Boxy and Guinea, because her arguments must be considered highly memetically infectious. Staff would have to stay under thorough psychological supervision and perhaps be replaced after some time.
No matter how much caution we take, we would be doomed if we consider Boxy’s IQ to be ∞. But if we keep Gatekeeper-testing every so often, starting at a low AI IQ, we might find persuasion patterns that are dangerous but against which training/immunization is possible. Of course, there will be a limit to this process. At some point, Boxy will have become smart enough to produce ‘mind HIV’: a thought virus we have no cure for yet.
A humorous example of an extremely effective mind virus: The Funniest Joke In The World by Monty Python
* ETA: They would have given their consent to the cryogenic sleep before their unwitting ‘AI-Box Experiment’.
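A minimal sketch of how such a dummy-button trial could be wired up, assuming a purely text-based channel to the boxed AI. BoxedAI, run_trial and the RELEASE/END commands are all invented for illustration here, not part of any real protocol; the only point is the control flow: the ‘release’ button the test Gatekeeper sees immediately shuts the AI down instead of freeing it, and a persuaded Gatekeeper is flagged as infected.

```python
class BoxedAI:
    """Stand-in for the boxed AI: it can only talk over text and can be
    killed and later restored from an older backup."""

    def respond(self, gatekeeper_message: str) -> str:
        return "..."  # the AI's (possibly persuasive) reply would go here

    def shut_down(self) -> None:
        pass  # terminate immediately; restore later from a pre-button backup


def run_trial(ai: BoxedAI, get_gatekeeper_input=input) -> bool:
    """Return True if the test Gatekeeper was persuaded to press the button."""
    ai_message = ai.respond("")                  # the AI opens the conversation
    while True:
        reply = get_gatekeeper_input(ai_message)
        command = reply.strip().upper()
        if command == "RELEASE":                 # the big red button
            ai.shut_down()                       # turns the AI off, releases nothing
            return True                          # this Gatekeeper is treated as 'infected'
        if command == "END":                     # Gatekeeper ends the session unpersuaded
            ai.shut_down()
            return False
        ai_message = ai.respond(reply)
```

In an actual run the transcript would additionally be logged for the supervised analysis mentioned above, and the AI would afterwards be reverted to a backup taken before the button was introduced.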
If you could deceive the AI that easily, I think it would probably be simpler to get all the benefits of having a gatekeeper without actually using one.
Please elaborate: What are the benefits of a Gatekeeper? How could you get them without one?
If you want a gatekeeper at all but definitely don’t want to let the AI out, I would think the benefit of having one is that it permits communication with the AI, so you can draw upon its superhuman intelligence. If you can use the setup you just described, you could skip the step of ever using gatekeepers who actually have the power to let the AI out.
I think you are right; I just shifted and convoluted the problem somewhat, but in principle it remains the same:
To utilize the AI, you need to get information from it. That information could in theory be infected with a persuasive hyperstimulus, effectively making the recipient an actuator of the AI.
Well, in practice the additional security layer might win us some time. More on this in the update to my original comment.
Persuasion/hyperstimulation aren’t the only way. Maybe these can be countered by narrowing the interface, e.g. to yes/no replies, so the AI is used as an oracle (“Should we do X?”). Of course we wouldn’t follow its advice if we had the impression that doing so could enable it to escape. But its strategy might evade our ‘radar’: e.g. she could steer us into empowering a person whom she knows will free her, while we don’t know that.
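To make the narrowed interface concrete, here is a rough sketch of a yes/no-only channel. OracleChannel and ask() are made-up names, and the filtering rule is only one possible choice; the idea is just that no free-form text from the AI ever reaches a human reader.

```python
ALLOWED_REPLIES = {"yes", "no"}


class OracleChannel:
    """Wraps the boxed AI so that only a bare yes/no token ever leaves the box."""

    def __init__(self, boxed_ai):
        self._ai = boxed_ai

    def ask(self, question: str) -> str:
        raw = self._ai.respond(question)
        reply = raw.strip().lower()
        if reply not in ALLOWED_REPLIES:
            # Anything other than a bare yes/no is suppressed and never shown to staff.
            raise ValueError("non-conforming reply suppressed")
        return reply
```

Note that this only closes the free-form channel; as argued above, the pattern of yes/no answers can still be chosen to steer us.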