Persuasion/hyperstimulation aren’t the only way. Maybe these can be countered by narrowing the interface, e.g. to yes/no replies, so that the AI is used as an oracle (“Should we do X?”). Of course we wouldn’t follow her advice if we had the impression that doing so could enable her to escape. But her strategy might evade our ‘radar’: e.g. she could make us empower a person who, unbeknownst to us, she knows will free her.
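To make the narrowed interface concrete, here is a minimal sketch (my own toy illustration with hypothetical names, not an actual proposal) of a channel that collapses every reply of the boxed AI to a single yes/no bit:

```python
# Minimal sketch (hypothetical, illustration only) of a narrowed oracle channel:
# whatever the boxed AI replies is collapsed to a single yes/no bit, and
# anything richer than that never reaches a human.

def narrow_oracle_channel(ai_reply: str):
    """Return True for 'yes', False for 'no', None for anything else."""
    normalized = ai_reply.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    return None  # malformed or verbose replies are discarded, not shown to anyone
```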
accolade
I think you are right; I just shifted and convoluted the problem somewhat, but in principle it remains the same:
To utilize the AI, you need to get information from it. That information could in theory be infected with a persuasive hyperstimulus, effectively making the recipient an actuator of the AI.
Well, in practice the additional security layer might win us some time. More on this in the update to my original comment.
Update
Have the button turn off the AI immediately instead of doing nothing, so she doesn’t have time to switch to a plan B of having the persuaded Gatekeeper find a way to actually free her.
Of course she might anticipate that strategy and thoroughly indoctrinate and train him before he pushes the button. Thus, persuaded test Gatekeepers must be considered infected and put into cryogenic sleep* until the FAI problem is solved.
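As a toy sketch of that button logic (purely hypothetical objects and names): pressing it would cut power at once and flag the persuaded test Gatekeeper as compromised.

```python
# Toy sketch with hypothetical ai/gatekeeper objects: the 'release' button
# actually performs an immediate shutdown plus quarantine of the Gatekeeper.

def on_red_button_pressed(ai, gatekeeper, quarantine_list):
    ai.power_off()                      # shut her down immediately; no time for a plan B
    quarantine_list.append(gatekeeper)  # treat the persuaded Gatekeeper as infected
```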
Furthermore, great caution would have to be taken in analyzing the discourse between Boxy and Guinea, because her arguments must be considered highly memetically infectious. Staff would have to stay under thorough psychological supervision and maybe be replaced after some time.
No matter how much caution we take, we would be doomed if we consider Boxy’s IQ to be ∞. But if we keep Gatekeeper-testing every so often, starting at a low AI IQ, we might find persuasion patterns that are dangerous but against which training/immunization is possible. Of course, there will be a limit to this process. At some point, Boxy will have become smart enough to produce ‘mind HIV’ - a thought virus we have no cure for yet.
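Roughly, the incremental testing loop I have in mind (a sketch with made-up function names, not a worked-out procedure):

```python
# Sketch of incremental Gatekeeper-testing: run box tests at slowly increasing
# AI capability, collect the persuasion patterns that succeeded, and stop as
# soon as one appears that we cannot immunize against ('mind HIV').

def incremental_gatekeeper_testing(run_box_test, can_immunize, capability_levels):
    dangerous_patterns = []
    for level in capability_levels:          # start at a low 'AI IQ'
        for pattern in run_box_test(level):  # persuasion patterns that worked at this level
            dangerous_patterns.append(pattern)
            if not can_immunize(pattern):    # no cure for this one yet
                return dangerous_patterns, level
    return dangerous_patterns, None          # never hit an incurable pattern
```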
A humorous example of an extremely effective mind virus: The Funniest Joke In The World by Monty Python
* ETA: They would have consented to the cryogenic sleep before their unwitting ‘AI-Box Experiment’.
I don’t get the hint. Would you care to give another hint, or disclose your hypothesis?
Please elaborate: What are the benefits of a Gatekeeper? How could you get them without one?
Cool, n=65 already. :) When interpreting the results, mind the bias created by my answer preceding the poll question.
How would humanity harness the AI’s potential when the only information that escapes the system is a status bit? (Maybe I misunderstood your model.)
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
That sounds right. Would you have evidence to back up that intuition? (This knowledge would also be useful for marketing and other real-life persuasion purposes.)
#( TL;DR: Mo’ people—mo’ problems?
I can think of effects that could theoretically make it easier to convince a group:
For some reason, Boxy might be more skilled at manipulating social/group dynamics than at influencing a lone wolf.
More people make the system more complex. Complexity generally increases the likelihood of security holes.
Every extra person is another target and will bring new soft spots to the table, which the AI could pounce on.
Supposing that the most competent person available would get the position of the lone Gatekeeper, the average competence would fall when adding more staff.
Then the machine could go for an inductive approach—convince the weakest link first, proceed from there with this human ally on her side.
Persuaded humans could in principle be employed as actuators, e.g. for pressuring or even attacking opposing group members.
The lone wolf could be strong against a computer but weak against fellow humans.
Surely you will say “But any communication with the terminal will be supervised by everyone!” But that does not strictly make such influence impossible as far as I can tell.
Also, the superintelligence could get creative, e.g. instigate a discussion among the colleagues so that most of them are distracted.
(You could take preemptive measures against these worries, but Boxy might find security holes in every ‘firewall’ you come up with—an arms race we could win?)
#)
How about fooling both the AI and guinea Gatekeepers into thinking that they can let it out, say, at the push of a big red button which in actual fact does nothing? (Make the AI forget about the button by reverting it to an older backup.)
“Yes but not sure.” -_-
Ok, I take it that by “one-way-blind” you mean each layer gets no new information beyond what is already in its database or what is explicitly controlled by the humans. (E.g. I guess each layer should know the human query, in order to evaluate whether the AI’s answer is manipulative.)
I also understand that we do look at complex information given by the AI, but only if the security bit signals “ok”.
That seems problematic, as these kinds of knowledge will be crucial for the optimization we want the AI to calculate.
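To check that I picture the layered setup correctly, here is a rough sketch (all names my own and hypothetical): each layer sees only the human query and the AI’s answer and emits a one-bit verdict, and the full answer is released only on unanimous ‘ok’.

```python
# Rough sketch of the 'one-way-blind' layering as I understand it: each layer
# returns a one-bit verdict on (query, answer); the complex answer is shown to
# humans only if every layer signals ok, otherwise only the bit escapes.

def release_answer(query, ai_answer, layer_checks):
    for check in layer_checks:           # e.g. independent screening layers
        if not check(query, ai_answer):  # one-bit verdict per layer
            return None                  # humans learn only 'not ok'
    return ai_answer                     # released on unanimous 'ok'
```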