Note that the AI box setting is not one which security-minded people would consider “competent”; once you’re convinced that AI is dangerous and persuasive, the minimum safeguard would be to require multiple people to be present when interacting with the box, and to only allow release with the assent of a significant number of people.
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
(This is not a knock on EY’s experiment, which does indeed test a level of security that really was proposed by several real-world people; it is a knock on their security systems.)
I think this is making a five-inch fence half an inch higher. It’s just not relevant on the scale of an agent to which a human is a causal system made of brain areas and a group of humans is just another causal system made of several interacting copies of those brain areas.
I agree that the AI you envision would be dangerously likely to escape a “competent” box too; and in any case, even if you manage to keep the AI in the box, attempts to actually use any advice it gives are extremely dangerous.
That said, I think your “half an inch” is off by multiple orders of magnitude.
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
That sounds right. Would you have evidence to back up the intuition? (This knowledge would also be useful for marketing and other present-day persuasion purposes.)
TL;DR: Mo’ people—mo’ problems?
I can think of effects that could theoretically make it easier to convince a group:
For some reason, Boxy might be more skilled at manipulating social/group dynamics than at influencing a lone wolf.
More people make the system more complex. Complexity generally increases the likelihood of security holes.
Every extra person is another target and brings new soft spots to the table for the AI to pounce on.
Supposing that the most competent person available would have been chosen as the lone Gatekeeper, average competence falls as more staff are added.
Then the machine could go for an inductive approach—convince the weakest link first, proceed from there with this human ally on her side.
Persuaded humans could, in principle, be employed as actuators, e.g. for pressuring or even attacking opposing group members.
The lone wolf could be strong against a computer but weak against fellow humans.
Surely you will say, “But any communication with the terminal will be supervised by everyone!” Yet as far as I can tell, that does not strictly make such influence impossible.
Also, the superintelligence could get creative, e.g. instigate a discussion among the colleagues so that most of them are distracted.
(You could take preemptive measures against these worries, but Boxy might find security holes in every ‘firewall’ you come up with. Is that an arms race we could win?)
My comment was mostly inspired by (known effective) real-world examples. Note that relieving anyone who shows signs of being persuaded is a de-emphasized but vital part of this policy, as is carefully vetting people before trusting them.
Actually implementing an “N people at a time” rule can be done using locks, guards and/or cryptography (note that many such schemes, such as threshold secret sharing, are provably secure even against an adversary with unlimited computing power, i.e. “information-theoretic security”).
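As a concrete illustration of the cryptographic option, here is a minimal sketch (not audited crypto, and the function names are purely illustrative) of how a “let the AI out” credential could be split among several gatekeepers with Shamir secret sharing. Any coalition below the threshold learns literally nothing about the credential, no matter how much computing power it has, which is what “information-theoretic security” means here.

```python
# Sketch of an "N gatekeepers must assent" rule via Shamir secret sharing.
# Fewer than `threshold` shares reveal nothing about the release key.
import secrets

PRIME = 2**127 - 1  # a Mersenne prime large enough for a 16-byte release key


def _eval_poly(coeffs, x):
    """Evaluate a polynomial (constant term first) at x, mod PRIME."""
    result = 0
    for coeff in reversed(coeffs):
        result = (result * x + coeff) % PRIME
    return result


def split_release_key(secret, n_shares, threshold):
    """Split `secret` into n_shares so that any `threshold` of them reconstruct it."""
    if not (1 <= threshold <= n_shares):
        raise ValueError("threshold must be between 1 and n_shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return [(x, _eval_poly(coeffs, x)) for x in range(1, n_shares + 1)]


def reconstruct_release_key(shares):
    """Recombine at least `threshold` shares via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        num, den = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i == j:
                continue
            num = (num * -x_j) % PRIME
            den = (den * (x_i - x_j)) % PRIME
        secret = (secret + y_i * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)                        # the release credential
    shares = split_release_key(key, n_shares=5, threshold=3)
    assert reconstruct_release_key(shares[:3]) == key     # any 3 gatekeepers suffice
    assert reconstruct_release_key(shares[1:4]) == key    # any other 3 as well
    # Any 2 shares on their own are statistically independent of the key.
```

Of course this only enforces the “N assenting people” rule mechanically; it does nothing about the persuasion channel itself.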