“escape the testing environment” is poorly defined. Some people read it as “deduce the exploitable vulnerabilities in the system, hack into it, run itself with higher privileges, and somehow transmit itself to other machines / the internet at large / infect people’s brains Snow Crash-style”, and others read it as “convince the people running the test to give it more resources (and maybe infect their brains Snow Crash-style)”.
The former can be prevented by having a secure (air-gapped?) system; the latter can be prevented by not running tests interactively and ignoring the moral issues of terminating (or suspending) what may possibly be an intelligent ‘person’.
The scenario also implicitly assumes that the AI’s ability to improve its own intelligence (and therefore gain the ability to do either of the above) is unbounded by the resources of the system and carries no cost in terms of increased processing time.
I will note that the AI box experiment’s conditions expressly forbid a secure environment [i.e. one with inspection tools that cannot be manipulated by the AI]:
the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on.
Because that’s not the part of the AI safety question the AI box experiment is designed to test; for the purposes of the experiment it says, “sure, you might catch the AI in a lie, but assuming you don’t—”
Environment means both hardware and people—anything the AI has a chance to influence. We could use a narrower definition, but why should the AI respect it? By limiting our map we don’t limit the territory.
When the AI gets much smarter than humans, we may not understand the output of our inspection tools. They will give us huge amounts of data, and we will be unable to decipher what it all means.
Imagine a group of monkeys trying to enslave a human in a cave. The monkeys bring objects from the jungle to the human and make him produce better food and toys for them (we want the AI to do some real-life optimization; otherwise it’s just money wasted on academic exercises). The monkeys understand that a human moving toward the entrance is trying to escape, and they will threaten to kill him if he tries. But they don’t see the danger of the human quietly sitting at the back of the cave, constructing a machine gun from spare parts.