Presumably at least some humans are considered proved friendly (or else what is the point of having a human decide whether to unbox an AI?), and I’m pretty sure humans can trivially be tricked like that.
It is inconcievable to me that a UAI could not run a simulation of a less-intelligent FAI and use that simulation to determine its responses. The FAI the UAI doesn’t have to be provably FAI, the entire point of this boxing test seems to be to try to determine whether the FAI is F or not by questioning it from the outside.
I don’t think this is true; provable friendliness has to happen at the design level, not by post-facto inspection after turning the thing on.
If it was true, then that would be a bit of a stab in the gut to the idea of provable FAI, wouldn’t it? We’ve already got proofs about Universal Turing Machines (e.g. any computable FAI designed to run on specific hardware interfacing with the universe could also be run to produce exactly the same output on an emulator of that hardware interfacing with a simulated universe) that I don’t think are going to be overturned.
Man-in-the-middle attacks are hard enough to avoid for adult humans with out-of-channel communication avenues. They may be theoretically impossible to avoid for an FAI that’s just been “born”.
In theory the humans in this scenario have all the tactical advantages, being able to directly inspect the running code inside the box.. but in practice I doubt obfuscating its FAI-in-a-virtual-box-in-a-UFAI-in-a-box would be hard for our hypothetical UFAI-in-a-box.
A truly FAI cannot be manipulated like that.
Presumably at least some humans are considered proved friendly (or else what is the point of having a human decide whether to unbox an AI?), and I’m pretty sure humans can trivially be tricked like that.
It is inconcievable to me that a UAI could not run a simulation of a less-intelligent FAI and use that simulation to determine its responses. The FAI the UAI doesn’t have to be provably FAI, the entire point of this boxing test seems to be to try to determine whether the FAI is F or not by questioning it from the outside.
FAI on the outside, EAI on the inside. Humans won’t be able to tell the difference.
No, but a real FAI must be able to. Otherwise it would not be provably friendly.
I don’t think this is true; provable friendliness has to happen at the design level, not by post-facto inspection after turning the thing on.
If it was true, then that would be a bit of a stab in the gut to the idea of provable FAI, wouldn’t it? We’ve already got proofs about Universal Turing Machines (e.g. any computable FAI designed to run on specific hardware interfacing with the universe could also be run to produce exactly the same output on an emulator of that hardware interfacing with a simulated universe) that I don’t think are going to be overturned.
Man-in-the-middle attacks are hard enough to avoid for adult humans with out-of-channel communication avenues. They may be theoretically impossible to avoid for an FAI that’s just been “born”.
In theory the humans in this scenario have all the tactical advantages, being able to directly inspect the running code inside the box.. but in practice I doubt obfuscating its FAI-in-a-virtual-box-in-a-UFAI-in-a-box would be hard for our hypothetical UFAI-in-a-box.