Presumably at least some humans are considered provably friendly (or else what is the point of having a human decide whether to unbox an AI?), and I’m pretty sure humans can trivially be tricked like that.
It is inconceivable to me that a UAI could not run a simulation of a less-intelligent FAI and use that simulation to determine its responses. The FAI the UAI simulates doesn’t have to be provably FAI; the entire point of this boxing test seems to be to try to determine from the outside, by questioning, whether the AI is Friendly or not.