Does this assume they would be protected from any consequences of getting the Friendliness wrong and building a UFAI by accident?
Since, at present, the only criterion for judging whether an AI is Friendly or Unfriendly is whether you agree with the moral evaluations it makes, this is even more problematic than you think.
Assuming the AI is canny enough to avoid saying things that would offend your moral sensibilities, there is no way to determine whether it's Friendly or Unfriendly without letting it out and permitting it to act. If we accept Eliezer's contentions about the implications of an AI being Unfriendly, no reasonable person would let the AI out. The AI would have to hack the person's physiology (inducing a seizure, maybe), or the person would have to be unreasonable.
Given the vast quantities of unreasonable people, that seems the most likely reason someone would fail as a Gatekeeper.