In this document from 2004, Yudkowsky describes a safeguard to be added “on top of” programming Friendliness: a Last Judge. The idea is that the FAI’s goal is initially only to compute what an FAI should do. The Last Judge then reads the FAI’s report and decides whether or not to switch the AI’s goal system to implement the described world. The document has been marked obsolete and should not be taken as representative of Yudkowsky’s current views, but I favor the idea of having a Last Judge check to make sure before anybody hits the red button.