One possibility to prevent the god-emperor scenario is for multiple teams to simultaneously implement and turn on their own best efforts at FAI. Every team should check every other team's FAI candidate, and nothing should be turned on until all the teams agree it's safe. The first thing the new FAIs should do is compare their goals with each other and terminate all instances immediately if any of the goals look incompatible.
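Purely to illustrate the decision logic (and nothing about how goal comparison would actually work), here's a toy sketch in Python. `Candidate`, `goal_spec`, and the "identical specs" compatibility test are all hypothetical stand-ins; the real difficulty is of course hidden inside `goals_compatible`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    team: str
    goal_spec: frozenset  # toy stand-in for an actual goal specification


def all_teams_approve(candidates, reviews):
    """Nothing launches until every team has signed off on every other team's candidate.

    `reviews` maps (reviewing_team, candidate_team) -> bool.
    """
    teams = {c.team for c in candidates}
    return all(
        reviews.get((reviewer, c.team), False)
        for c in candidates
        for reviewer in teams - {c.team}
    )


def goals_compatible(candidates):
    """Toy compatibility check: here 'compatible' just means identical goal specs."""
    return len({c.goal_spec for c in candidates}) <= 1


def launch_decision(candidates, reviews):
    if not all_teams_approve(candidates, reviews):
        return "hold: not every team has approved every candidate"
    if not goals_compatible(candidates):
        return "terminate all instances: incompatible goals detected"
    return "run"
```

In this toy version the unanimity requirement and the mutual goal check are trivial set operations; the proposal's actual burden falls entirely on whatever replaces `goals_compatible`.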
One weakness is that most teams might blindly accept the most competent team's submission, especially if that team is vastly more competent. Breaking a competent team up might reduce that risk, but it would also reduce the likelihood of successful FAI. Another weakness is that independently implemented FAIs may always end up with slightly different goals, which would trigger immediate termination of every instance. And there is always the risk, increasing over time, that a third party (or one of the FAI teams, accidentally) turns on a uFAI first.