Maybe a scarier question isn't whether we can stop our AIs from blackmailing us, but whether we want to. If the AI has an opportunity to blackmail Alice for a dollar to save Bob from some suffering, do we want the AI to do that, or to let Bob suffer? Eliezer seems to think it's obvious that we don't want our FAI to use certain tactics, but I'm not sure why he thinks that.
By that term I simply mean Eliezer's idea that the correct decision theory ought to use a maximization vantage point with a no-blackmail equilibrium.