Maybe a scarier question isn’t whether we can stop our AIs from blackmailing us, but whether we want to. If the AI has an opportunity to blackmail Alice for a dollar to save Bob from some suffering, do we want the AI to do that, or to let Bob suffer? Eliezer seems to think it’s obvious that we don’t want our FAI to use certain tactics, but I’m not sure why he thinks that.