The solution to this problem is very simple. So, you have imagined a possible AI which will do terrible things unless you accede to its wishes? Just remember all the other equally possible AIs which will do (X) unless you do (Y), for all possible values of X and Y.
Knowing that the decision will be generated by an AI which will be built by humans (rather than, say, by rolling lots of dice) thoroughly upsets the symmetry between one choice and its alternatives. For example, if hacking a certain computer will best expand an AI’s available resources, the AI will send a very imbalanced selection of messages to that computer, even though there are many possible messages it could send. Your reminder that all “will do (X) unless you do (Y)” constructions are possible is similar to the reminder that all strings of bits are possible messages to the target computer. You should still expect either nothing or a virus.
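To make the symmetry-breaking point concrete, here is a minimal sketch in Python. The message length and the premise that the optimizer reliably sends a goal-serving message are assumptions chosen for illustration, not claims from the discussion above:

```python
# Toy illustration of why an optimizing sender breaks the symmetry
# among possible messages. N_BITS is an arbitrary assumption.

N_BITS = 128

# If the sender picks uniformly at random ("rolling lots of dice"),
# every one of the 2**N_BITS messages is equally likely, so any one
# particular exploit string is astronomically improbable:
p_specific_message_random = 2.0 ** -N_BITS
print(f"{p_specific_message_random:.2e}")  # ~2.94e-39

# If instead the sender is an optimizer whose goal is best served by
# hacking this computer, the distribution collapses onto goal-serving
# messages. "Every message is possible" remains true, but it no longer
# tells you what to expect: you should expect either nothing, or a virus.
```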
The solution is more like realizing that “you have imagined a possible AI which will do terrible things unless you accede to its wishes” is not actually the correct description of the problem—instead, your choice is like forming an unbreakable rule for how you’ll respond to kidnappings, even before the kidnapper is born, let alone contemplates a crime.
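The precommitment logic can also be shown with a toy payoff comparison. This is a minimal sketch; all payoff numbers are illustrative assumptions, and only the comparison matters, not the specific values:

```python
# Toy model of precommitment against kidnapping. The numeric payoffs
# below are assumptions for illustration only.

COST_OF_KIDNAPPING = 1   # effort/risk the kidnapper pays to act
RANSOM = 10              # what a victim who pays would hand over

def kidnapper_expected_gain(p_victim_pays: float) -> float:
    """Kidnapper's expected payoff from attempting a kidnapping."""
    return p_victim_pays * RANSOM - COST_OF_KIDNAPPING

# A victim who decides case-by-case will often pay under threat:
case_by_case = kidnapper_expected_gain(p_victim_pays=0.9)

# A victim with a credible, unbreakable rule never pays:
precommitted = kidnapper_expected_gain(p_victim_pays=0.0)

print(f"vs. case-by-case victim: {case_by_case:+.1f}")   # +8.0
print(f"vs. precommitted victim: {precommitted:+.1f}")   # -1.0

# Against the precommitted victim the attempt has negative expected
# value, so a kidnapper who models the victim correctly never acts --
# the rule deters even kidnappers who do not yet exist.
```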