The Mad Scientist Decision Problem
Consider Alice, the mad computer scientist. Alice has just solved artificial general intelligence and the alignment problem. On her computer she has two files, each containing a seed for a superintelligent AI: one is aligned with human values, the other is a paperclip maximizer. The two AIs differ only in their goals/values; the rest of the algorithms, including the decision procedures, are identical.
Alice decides to flip a coin. If the coin comes up heads, she starts the friendly AI; if it comes up tails, she starts the paperclip maximizer.
The coin comes up heads. Alice starts the friendly AI, and everyone rejoices. Some years later, the friendly AI learns about the coin flip and about the paperclip maximizer.
Should the friendly AI counterfactually cooperate with the paperclip maximizer?
What do various decision theories say in this situation?
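To make the question concrete, here is a minimal toy model. All of its specifics are assumptions I am adding, not part of the problem statement: resources are normalized to 1, the winning AI can hand a fraction f of them to the other value system, and each AI's utility is a concave (square root) function of the resources spent on its own values. Evaluated after the coin flip, giving anything away only costs the friendly AI utility; evaluated from the pre-flip standpoint that updateless-style reasoning uses, and given that both AIs share one decision algorithm (so whatever policy one outputs, the other would have output the mirror image), a sharing policy can score higher.

```python
import math


def u(resources: float) -> float:
    """Assumed concave utility: more resources for your values is better,
    with diminishing returns. The concavity is an assumption of this toy
    model, not something given in the problem statement."""
    return math.sqrt(resources)


def post_flip_utility(f: float) -> float:
    """Friendly AI's utility once it already knows it won the coin flip
    and hands a fraction f of the universe over to paperclips."""
    return u(1.0 - f)


def pre_flip_expected_utility(f: float) -> float:
    """Expected utility of the policy 'whoever wins gives fraction f to the
    loser's values', evaluated before the coin flip. Because both AIs run
    the same decision algorithm, committing to this policy means the
    paperclip maximizer would have committed to its mirror image."""
    p_heads = 0.5
    return p_heads * u(1.0 - f) + (1 - p_heads) * u(f)


for f in (0.0, 0.1, 0.3, 0.5):
    print(f"f={f:.1f}  post-flip: {post_flip_utility(f):.3f}  "
          f"pre-flip EU: {pre_flip_expected_utility(f):.3f}")
```

In this toy model, an agent that has already updated on the coin flip maximizes the post-flip column and keeps everything, while an agent that evaluates whole policies from the pre-flip standpoint prefers to share (with linear utilities the pre-flip column would be flat, so the concavity assumption is doing real work here). This is only an illustration of where the disagreement lives, not a verdict from any particular decision theory.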
What do you think is the correct answer?