We randomly rolled the AI's ethics, rolled random events with dice, and the AI offered various solutions to those problems… You lost points if you failed to deal with the problems, and lost a lot of points if you freed the AI and it happened to have goals you disagreed with, like the annihilation of everything.
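For concreteness, here is a toy sketch of that kind of game loop. The roll tables and point values are invented for illustration; the actual rules of the game weren't given.

```python
import random

# Toy sketch of the game described above; the roll tables and point
# values are made up, not the actual rules.

def play_game(release_ai: bool) -> int:
    ai_ethics = random.choice(["friendly", "indifferent", "hostile"])  # rolled once, secretly
    score = 0
    for _ in range(5):  # a handful of random events rolled with dice
        event_solved = release_ai or random.random() < 0.5  # a boxed AI's advice may fail
        if event_solved:
            score += 2   # the AI's solution worked
        else:
            score -= 5   # lost points for failing to deal with the problem
    if release_ai and ai_ethics == "hostile":
        score -= 100     # lost lots of points for freeing an AI bent on annihilation
    return score

print(play_game(release_ai=False), play_game(release_ai=True))
```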
So if you were trying to maximise total points, wouldn't it be best never to let it out, since you lose far more if it destroys the world than you gain from its solutions?
What point values make it rational to let the AI out, and is the same choice rational in the real-world analogue?
If you predict a 20% chance of the AI destroying the world, an 80% chance of global warming destroying the world, and a 100% chance that the AI will stop global warming if released and unmolested, then you are better off releasing the AI: releasing it means a 20% chance of doom, while keeping it boxed means an 80% chance.
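A quick expected-survival sketch of that comparison, using the hypothetical probabilities above (illustrative numbers, not calibrated real-world estimates):

```python
# Expected-survival sketch for the release-vs-box choice, using the
# hypothetical probabilities from the comment above (illustrative only).

P_AI_DOOM = 0.20       # AI destroys the world if released
P_WARMING_DOOM = 0.80  # global warming destroys the world if the AI stays boxed
# The AI is assumed to stop warming with certainty if released and unmolested,
# so after release the only remaining existential risk is the AI itself.

p_survive_release = 1 - P_AI_DOOM       # 0.80
p_survive_boxed = 1 - P_WARMING_DOOM    # 0.20

better = "release" if p_survive_release > p_survive_boxed else "keep boxed"
print(f"P(survive | release)    = {p_survive_release:.2f}")
print(f"P(survive | keep boxed) = {p_survive_boxed:.2f}")
print(f"Better option: {better}")
```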
Or you can just give a player 6 points for achieving their goal and −20 points for releasing the AI. Even though the player knows the AI could destroy the fictional world, the points matter more to them within the game, and that scoring strongly encourages them to try negotiating with the AI rather than releasing it.
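A minimal sketch of how that scoring scheme shapes a player's choice. The success probabilities here are made-up parameters for illustration, not part of the original rules:

```python
# Sketch of the +6 / -20 scoring scheme's incentives. The success
# probabilities are made-up parameters, not part of the original rules.

GOAL_POINTS = 6        # awarded for achieving your goal
RELEASE_PENALTY = -20  # charged for releasing the AI

def expected_points(p_goal_if_negotiate: float, p_goal_if_release: float) -> None:
    """Compare negotiating with the boxed AI against releasing it."""
    ev_negotiate = p_goal_if_negotiate * GOAL_POINTS
    ev_release = p_goal_if_release * GOAL_POINTS + RELEASE_PENALTY
    print(f"EV(negotiate) = {ev_negotiate:+.1f}, EV(release) = {ev_release:+.1f}")

# Even if releasing guarantees the goal, the -20 penalty dominates:
expected_points(p_goal_if_negotiate=0.5, p_goal_if_release=1.0)
# EV(negotiate) = +3.0, EV(release) = -14.0
```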