Your AI doesn’t figure out how to do a reasonable “values handshake” with a competitor (where two agents agree to both pursue some appropriate compromise values in order to be Pareto efficient)...
I found a reference to “value handshakes” here:
I think it refers to something like this: Imagine that a superintelligent human-friendly AI meets a superintelligent paperclip maximizer, and they both realize their powers are approximately balanced. What should they do?
For humans, “let’s fight, and to the victor go the spoils” is the intuitive answer, but the superintelligences can probably do better. If they fight, each has a 50% chance of achieving nothing and a 50% chance of winning the universe… minus whatever was sacrificed to Moloch along the way, which could be a lot. If they instead split the universe into halves and find a way to trust each other, that is already better than war. But there is an even better possible solution: both of them agree to act as if they were a single superintelligence that values humans and paperclips equally.
The cooperative solution can be better than a 50% split of the universe, because you could build paperclip factories in places humans care little about, such as uninhabitable planets; or perhaps you could find a way to introduce paperclips into the human environment without reducing human quality of life. For example, would you mind using paperclips to reinforce the walls of your house? Would you mind if almost all materials used to build things for humans contained little paperclips inside? Would you mind living in a simulation implemented on paperclip-shaped circuits? So maybe in the end humans could get something like 70% of the potential utility of the universe, while 70% of the potential material would be converted into paperclips.
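As a back-of-the-envelope illustration, here is a toy comparison of the three options from one agent’s point of view: fighting, splitting the universe 50/50, and a values handshake that exploits the partial compatibility of the two value systems. All the numbers (the “Moloch tax” of war, the gain from complementary preferences) are made up purely for the sake of the example.

```python
# Toy expected-utility comparison for two equally matched superintelligences.
# Every number here is an illustrative assumption, not a claim about real agents.

MOLOCH_TAX = 0.3     # assumed fraction of the universe's value destroyed by all-out war
OVERLAP_GAIN = 0.2   # assumed extra share each side gains because the two value
                     # systems are partly complementary (paperclips inside walls, etc.)

def fight() -> float:
    """Expected share of the universe's value for each agent if they fight."""
    win_prob = 0.5                  # powers are approximately balanced
    remaining = 1.0 - MOLOCH_TAX    # value left over after the war
    return win_prob * remaining     # 0.5 * 0.7 = 0.35 in expectation

def split() -> float:
    """Each agent takes half the universe and ignores the other half."""
    return 0.5

def handshake() -> float:
    """Both agents optimize a merged utility function; complementary
    preferences let each side do better than a straight 50/50 split."""
    return 0.5 + OVERLAP_GAIN       # e.g. ~0.7 of each side's potential utility

for name, value in [("fight", fight()), ("split", split()), ("handshake", handshake())]:
    print(f"{name:10s} expected share of potential utility: {value:.2f}")
```

Under these made-up numbers the ordering is fight ≈ 0.35 < split = 0.50 < handshake ≈ 0.70: whenever war burns value and the two utility functions are not perfectly opposed, the merged-utility outcome can leave both sides strictly better off than either fighting or a naive split.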