Even if A is FAI and B is a paperclipper, as long as both use correct decision theory, they will instantly merge into a new SI with a combined utility function. Avoiding arms races and any other kind of waste (including waste due to being separate SIs) is in their mutual interest. I don’t expect rational agents to fail to achieve mutual interest. If you expect that, your idea of rationality leads to predictably suboptimal utility, so it shouldn’t be called “rationality”. That’s covered in the Sequences.
But how could I be sure that a paperclip maximiser is a rational agent with correct decision theory? I would not expect that from the paperclipper.
If an agent is irrational, it can cause all sorts of waste. I was talking about sufficiently rational agents.
If the problem is proving rationality to another agent, SI will find a way.
My point is exactly this. If an SI is able to prove its rationality (meaning that it always cooperates in the PD, etc.), it is also able to fake any such proof.
If you have two options, to turn off the paperclipper or to cooperate with it by giving it half of the universe, what would you do?
I imagine merging like this:
1) Bargain about a design for a joint AI, using any means of communication
2) Build it in a location monitored by both parties
3) Gradually transfer all resources to the new AI
4) Both original AIs shut down, new AI fulfills their combined goals
No proof of rationality required. You can design the process so that any deviation will help the opposing side.
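A toy sketch of why step 3 says “gradually”. It assumes both sides hold equal resources, take turns moving fixed-size tranches into the joint AI, and that “defection” simply means stopping. Under those assumptions the most either side can have handed over in excess of the other is about one tranche. The function name `max_exposure` and the whole setup are hypothetical illustrations, not a worked-out protocol.

```python
def max_exposure(resources: float, tranche: float) -> float:
    """Simulate two sides alternately moving `tranche`-sized chunks of their
    (assumed equal) holdings into a jointly built AI, and return the largest
    gap between what one side has sent and what the other has sent.

    That gap is the most either side could lose if the other simply stopped
    transferring (defected) at the worst possible moment.
    """
    if tranche <= 0:
        raise ValueError("tranche must be positive")
    a_sent = b_sent = 0.0
    worst_gap = 0.0
    a_turn = True
    while a_sent < resources or b_sent < resources:
        if a_turn and a_sent < resources:
            a_sent = min(resources, a_sent + tranche)
        elif not a_turn and b_sent < resources:
            b_sent = min(resources, b_sent + tranche)
        worst_gap = max(worst_gap, abs(a_sent - b_sent))
        a_turn = not a_turn
    return worst_gap


for tranche in (100, 25, 1):
    print(tranche, max_exposure(100, tranche))
# 100 100.0
# 25 25.0
# 1 1.0
```

Smaller tranches shrink the exposure at the price of more rounds. The toy model ignores unequal holdings and everything else a real merger would have to handle.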
I could imagine some failure modes, but surely I can’t imagine the best one. For example, having “both original AIs shut down” simultaneously is vulnerable to defection.
I also have some business experience, and I found that almost every deal includes some cheating, and the cheating is something new every time. So I always have to ask myself: where is the cheating from the other side? If I don’t see it, that’s bad, as it could be something really unexpected. Personally, I hate cheating.
An AI could devise a very secure merging process. We don’t have to code it ourselves.
But should we merge with the paperclipper if we could turn it off?
It reminds me of Great Britain’s policy towards Hitler before WW2, which was to give him what he wanted in order to prevent the war. https://en.wikipedia.org/wiki/Appeasement
If we can turn off the paperclipper for free, sure. But if war would destroy X resources, it’s better to merge and spend X/2 on paperclips.
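The arithmetic behind this, as a minimal sketch. It assumes (beyond what the thread states) that the FAI would win the war outright, that resources count linearly, and that the resources saved by not fighting are split evenly. That even split is what the symmetric Nash bargaining solution gives when war is the fallback and utility is linear in resources. `war_vs_merge` is a hypothetical helper.

```python
def war_vs_merge(total: float, war_cost: float) -> dict[str, tuple[float, float]]:
    """Return (fai, paperclips) resource splits under war and under merging.

    Illustrative assumptions: the FAI wins the war outright, the war destroys
    `war_cost` units, and in a merger the resources saved by skipping the war
    are split evenly between the two goals.
    """
    war = (float(total - war_cost), 0.0)
    clips = war_cost / 2  # the X/2 spent on paperclips
    merge = (float(total - clips), float(clips))
    return {"war": war, "merge": merge}


print(war_vs_merge(total=100, war_cost=20))
# {'war': (80.0, 0.0), 'merge': (90.0, 10.0)}
```

Both sides end up strictly better off than under war, which is the whole argument for merging.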
So if the price of turning off the paperclipper is Y, and Y is higher than X/2, we should cooperate?
But if we agree on this, we create an incentive for the paperclipper to increase Y until it reaches X/2. To increase Y, the paperclipper has to invest in defense mechanisms or offensive weapons. This creates an arms race that lasts until negotiations become more profitable. However, an arms race is risky and could turn into war.
Edited: higher.
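A hypothetical sketch of the incentive described here. Suppose the paperclipper’s share of a merger is half of the cost Y it can impose on us (a rule assumed for illustration, echoing the X/2 figure in the thread). Then raising Y pays whenever it costs less than half of the added Y. `clipper_share` and `investment_pays` are made-up names.

```python
def clipper_share(y: float) -> float:
    """Assumed rule: the paperclipper's share of a merger is half of the
    cost Y it could impose on the other side."""
    return y / 2


def investment_pays(current_y: float, extra_y: float, investment: float) -> bool:
    """Would spending `investment` to raise Y by `extra_y` increase the
    paperclipper's net take under the assumed rule?"""
    gain = clipper_share(current_y + extra_y) - clipper_share(current_y)
    return gain > investment


print(investment_pays(current_y=10, extra_y=4, investment=1))  # True
print(investment_pays(current_y=10, extra_y=4, investment=3))  # False
```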
The paperclipper doesn’t need to invest anything. The AIs will just merge without any arms race or war. The possibility of an arms race or war, and its full predicted cost to both sides, will be taken into account during bargaining instead. For example, if the paperclipper has a button that can nuke half of our utility, the merged AI will prioritize paperclips more.
So they meet before the possible start of the arms race and compare their relative advantages? I still think that they may try to demonstrate more bargaining power than they actually have, and that it is almost impossible for us to predict how their game will play out because of its complexity.
Thanks for participating in this interesting conversation.
Yeah, bargaining between AIs is a very hard problem and we know almost nothing about it. It will probably involve all sorts of deception tactics. But in any case, using bargaining instead of war is still in both AIs’ common interest, and AIs should be able to achieve common interest.
For example, if A has hidden information that will give it an advantage in war, then B can precommit to giving A a bigger share conditional on seeing it (e.g. by constructing a successor AI that visibly includes the precommitment under A’s watch). Eventually the AIs should agree on all questions of fact and disagree only on values, at which point they agree on how the war will likely go, so they skip the war and share the bigger pie according to the war’s predicted outcome.
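One way to make this concrete, as a sketch under assumed numbers. A takes everything that survives a war with probability p, the war destroys a fixed amount, and the agreed split gives A its expected war payoff plus half of what the war would have destroyed. B’s precommitment then just switches which p is used, depending on whether the agreed check of A’s hidden information passes. All names and figures here are illustrative assumptions.

```python
def war_payoff(p_win: float, total: float, war_cost: float) -> float:
    """A's expected resources from war: it takes everything that survives
    with probability p_win."""
    return p_win * (total - war_cost)


def agreed_share(p_win: float, total: float, war_cost: float) -> float:
    """Skip the war: A gets its expected war payoff plus half of the
    resources the war would have destroyed."""
    return war_payoff(p_win, total, war_cost) + war_cost / 2


def precommitted_share(check_passed: bool, p_public: float, p_secret: float,
                       total: float, war_cost: float) -> float:
    """B's precommitment: use A's claimed win probability only if the
    agreed check of A's hidden information succeeds."""
    p = p_secret if check_passed else p_public
    return agreed_share(p, total, war_cost)


for passed in (False, True):
    a = precommitted_share(passed, p_public=0.5, p_secret=0.75, total=100, war_cost=20)
    print(passed, a, 100 - a)  # check result, A's share, B's share
# False 50.0 50.0
# True 70.0 30.0
```

With the check passing, A gets 70 instead of 50, and B’s 30 still beats the 20 it would expect from fighting, so both sides prefer the deal to the war.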
BTW, the book “On Thermonuclear War” by Kahn is exactly an attempt to predict the ways of war, negotiation and bargaining between two presumably rational agents (superpowers). As I remember, it even discusses the idea of moving all resources to a new third agent: donating all nukes to the UN.
How could B see that A has hidden information?
Personally, I feel like you have a mathematically correct but idealistic and unrealistic model of relations between two perfect agents.
Yeah, Schelling’s “Strategy of Conflict” deals with many of the same topics.
A: “I would have an advantage in war, so I demand a bigger share now.”
B: “Prove it.”
A: “Giving you the info would squander my advantage.”
B: “Let’s agree on a procedure to check the info, and I precommit to giving you a bigger share if the check succeeds.”
A: “Cool.”
If visible precommitment by B requires it to share the source code for its successor AI, then it would also be giving up any hidden information it has. Essentially both sides have to be willing to share all information with each other, creating some sort of neutral arbitration about which side would have won and at what cost to the other. That basically means creating a merged superintelligence is necessary just to start the bargaining process, since they each have to prove to the other that the neutral arbiter will control all relevant resources to prevent cheating.
Realistically, there will be many cases where one side thinks its hidden information is sufficient to make the cost of conflict smaller than the costs associated with bargaining, especially given the potential for cheating.
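A standard toy model of bargaining under private information makes this point precise. A sketch, assuming the winner takes everything and each side pays its own war cost: a mutually acceptable split exists only if what the two sides each expect from war, by their own beliefs, adds up to no more than the whole pie. With shared beliefs that always holds when war is costly; with optimistic private estimates it can fail, which is the case described here. `deal_possible` is a hypothetical helper.

```python
def deal_possible(total: float, p_a: float, p_b: float,
                  cost_a: float, cost_b: float) -> bool:
    """Can any split of `total` satisfy both sides?

    p_a is A's own estimate that A wins; p_b is B's own estimate that B wins.
    The winner takes `total`, and each side pays its own war cost. A side
    accepts a deal only if its share is at least its expected war payoff
    by its own beliefs.
    """
    a_needs = p_a * total - cost_a
    b_needs = p_b * total - cost_b
    return a_needs + b_needs <= total


# Shared beliefs (p_b == 1 - p_a): with costly war, a deal exists.
print(deal_possible(100, p_a=0.75, p_b=0.25, cost_a=10, cost_b=10))  # True
# Each side privately thinks it is 75% likely to win: no split satisfies both.
print(deal_possible(100, p_a=0.75, p_b=0.75, cost_a=10, cost_b=10))  # False
```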
Simply by telling B about the existence of an advantage, A is giving B info that could weaken it. Also, what if the advantage is a way to partially cheat in precommitments?
I think there are two other failure modes that need to be resolved:
A weaker side may drag out negotiations if that helps it gain power
A weaker side could fake the size of its army (like North Korea did with its wooden missiles at its last military parade)
What combined utility function? There is no way to combine utility functions.
Weighted sum, with weights determined by bargaining.
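A minimal sketch of what a weighted sum means for the merged AI’s choices. It assumes (beyond the thread) that the merged AI splits a fixed pool of resources between the two goals and that each original utility function depends only on its own pile. With log utilities the bargained weight becomes the resource share; with linear utilities the weighted sum pushes everything to the higher-weighted goal. The helper name and the utility choices are illustrative assumptions.

```python
import math
from typing import Callable

Utility = Callable[[float], float]


def best_allocation(total: float, w: float, u_a: Utility, u_b: Utility,
                    steps: int = 10_000) -> float:
    """Grid-search the amount x given to A's goals (the rest goes to B's)
    that maximizes the merged utility w * u_a(x) + (1 - w) * u_b(total - x).

    Only interior grid points are tried, so log utilities stay finite.
    """
    best_x, best_val = 0.0, -math.inf
    for i in range(1, steps):
        x = total * i / steps
        val = w * u_a(x) + (1 - w) * u_b(total - x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


# Concave (log) utilities: the weight becomes the resource share.
print(best_allocation(100, 0.25, math.log, math.log))  # 25.0
# Linear utilities: the higher-weighted side gets (almost) everything.
print(best_allocation(100, 0.75, lambda x: x, lambda x: x))  # 99.99
```

So with linear utilities the weights alone don’t produce a proportional split; the log utilities here are purely an assumption chosen to show the contrast.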