I think there is a difference between creating an agent and negotiating with another agent. If agent 1 creates agent 2, agent 1 will always know agent 2's goal function for sure.
However, if two agents meet and agent A tells agent B that it has utility function U, then even if A sends its source code as proof, agent B has no reason to believe it. Any source code could be faked. The more advanced both agents are, the more difficult it is for them to prove their values to each other. So they will always suspect that the other side is cheating.
As a result, as I once put it (perhaps too strongly): any two sufficiently advanced agents will go to war with each other. The one exception is if they are two instances of the same source code, but even in this case cheating is possible.
To prevent cheating, it is better (unfortunately) to destroy the other agent. What solutions to this problem exist in LW research?
Note that source code can't be faked in the self-modification case. Software agent A can set up a test environment (a virtual machine or simulated universe), create a new agent B inside it, and then A has a very detailed and accurate view of B's innards.
However, logical uncertainty is still an obstacle, especially with agents not verified by theorem-proving.
I don't believe it. War wastes resources. The only reason war happens is that two agents have different beliefs about the likely outcome of war, which means at least one of them has wrong and self-harming beliefs. Sufficiently rational agents will never go to war; instead they'll agree about the likely outcome of war and trade resources in that proportion. Maybe you can't think of a way to set up such trade, because emails can be faked etc, but I believe that superintelligences will find a way to achieve their mutual interest. That's one reason why I'm interested in AI cooperation and bargaining.
I’m flashing back to reading Jim Fearon!
Fearon’s paper concludes that pretty much only two mechanisms can explain “why rationally led states” would go to war instead of striking a peaceful bargain: private information, and commitment problems.
Your comment brushes off commitment problems in the case of superintelligences, which might turn out to be right. (It’s not clear to me that superintelligence entails commitment ability, but nor is it clear that it doesn’t entail commitment ability.) I’m less comfortable with setting aside the issue of private information, though.
Assuming rational choice, competing agents are only going to truthfully share information if they have incentives to do so, or at least no incentive not to do so, but in cases where war is a real possibility, I’d expect the incentives to actively encourage secrecy: exaggerating war-making power and/or resolve could allow an agent to drive a harder potential bargain.
You suggest that the ability to precommit could guarantee information sharing, but I feel unease about assuming that without a systematic argument or model. Did Schelling or anybody else formally analyze how that would work? My gut has the sinking feeling that drawing up the implied extensive-form game and solving for equilibrium would produce a non-zero probability of non-commitment, imperfect information exchange, and conflict.
Finally I’ll bring in a new point: Fearon’s analysis explicitly relies on assuming unitary states. In practice, though, states are multipartite, and if the war-choosing bit of the state can grab most of the benefits from a potential war, while dumping most of the potential costs on another bit of the state, that can enable war. I expect something analogous could produce war between superintelligences, as I don’t see why superintelligences have to be unitary agents.
That’s a good question and I’m not sure my thinking is right. Let’s say two AIs want to go to war for whatever reason. Then they can agree to some other procedure that predicts the outcome of war (e.g. war in 1% of the universe, or simulated war) and precommit to accept it as binding. It seems like both would benefit from that.
That said, I agree that bargaining is very tricky. Coming up with an extensive-form game might not help, because what if the AIs use a different extensive-form game? There's been pretty much no progress on this for a decade; I don't see any viable attack.
My (amateur!) hunch is that an information deficit bad enough to motivate agents to sometimes fight instead of bargain might be an information deficit bad enough to motivate agents to sometimes fight instead of precommitting to exchange info and then bargain.
Certainly, any formal model is going to be an oversimplification, but models can be useful checks on intuitive hunches like mine. If I spent a long time formalizing different toy games to try to represent the situation we’re talking about, and I found that none of my games had (a positive probability of) war as an equilibrium strategy, I’d have good evidence that your view was more correct than mine.
There might be some analogous results in the post-Fearon, rational-choice political science literature; I don't know it well enough to say. And even if not, it might be possible to build a relevant game incrementally.
Start with a take-it-or-leave-it game. Nature samples a player’s cost of war from some distribution and reveals it only to that player. (Or, alternatively, Nature randomly assigns a discrete, privately known type to a player, where the type reflects the player’s cost of war.) That player then chooses between (1) initiating a bargaining sub-game and (2) issuing a demand to the other player, triggering war if the demand is rejected. This should be tractable, since standard, solvable models exist for two-player bargaining.
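To make that concrete, here is a minimal numeric sketch of one possible toy parameterization (mine, not Fearon's or anything from the literature). For tractability it uses the screening variant in which the uninformed player makes the take-it-or-leave-it offer; the distribution, costs, and win probability are made-up illustrative values.

```python
import numpy as np

# Toy screening game, illustrative numbers only: a pie of size 1, player A's cost
# of war c_A ~ Uniform(0, c_hi) is private; A's win probability p and B's cost of
# war c_B are common knowledge. B makes a take-it-or-leave-it offer x (A's share);
# A accepts iff x >= p - c_A, otherwise war happens.
p, c_B, c_hi = 0.5, 0.1, 0.3   # assumed parameters, not from any source

def accept_prob(x):
    # P(c_A >= p - x) under the uniform prior over A's private cost
    return float(np.clip((c_hi - (p - x)) / c_hi, 0.0, 1.0))

def b_expected_payoff(x):
    a = accept_prob(x)
    return (1 - x) * a + (1 - p - c_B) * (1 - a)   # peaceful share vs. war payoff

xs = np.linspace(0, 1, 1001)
best_x = xs[np.argmax([b_expected_payoff(x) for x in xs])]
print(f"B's optimal offer: {best_x:.2f}, probability of war: {1 - accept_prob(best_x):.2f}")
# With these numbers B offers 0.40 and war occurs about a third of the time:
# private information alone sustains a positive equilibrium probability of war.
```

Even in this crude version, B rationally risks war rather than always offering enough to buy off the highest-cost type, which is the risk-return tradeoff driving Fearon's private-information mechanism.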
So far we have private information, but no precommitment. But we could bring precommitment in by adding extra moves to the game: before making the bargain-or-demand choice, players can mutually agree to some information-revealing procedure followed by bargaining with the newly revealed information in hand. Solving this expanded game could be informative.
Maybe you can't think of a way to set up such trade, because emails can be faked etc, but I believe that superintelligences will find a way to achieve their mutual interest.
They'll also find ways of faking whatever communication methods are being used.
To me, this sounds like saying that sufficiently rational agents will never defect in the prisoner's dilemma provided they can communicate with each other.
I think you need verifiable pre-commitment, not just communication. In a free-market economy, enforced property rights basically function as such a pre-commitment mechanism. Where pre-commitment (including property-right enforcement) is imperfect, only a constrained optimum can be reached, since any counterparty has to assume ex ante that the agent will exploit the lack of precommitment. Imperfect information disclosure has similar effects; in that case, however, one has to “assume the worst” about what information the agent has, and the deal must be altered accordingly, which generally comes at a cost in efficiency.
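A toy payoff table may make the distinction explicit (the numbers are mine, just the standard prisoner's dilemma values): cheap talk leaves defection dominant, while a verifiable, binding commitment device is what actually changes the game.

```python
# Toy payoff table (standard PD values, chosen by me) for the point above:
# cheap talk leaves defection dominant; only a verifiable, binding commitment
# device changes the strategic situation.
PAYOFFS = {   # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_reply(their_move):
    return max("CD", key=lambda my: PAYOFFS[(my, their_move)])

# Without commitment, defecting is the best reply whatever was said beforehand.
assert best_reply("C") == "D" and best_reply("D") == "D"

# A binding "I play C only if you have verifiably bound yourself to C" device removes
# the temptation to deviate; both then gain relative to mutual defection.
assert PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")]
```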
The whole point of the prisoner's dilemma is that the prisoners cannot make binding agreements, only (at most) talk. If they can bind themselves, it's not really a prisoner's dilemma any more.
Yeah, I would agree with that. My bar for “sufficiently rational” is quite high though, closer to the mathematical ideal of rationality than to humans. (For example, sufficiently rational agents should be able to precommit.)
Not if the “resource” is the head of one of the rational agents on a plate.
Aumann's agreement theorem requires identical priors and identical sets of available information.
I think sharing all information is doable. As for priors, there’s a beautiful LW trick called “probability as caring” which can almost always make priors identical. For example, before flipping a coin I can say that all good things in life will be worth 9x more to me in case of heads than tails. That’s purely a utility function transformation which doesn’t touch the prior, but for all decision-making purposes it’s equivalent to changing my prior about the coin to 90⁄10 and leaving the utility function intact. That handles all worlds except those that have zero probability according to one of the AIs. But in such worlds it’s fine to just give the other AI all the utility.
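Here is a tiny check of that trick with made-up payoffs: a 90/10 prior with ordinary utilities, and a 50/50 prior where the heads-world is cared about 9x more, rank every action identically.

```python
# A toy check of the "probability as caring" equivalence (payoffs are made up).
# Representation 1: prior 90/10 over heads/tails, ordinary utilities.
# Representation 2: prior 50/50, but everything in the heads-world is worth 9x more.
actions = {
    "bet_heads": {"heads": 10.0, "tails": -5.0},
    "bet_tails": {"heads": -5.0, "tails": 10.0},
}

def expected_utility(prior, caring):
    return {a: sum(prior[w] * caring[w] * u for w, u in payoffs.items())
            for a, payoffs in actions.items()}

rep1 = expected_utility({"heads": 0.9, "tails": 0.1}, {"heads": 1.0, "tails": 1.0})
rep2 = expected_utility({"heads": 0.5, "tails": 0.5}, {"heads": 9.0, "tails": 1.0})

# The two tables differ only by a constant positive factor (here 5x), so every
# choice between actions comes out the same under either representation.
assert max(rep1, key=rep1.get) == max(rep2, key=rep2.get)
print(rep1, rep2)
```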
In all cases? Information is power.
There is an old question that goes back to Abraham Lincoln or something:
If you call a dog’s tail a leg, how many legs does a dog have?
I think the idea is that if one AI says there is a 50% chance of heads, and the other AI says there is a 90% chance of heads, the first AI can describe the second AI as knowing that there is a 50% chance but caring more about the heads outcome. Since it can redescribe the other's probabilities as matching its own, agreement on what should be done will be possible. None of this means that anyone actually decides that something will be worth more to them in the case of heads.
First of all, this makes sense only in the decision-making context (not in the forecast-the-future context). So this is not about what will actually happen but about comparing the utilities of two outcomes. You can indeed rescale the utility involved in a simple case, but I suspect that once you get to interdependencies and non-linear consequences, things will get hairier, if the rescaling is possible at all.
Besides, this requires you to know the utility function in question.
While war is irrational, demonstrative behaviour like an arms race may be needed to discourage the other side from war.
Imagine that two benevolent superintelligences appear. However, SI A suspects that SI B is a paperclip maximizer. In that case, it is afraid that SI B may turn off SI A. Thus it demonstratively invests some resources in protecting its power source, so that it would be expensive for SI B to try to turn off SI A.
This starts the arms race, but the race is unstable and could result in war.
Even if A is FAI and B is a paperclipper, as long as both use correct decision theory, they will instantly merge into a new SI with a combined utility function. Avoiding arms races and any other kind of waste (including the waste of remaining separate SIs) is in their mutual interest. I don't expect rational agents to fail to achieve their mutual interest. If you expect that, your idea of rationality leads to predictably suboptimal utility, so it shouldn't be called “rationality”. That's covered in the Sequences.
But how could I be sure that the paperclip maximizer is a rational agent with correct decision theory? I would not expect that from a paperclipper.
If an agent is irrational, it can cause all sorts of waste. I was talking about sufficiently rational agents.
If the problem is proving rationality to another agent, an SI will find a way.
My point is exactly this: if an SI is able to prove its rationality (meaning that it always cooperates in PD, etc.), it is also able to fake any such proof.
If you had two options, to turn off the paperclipper or to cooperate with it by giving it half of the universe, what would you do?
I imagine merging like this:
1) Bargain about a design for a joint AI, using any means of communication
2) Build it in a location monitored by both parties
3) Gradually transfer all resources to the new AI
4) Both original AIs shut down, new AI fulfills their combined goals
No proof of rationality required. You can design the process so that any deviation will help the opposing side.
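A toy illustration (my own, not part of the proposal above) of why the gradual transfer in step 3 limits losses from defection: if resources move in small monitored increments and each side halts the moment the other withholds, a defector can exploit its counterpart for at most one increment.

```python
# Toy sketch: A moves first in each small round and halts as soon as B fails to
# match, so A can be exploited for at most one increment of resources.
def gradual_merge(total_a, total_b, steps, b_defects_at=None):
    from_a = from_b = 0.0
    step_a, step_b = total_a / steps, total_b / steps
    for i in range(steps):
        from_a += step_a                      # A's increment for this round
        if b_defects_at is not None and i >= b_defects_at:
            break                             # B withholds; A stops all further transfers
        from_b += step_b
    return from_a, from_b

# If B defects at round 50 of 100, A has over-contributed by only one increment (1%).
print(gradual_merge(1.0, 1.0, steps=100, b_defects_at=50))
```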
I can imagine some failure modes, but surely I can't imagine the best one. For example, the step where both original AIs shut down simultaneously is vulnerable to defection.
I also have some business experience, and I found that almost every deal includes some cheating, and the cheating is something new every time. So I always have to ask myself: where is the cheating from the other side? If I don't see it, that's bad, as it could be something really unexpected. Personally, I hate cheating.
An AI could devise a very secure merging process. We don’t have to code it ourselves.
But should we merge with the paperclipper if we could turn it off?
It reminds me of Great Britain's policy towards Hitler before WW2, which was to give him what he wanted in order to prevent war. https://en.wikipedia.org/wiki/Appeasement
If we can turn off the paperclipper for free, sure. But if war would destroy X resources, it’s better to merge and spend X/2 on paperclips.
So if the price of turning off the paperclipper is Y, and Y is higher than X/2, we should cooperate?
But if we agree to this, we give the paperclipper an incentive to increase Y until it reaches X/2. To increase Y, the paperclipper has to invest in defense mechanisms or offensive weapons. That creates an arms race until negotiation becomes more profitable. However, an arms race is risky and could turn into war.
The paperclipper doesn't need to invest anything. The AIs will just merge without any arms race or war. The possibility of an arms race or war, and its full predicted cost to both sides, will be taken into account during bargaining instead. For example, if the paperclipper has a button that can nuke half of our utility, the merged AI will prioritize paperclips more.
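One way to make "taken into account during bargaining" concrete is a standard Nash bargaining calculation; this is my own illustrative formalization, not anything proposed in the thread. The nuke-half-our-utility button shows up as a worse disagreement payoff for the human-aligned side, which shifts the merged AI's weights toward paperclips.

```python
import numpy as np

# Rough Nash-bargaining sketch: split a unit of resources, a share s goes to
# paperclips and 1 - s to human values; the disagreement point is each side's
# expected payoff from war (all numbers illustrative).
def nash_bargain(disagree_clips, disagree_human):
    s = np.linspace(0, 1, 10001)
    gain_clips = s - disagree_clips            # paperclipper's gain over fighting
    gain_human = (1 - s) - disagree_human      # the FAI side's gain over fighting
    product = np.where((gain_clips > 0) & (gain_human > 0),
                       gain_clips * gain_human, -np.inf)
    return s[np.argmax(product)]

print(nash_bargain(0.0, 0.0))    # symmetric threat points -> s = 0.5
print(nash_bargain(0.0, -0.5))   # paperclipper can nuke half our utility -> s = 0.75
```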
So they meet before the possible start of the arms race and compare their relative advantages? I still think that they may try to demonstrate higher bargaining power than they actually have, and that it is almost impossible for us to predict how their game will play out because of its complexity.
Thanks for participating in this interesting conversation.
Yeah, bargaining between AIs is a very hard problem and we know almost nothing about it. It will probably involve all sorts of deception tactics. But in any case, using bargaining instead of war is still in both AIs' common interest, and AIs should be able to achieve common interest.
For example, if A has hidden information that would give it an advantage in war, then B can precommit to giving A a larger share conditional on seeing it (e.g. by constructing a successor AI that visibly includes the precommitment, under A's watch). Eventually the AIs should agree on all questions of fact and disagree only on values, at which point they agree on how the war would likely go, so they skip the war and share the bigger pie according to the war's predicted outcome.
BTW, the book “On Thermonuclear War” by Kahn is exactly an attempt to predict the course of war, negotiation and bargaining between two presumably rational agents (the superpowers). Even the idea of moving all resources to a new third agent is discussed, as I remember: donating all nukes to the UN.
How could B see that A has hidden information?
Personally, I feel like you have a mathematically correct, but idealistic and unrealistic model of relations between two perfect agents.
Yeah, Schelling’s “Strategy of Conflict” deals with many of the same topics.
A: “I would have an advantage in war so I demand a bigger share now”
B: “Prove it”
A: “Giving you the info would squander my advantage”
B: “Let’s agree on a procedure to check the info, and I precommit to giving you a bigger share if the check succeeds”
A: “Cool”
If visible precommitment by B requires it to share the source code for its successor AI, then it would also be giving up any hidden information it has. Essentially both sides have to be willing to share all information with each other, creating some sort of neutral arbitration about which side would have won and at what cost to the other. That basically means creating a merged superintelligence is necessary just to start the bargaining process, since they each have to prove to the other that the neutral arbiter will control all relevant resources to prevent cheating.
Realistically, there will be many cases where one side thinks its hidden information is sufficient to make the cost of conflict smaller than the costs associated with bargaining, especially given the potential for cheating.
Simply by telling B about the existence of an advantage, A is giving B info that could weaken it. Also, what if the advantage is a way to partially cheat on precommitments?
I think there are two other failure modes which need to be resolved:
A weaker side could drag out negotiations if doing so helps it gain power.
A weaker side could fake the size of its army (like North Korea did with its wooden missiles at its last military parade).
Even if A is FAI and B is a paperclipper, as long as both use correct decision theory, they will instantly merge into a new SI with a combined utility function.
What combined utility function? There is no way to combine utility functions.
Weighted sum, with weights determined by bargaining.
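For what it's worth, here is the minimal version of that idea in code (my own toy outcomes and utilities). It has to assume the two utility functions are already on a common scale, which is exactly the normalization problem the parent comment points at; the weight w would come out of the bargaining stage, e.g. a calculation like the Nash sketch earlier.

```python
# Minimal sketch of weighted-sum utility aggregation with a weight from bargaining.
def merged_utility(outcome, w, u_clips, u_human):
    return w * u_clips(outcome) + (1 - w) * u_human(outcome)

u_clips = lambda o: o["paperclips"]
u_human = lambda o: o["flourishing"]
candidate_policies = [
    {"paperclips": 10, "flourishing": 0},
    {"paperclips": 6, "flourishing": 5},
    {"paperclips": 0, "flourishing": 8},
]
# With w = 0.75 (the nuke-button case above) the merged AI picks the clip-heavy policy.
best = max(candidate_policies, key=lambda o: merged_utility(o, 0.75, u_clips, u_human))
print(best)
```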
If agent 1 creates agent 2, agent 1 will always know agent 2's goal function for sure.
Wait, we have only examples of the opposite. Every human who creates another human has at best a hazy understanding of that new human's goal function. And as soon as agent 2 has any unobserved experiences or self-modification, it's a distinct, separate agent.
Any two sufficiently advanced agents will go to war with each other.
True with a wide enough definition of “go to war”. Instead say “compete for resources” and you’re solid. Note that competition may include cooperation (against mutual “enemies” or against nature), trade, and even altruism or charity (especially where the altruistic agent perceives some similarity with the recipient, and it becomes similar to cooperation against nature).
By going to war I meant an attempt to turn off another agent.
I think that's a pretty binary (and useless) definition. Almost no wars have continued until one of the participating groups was completely eliminated. There have been conflicts and competitions among groups that did have that effect, but in most cases we don't call them “war”.
Open, obvious, direct violent conflict is a risky way to attain most goals, even those that are in conflict with some other agent. Rational agents would generally prefer to kill them off by peaceful means.
There is a more sophisticated definition of war, coming from Clausewitz, which in contemporary language could be put roughly as “war is changing the will of your opponent without negotiation”. The enemy must unconditionally capitulate and give up its value system.
You could do it by threat, torture, rewriting of the goal system, or deleting the agent.
Does the agent care about changing the will of the “opponent”, or just changing behavior (in my view of intelligence, there’s not much distinction, but that’s not the common approach)? If you care mostly about future behavior rather than internal state, then the “without negotiation” element becomes meaningless and you’re well on your way toward accepting that “competition” is a more accurate frame than “war”.
If agent 1 creates agent 2, agent 1 will always know agent 2's goal function for sure.
That is the point, though. By Löb's theorem, the only agents that are knowable for sure are those with less power. So an agent might want to create a successor that isn't fully knowable in advance; or, on the other hand, if a perfectly knowable successor could be constructed, then you would have a finite method for checking the compatibility of two source codes (is this true? It seems plausible).
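For reference, the formal statement I take this to be alluding to, in its standard provability-logic form (the reading about successors is the usual "Löbian obstacle" gloss, not something proved here):

```latex
% Löb's theorem: for any sentence P,
%   if  T proves  Prov_T(P) -> P,  then  T proves  P.
% Internalized version:
\[
  \Box(\Box P \rightarrow P) \rightarrow \Box P
\]
% Hence T cannot prove Prov_T(P) -> P for every P (its own soundness schema),
% which is the obstacle to fully verifying a successor at least as strong as oneself.
```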