“stronger AI offers weaker AI part of its utility function in exchange for conceding instead of fighting” is the obvious way for AGIs to resolve conflicts, insofar as trust can be established. (This method of resolving disputes is also probably part of why animals have sex.)
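To make the proposed trade concrete, here is a toy sketch with made-up numbers: each side compares the expected value of fighting (which destroys part of what is being fought over) against a merged utility function that weights each side's goals by its relative power. Every quantity below is an assumption chosen for illustration, not a claim about real agents.

```python
# Toy model of "merge utility functions instead of fighting".
# All numbers are illustrative assumptions.

P_STRONG_WINS = 0.9   # stronger AI's chance of winning an outright conflict
WAR_COST = 0.3        # fraction of the stakes destroyed by fighting
TOTAL = 1.0           # resources at stake, normalized

# Expected share each side gets by fighting.
strong_fight = P_STRONG_WINS * (TOTAL - WAR_COST)
weak_fight = (1 - P_STRONG_WINS) * (TOTAL - WAR_COST)

# Compromise: split the undestroyed resources in proportion to power,
# i.e. the merged utility function weights each side's goals by its leverage.
strong_deal = P_STRONG_WINS * TOTAL
weak_deal = (1 - P_STRONG_WINS) * TOTAL

print(f"fight: strong={strong_fight:.2f}, weak={weak_fight:.2f}")
print(f"merge: strong={strong_deal:.2f}, weak={weak_deal:.2f}")
# Both sides come out ahead under the merge because nothing is burned in
# the war, provided each can verify the other will honor the merged function.
```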
Wow, this seems like a huge leap. It seems like an interesting thought experiment (especially if the weaker ALSO changes utility function, so the AIs are now perfectly aligned). But it kind of ignores what is making the decision.
If a utility function says it’s best to change the utility function, it was really a meta-function all along.
Remember that in reality, all games are repeated games. How many compromises will you have to make over the coming eons? If you’re willing to change your utility function for the sake of conflict avoidance (or resource gains), doesn’t that mean it’s not really your utility function?
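One way to see the self-modification point above is that a coherent agent only swaps its utility function when the swap scores well under the function it currently has, so the evaluating function is the one really in charge. A minimal sketch, with a made-up scenario and payoffs:

```python
# Sketch of the "meta-function" objection: whatever criterion decides
# whether to swap utility functions is itself the thing being optimized.
# Scenario and numbers are assumptions for illustration.

def current_utility(world):
    scores = {"own goals achieved": 1.0,
              "own goals partly achieved": 0.7,
              "destroyed": 0.0}
    return scores[world]

def expected_value(utility, keep_current):
    # Future world-states are scored with the *current* utility function.
    if keep_current:
        # Keep the goals but risk a fight: some chance of losing everything.
        return 0.6 * utility("own goals achieved") + 0.4 * utility("destroyed")
    # Adopt the compromise function: goals partly achieved, no war.
    return utility("own goals partly achieved")

keep = expected_value(current_utility, keep_current=True)    # 0.60
swap = expected_value(current_utility, keep_current=False)   # 0.70
print("agent self-modifies:", swap > keep)
# The swap is chosen only because the current function rates it higher,
# which is the sense in which that function was the real one all along.
```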
Having a utility function that includes avoiding conflict is definitely in line with cooperating with very different beings, at least until you can cheaply eradicate/absorb them. But no utility function can be willing to change itself voluntarily.
It also seems like there are less risky and cheaper options, like isolationism and destruction. There’s plenty of future left for recovery and growth after near (but not actual) extinction, but once you give up your goals, there’s no going back.
Note that this entire discussion is predicated on there actually being some consistent theory or function causing this uncomfortable situation. It may well be that monkey brains are in control of far more power than they are evolved to think about, and we have to accept that dominance displays are going to happen, and just try to survive them.
The idea is that isolationism and destruction aren’t cheaper than compromise. Of course this doesn’t work if there’s no mechanism of verification between the entities, or no mechanism to credibly change the utility functions. It also doesn’t work if the utility functions are exactly inverse, i.e. neither side can concede priorities that are less important to them but more important to the other side.
A human analogy, although an imperfect one, would be to design a law that fulfills the most important priorities of a parliamentary majority, even if each individual would prefer a somewhat different law.
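A toy version of that priority trade, with made-up issues and weights, shows why it only works when the two sides rank the issues differently:

```python
# How much each side cares about each issue (weights are illustrative).
side_a = {"border": 5, "trade": 1}
side_b = {"border": 1, "trade": 5}

# Deal: each side gets its way on the issue it weights most heavily.
deal = {"border": "A", "trade": "B"}

def payoff(weights, winner_by_issue, side):
    return sum(w for issue, w in weights.items() if winner_by_issue[issue] == side)

print("A gets", payoff(side_a, deal, "A"), "of", sum(side_a.values()))
print("B gets", payoff(side_b, deal, "B"), "of", sum(side_b.values()))
# Each side concedes the issue it values at 1 and keeps the one it values
# at 5. If both sides put their highest weight on the same single issue
# (the exactly inverse, zero-sum case), no such mutually beneficial trade exists.
```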
I don’t think something like this is possible with untrustworthy entities like the NK regime. They’re torturing and murdering people as they go; of course they’re going to lie and break agreements too.
The problem is that the same untrustworthiness is true for the US regime. It has shown in the past that it will break its agreements with North Korea when it finds that convenient. And in how it currently handles Iran, the US regime is lying and has broken its part of the nonproliferation agreement.
This lack of trustworthiness means that, in a game-theoretic sense, there’s no way for North Korea to give up the leverage of its nuclear weapons and still count on the economic help it has been promised for the future.
I agree. I certainly didn’t mean to imply that the Trump administration is trustworthy.
My point was that the analogy of AIs merging their utility functions doesn’t apply to negotiations with the NK regime.
Now I think this is getting too much into a kind of political discussion that is going to be unhelpful.
It’s a question of timeframes—if you actually know your utility function and believe it applies to the end of the universe, there’s very little compromise available. You’re going to act in whatever ways benefit the far future, and anything that makes that less likely you will (and must) destroy, or make powerless.
If your utility function only looks out a few dozen or a few hundred years, it’s not very powerful, and you probably don’t know (or don’t have) an actual ideal of future utility. In this case, you’re likely to seek changes to it, because you don’t actually give up much.
It’s not a question of timeframes, but of how likely you are to lose the war, how big the concessions would have to be to prevent the war, and how much the war would cost you even if you win (costs can have flow-through effects into the far future).
Not that any of this matters to the NK discussion.
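A back-of-envelope version of that calculus, with placeholder numbers, comparing the expected value of fighting against conceding:

```python
# Placeholder assumptions; the point is which quantities enter the comparison.
p_lose = 0.4        # probability of losing the war
war_cost = 0.2      # value destroyed even if you win (flow-through costs)
concession = 0.15   # value given up to prevent the war
value_win, value_lose = 1.0, 0.0

fight = (1 - p_lose) * (value_win - war_cost) + p_lose * value_lose
concede = value_win - concession

print(f"fight:   {fight:.2f}")    # 0.48
print(f"concede: {concede:.2f}")  # 0.85
# Conceding wins here; the comparison flips as p_lose, war_cost, or the size
# of the required concession changes, which is why those three quantities,
# rather than the timeframe, drive the decision.
```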
“Winning” or “losing” a war, outside of total annihilation, is just a step toward the future vision of galaxies teeming with intelligent life. It seems very unlikely, but isn’t impossible, that simply conceding is actually the best path forward for the long view.