Yeah, that’s a good idea. It was proposed a decade ago by Wei Dai and Tim Freeman on the SL4 mailing list and got some discussion in various places. Some starting points are this SL4 post or this LW post, though the discussion kinda diverges. Here’s my current view:
1) Any conflict can be decomposed into bargaining (which Pareto optimal outcome do we want to achieve?) and enforcement (how do we achieve that outcome without anyone cheating?)
2) Bargaining is hard. We tried and failed many times to find a “fair” way to choose among Pareto optimal outcomes. The hardest part is nailing down the difference between bargaining and extortion.
3) Assuming we have some solution to bargaining, enforcement is easy enough for AIs. Most imaginable mechanisms for enforcement, like source code inspection, lead to the same set of outcomes.
4) The simplest way to think about enforcement is two AIs jointly building a new AI and passing all resources to it. If the two original AIs were Bayesian-rational and had utility functions U1 and U2, the new one should also be Bayesian-rational and have a utility function that’s a weighted sum of U1 and U2. This generalizes to any number of AIs.
5) The only subtlety is that the weights shouldn't be set directly by bargaining, as you might think. Instead, bargaining should determine some probability distribution over weights, and then one sample from that distribution should be used as the actual weights. Think of it as flipping a coin to break ties between U1 and U2 (see the sketch after this list). That's necessary to deal with flat Pareto frontiers, like the divide-the-dollar game.
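Here's a minimal toy sketch of points 4 and 5 (my own illustrative code, nothing from the old threads): two linear utility functions over divide-the-dollar outcomes, a merged agent maximizing a weighted sum, and a weight sampled from a bargaining-chosen distribution, here just a fair coin, which is one plausible answer for this symmetric game.

```python
import random

# Toy divide-the-dollar setting: an outcome gives agent 1 a share x of the
# dollar and agent 2 the rest.  Both utilities are linear, so the Pareto
# frontier is flat and the merged AI's choice is very sensitive to the weights.
OUTCOMES = [i / 100 for i in range(101)]  # candidate shares for agent 1

def u1(x):
    return x          # agent 1 wants as much of the dollar as possible

def u2(x):
    return 1 - x      # agent 2 wants the rest

def merged_choice(w):
    """Point 4: the merged Bayesian-rational AI just maximizes w*U1 + (1-w)*U2."""
    return max(OUTCOMES, key=lambda x: w * u1(x) + (1 - w) * u2(x))

# Point 5: bargaining outputs a *distribution* over weights rather than a
# single weight; one sample from it gets baked into the merged AI.  A fair
# coin over {0, 1} is just one illustrative choice for this symmetric game.
w = random.choice([0.0, 1.0])
print("sampled weight:", w, "-> agent 1's share:", merged_choice(w))
```

Any fixed weight other than 0.5 hands the whole dollar to one side, and at exactly 0.5 the maximizer is indifferent across the whole flat frontier; randomizing the weight is what makes the deal fair ex ante, since each agent gets the dollar with probability 1/2.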
At first I proposed this math as a solution to another problem by Stuart (handling meta-uncertainty about which utility function you have), but it works for AI merging too.
Ah, but my idea is different! It’s not just that these two AIs will physically merge. I claim that two AIs that are able to communicate are already indistinguishable from one AI with a different utility function. I reject the entire concept of meaningfully counting AIs.
There is a trivial idea that two humans together form a kind of single agent. This agent is not a human (there are too many conditions for being a human), and it might not be very smart (if the humans’ goals don’t align).
Now consider the same idea for two superintelligent AIs. I claim that the “combined” mind is also superintelligent, and it acts as though its utility function was a combination of the two initial utility functions. There are only complications from the possibly distributed physical architecture of the AI.
To take it even further, I claim that given any two AIs called A and B, if they together would choose strategy S, then there also exists a single AI called M(A,B) that would also choose strategy S. If we take the paperclip and staple maximizers, they might physically merge (or maybe randomly destroy one of the two?). Now I claim that there is another single AI, with a slightly funky but reasonable architecture, which would be rewarded both for 60% staples and for 60% paperclips, and that this AI would choose to construct a new AI with a more coherent utility function (or it would choose to self-modify to make its own utility coherent).
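As a toy illustration of that last claim (my own sketch; reading "rewarded both for 60% staples and for 60% paperclips" as a disjunction of two thresholds is an assumption on my part), here's the funky merged utility in code: it pays off if either threshold is met, no even split can meet either one, so the coherent move is to commit to one goal, say by a coin flip.

```python
import random

# Toy version of the "funky but reasonable" merged mind: its utility pays off
# if at least 60% of resources become staples OR at least 60% become paperclips.
# (Treating the 60% figure as two alternative thresholds is my own assumption.)
def merged_utility(paperclip_frac, staple_frac):
    return 1.0 if paperclip_frac >= 0.6 or staple_frac >= 0.6 else 0.0

print(merged_utility(0.5, 0.5))   # 0.0 -- an even split satisfies neither threshold

# A coherent successor (or a self-modified version of the same AI) simply
# commits everything to one goal, e.g. chosen by a coin flip, and gets full
# utility either way.
goal = random.choice(["paperclips", "staples"])
allocation = (1.0, 0.0) if goal == "paperclips" else (0.0, 1.0)
print(goal, merged_utility(*allocation))  # 1.0
```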
Also, thank you for digging for the old threads. It’s frustrating that there is so much out there that I would never know to even look for.
Edit: damn, I think the second link basically has the same idea as well.
I think if you carefully read everything in these links and let it stew for a bit, you’ll get something like my approach.
More generally, having ideas is great but don’t stop there! Always take the next step, make things slightly more precise, push a little bit past the point where you have everything figured out. That way you’re almost guaranteed to find new territory soon enough. I have an old post about that.