What would you say are the remaining problems that need to be solved, if we assume everyone has a way to accurately estimate everyone else’s utility function? The main one that comes to mind for me is that there are many possible solutions/equilibria/policy-sets that reach Pareto-optimal outcomes, but they differ in how good they are for different players. So it’s not enough that players be aware of a solution, and it’s not even enough that there be one solution which stands out as extra salient, because players will be hoping to achieve a solution that is more favorable to them and might do various crazy things to try to achieve that. (This is a vague problem statement though, perhaps you can do better!)
The main one that comes to mind for me is that there are many possible solutions/equilibria/policy-sets that reach Pareto-optimal outcomes, but they differ in how good they are for different players. So it’s not enough that players be aware of a solution, and it’s not even enough that there be one solution which stands out as extra salient, because players will be hoping to achieve a solution that is more favorable to them and might do various crazy things to try to achieve that.
This seems like it’s solved by just not letting your opponent get more utility than they would under the bargaining system you think is fair, no matter what crazy things they do? If there is a bargaining solution which stands out, then agents which strategize over which solution they propose will choose the salient one, since they expect their partner to do the same. I might be misunderstanding what you’re getting at, though.
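To make the "don't let them get more than the fair amount" idea concrete, here is a minimal sketch for a one-shot split of a fixed pie. It is only an illustration under assumptions that are not in the comment above: utility is linear in the share received, disagreement pays 0, and the responder can commit to accepting probabilistically. The function names and the `epsilon` penalty are hypothetical.

```python
def acceptance_probability(demand: float, fair_share: float, epsilon: float = 0.0) -> float:
    """Probability that the responder accepts a proposer demanding `demand`.

    Assumes (hypothetically) linear utility and a disagreement payoff of 0.
    `fair_share` is what the responder thinks the proposer fairly deserves.
    """
    if demand <= fair_share:
        return 1.0  # fair or generous demands are always accepted
    # Accept just rarely enough that the proposer's expected payoff is
    # capped at fair_share - epsilon, however large the demand is.
    return max(0.0, (fair_share - epsilon) / demand)


def proposer_expected_utility(demand: float, fair_share: float, epsilon: float = 0.0) -> float:
    """Expected payoff to the proposer under the responder's policy."""
    return demand * acceptance_probability(demand, fair_share, epsilon)


if __name__ == "__main__":
    # Pie of size 10; the responder considers 5 the proposer's fair share.
    for demand in (4, 5, 6, 8, 10):
        print(demand, round(proposer_expected_utility(demand, 5, epsilon=0.5), 3))
```

With `epsilon > 0`, any demand above the fair share does strictly worse in expectation than demanding exactly the fair share, so under these assumptions the "crazy things" don't pay.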
What would you say are the remaining problems that need to be solved, if we assume everyone has a way to accurately estimate everyone else’s utility function?
Finding something like a universally canonical bargaining solution would be great, as it would allow agents with knowledge of each other’s utility functions to achieve Pareto optimality. I think it’s not fully disentangleable from the question of incentivizing honesty, as I could imagine that there is some otherwise great bargaining solution that turns out to be unfixably vulnerable to dishonesty. Although, I do have an intuition that probably most reasonable bargaining solutions thought up in good faith are similar enough to each other that agents using different ones wouldn’t end up too far from the Pareto frontier, and so I’m not sure how important it is.
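As a rough numerical check of that intuition, the sketch below compares two well-known solutions, Nash and Kalai-Smorodinsky, on a toy problem. Everything specific here is an assumption chosen for illustration: the frontier `u2 = 1 - u1**2`, the disagreement point (0, 0), the grid search, and the way disagreement is modeled, with each player capping the other at what the other would get under the capper's preferred solution.

```python
def frontier(u1: float) -> float:
    """Hypothetical Pareto frontier: player 2's best payoff when player 1 gets u1."""
    return 1.0 - u1 ** 2


GRID = [i / 10000 for i in range(10001)]  # candidate u1 values in [0, 1]


def nash_solution() -> tuple[float, float]:
    # Nash: maximize the product of gains over the disagreement point (0, 0).
    u1 = max(GRID, key=lambda x: x * frontier(x))
    return u1, frontier(u1)


def kalai_smorodinsky_solution() -> tuple[float, float]:
    # KS: frontier point where gains are proportional to each player's best
    # achievable payoff. Both maxima are 1 here, so this is where u1 == u2.
    u1 = min(GRID, key=lambda x: abs(x - frontier(x)))
    return u1, frontier(u1)


nash = nash_solution()
ks = kalai_smorodinsky_solution()

# Player 1 treats KS as fair and caps player 2 at the KS payoff.
# Player 2 treats Nash as fair and caps player 1 at the Nash payoff.
capped = (nash[0], ks[1])
shortfall = frontier(capped[0]) - capped[1]  # what player 2 leaves on the table

print(f"Nash: {nash}, KS: {ks}")
print(f"capped outcome: {capped}, shortfall: {shortfall:.3f}")
```

On this toy frontier the two solutions sit close together, and the capped outcome falls short of the frontier by roughly 0.05 out of a maximum payoff of 1. With the players' preferred solutions swapped, the caps do not bind and nothing is lost. This is one example, not evidence that the gap is small in general.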
I think my answer is probably figuring out how to deal with strategic successor agents and with counterfactuals. The successor agent problem is similar to the problem of lying about utility functions: if you’re dealing with a successor agent (or an agent which has modified its utility function), you need to bargain with it as though it had the utility function of its creator (or its original utility function), and figuring out how to deal with uncertainty over how an agent’s values have been modified by other agents or its past self seems important.
Bargaining solutions usually have the property that you can’t naively apply them to subgames, as different agents might value the subgames more or less, and an agent might be happy to accept a locally unfair deal for a better deal in another subgame. This is fine for sequential or simultaneous subgames, but some subgames only happen with some probability. Determining what would happen in counterfactual subgames is important for determining the fair solution in the occurring subgames, but verifying what would counterfactually happen is often quite difficult.
In some sense, these are just subproblems of incentivizing honesty generally. I think the problem of incentivizing honesty is the overwhelming importance-weighted bulk of the remaining open problems in bargaining theory (relative to what I know), and it’s hard for me to think of an important problem that isn’t entangled with that in some way.
Huh. I don’t worry much about the problem of incentivizing honesty myself, because the cases I’m most worried about are cases where everyone can read everyone else’s minds (with some time lag). Do you think there’s basically no problem then, in those cases?
There’s still the problem of successor agents and self-modifying agents, where you need to set up incentives to create successor agents with the same utility functions and to not strategically self-modify, and I think a solution to that would probably also work as a solution to normal dishonesty.
I do expect that in a case where agents can also see each other’s histories, we can make bargaining go well with the bargaining theory we know (given that the agents try to bargain well; there are of course possible agents which don’t try to cooperate well).
In the cases I’m thinking about you don’t just read their minds now, you read their entire history, including predecessor agents. All is transparent. (Fictional but illustrative example: the French AGI and the Russian AGI are smart like Sherlock Holmes, they can deduce pretty much everything that happened in great detail leading up to and during the creation of each other + also they are still running on human hardware at human institutions and thanks to constant leaking and also the offense/defense balance favoring offense, they can see logs of what each other is and was thinking the entire time, including through various rounds of modification-to-successor agent.)
Thanks, this was super helpful!
You’re welcome!