both players want to optimize the welfare function (making it a collaborative game)
The game is collaborative in the sense that a welfare function is optimized in equilibrium, but the principals will in general have different terminal goals (reward functions) and the equilibrium will be enforced with punishments (cf. tit-for-tat).
the issue is primarily that in a collaborative game, the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you’re wrong you can do arbitrarily poorly
Agreed, but there’s the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents “know who their partner is”. That is, they can coordinate on critical game-theoretic parameters of their respective agents.
Ah, I misunderstood your post. I thought you were arguing for problems conditional on the principals agreeing on the welfare function to be optimized, and having common knowledge that they were designing agents that optimize that welfare function.
but there’s the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents “know who their partner is”.
I mean, in this case you just deploy one agent instead of two. Even under the constraint that you must deploy two agents, you can exactly coordinate their priors / which equilibria they fall into. To get prior / equilibrium selection problems, you necessarily need to have agents that don’t know who their partner is. (Even if just one agent knows who the partner is, outcomes should be expected to be relatively good, though not optimal, e.g. if everything is deterministic, then threats are never executed.)
----
Looking at these objections, I think probably what you were imagining is a game where the principals have different terminal goals, but they coordinate by doing the following:
Agreeing upon a joint welfare function that is “fair” to the principals. In particular, this means that they agree that they are “licensed” to punish actions that deviate from this welfare function.
Going off and building their own agents that optimize the welfare function, but make sure to punish deviations (to ensure that the other principal doesn’t build an agent that pursues its own principal’s goals instead of the welfare function).
New planned summary:
Consider the scenario in which two principals with different terminal goals will separately develop and deploy learning agents that will then act on their behalf. Let us call this a _learning game_, in which the “players” are the principals, and the actions are the agents developed.
One strategy for this game is for the principals to first agree on a “fair” joint welfare function, such that they and their agents are then licensed to punish the other agent if it takes actions that deviate from this welfare function. Ideally, this would lead to the agents jointly optimizing the welfare function (while being on the lookout for defection).
There still remain two coordination problems. First, there is an _equilibrium selection problem_: if the two deployed learning agents are Nash strategies from _different_ equilibria, payoffs can be arbitrarily bad. Second, there is a _prior selection problem_: given that there are many reasonable priors that the learning agents could have, if they end up with different priors from each other, outcomes can again be quite bad, especially in the context of <@threats@>(@Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda@).
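To make the equilibrium selection problem concrete, here is a minimal sketch in Python (my own toy example, not from the post): a two-player game with two pure Nash equilibria that favour different players, where each agent playing its part of a _different_ equilibrium is far worse for both than either equilibrium. All payoff numbers are invented for illustration.

```python
# Toy payoff matrices over actions {0, 1}. The profiles (0, 0) and (1, 1) are
# both pure Nash equilibria, but they favour different players, and a
# mismatched profile is terrible for everyone.
payoff_row = [[4.0, -10.0],
              [-10.0, 1.0]]
payoff_col = [[1.0, -10.0],
              [-10.0, 4.0]]

def payoffs(a_row, a_col):
    """Payoffs to the row and column player for a pure action profile."""
    return payoff_row[a_row][a_col], payoff_col[a_row][a_col]

print(payoffs(0, 0))  # equilibrium favouring the row player: (4.0, 1.0)
print(payoffs(1, 1))  # equilibrium favouring the column player: (1.0, 4.0)
print(payoffs(0, 1))  # parts of *different* equilibria: (-10.0, -10.0)
```

If each principal’s agent converges to a different one of these equilibria, deployment produces the mismatched profile, which is the failure mode described above (and making the off-diagonal payoffs more negative makes it arbitrarily bad).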
New opinion:
These are indeed pretty hard problems in any non-competitive game. While this post takes the framing of considering optimal principals and/or agents (and so considers Bayesian strategies in which only the prior and choice of equilibrium are free variables), I prefer the framing taken in <@our paper@>(@Collaborating with Humans Requires Understanding Them@): the issue is primarily that the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you’re wrong you can do arbitrarily poorly.
Note that when you can have a well-specified Bayesian belief over your partner, these problems don’t arise. However, it can’t be the case that both agents are in this situation: agent A would then have a belief over B that has a belief over A; if these are all well-specified Bayesian beliefs, then A has a Bayesian belief over itself, which is usually impossible.
Btw, some reasons I prefer not using priors / equilibria and instead prefer just saying “you don’t know who your partner is”:
It encourages solutions that take advantage of optimality and won’t actually work in the situations we actually face.
The formality of “priors / equilibria” doesn’t have any benefit in this case (there aren’t any theorems to be proven). The one benefit I see is that it signals that “no, even if we formalize it, the problem doesn’t go away”, to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning.
The strategy of agreeing on a joint welfare function is already a heuristic and isn’t an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.
I mean, in this case you just deploy one agent instead of two
If the CAIS view is correct, multi-agent setups like this could be inevitable. There are also many reasons that we could want a lot of actors making a lot of agents rather than one actor making one agent. By having many agents we have no single point of failure (as with fault-tolerant data storage) and no single principal has a concentration of power (as with the Bitcoin protocol).
It does introduce more game-theoretic issues, but those issues seem understandable and tractable to me and there is very little work from the AI perspective that seriously tackles them, so the problems could be much easier than we think.
Even under the constraint that you must deploy two agents, you can exactly coordinate their priors / which equilibria they fall into. To get prior / equilibrium selection problems, you necessarily need to have agents that don’t know who their partner is.
I think it is reasonable to think that there could be a bandwidth constraint on coordination over prior and equilibrium selection that is much smaller than what would be needed to cover all of the coordination scenarios you could possibly encounter. I agree that to have these selection problems you need to not know exactly who your partner is, but it is possible to know quite a bit about your partner and still have coordination problems.
It encourages solutions that take advantage of optimality and won’t actually work in the situations we actually face.
I would be very wary of a solution that didn’t work when we have optimal agents. I think it’s reasonable to try to get things to work when we do everything right before trying to make that process robust to errors.
The formality of “priors / equilibria” doesn’t have any benefit in this case (there aren’t any theorems to be proven). The one benefit I see is that it signals that “no, even if we formalize it, the problem doesn’t go away”, to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning.
I think there are theorems to be proven, just not of the form “there is an optimal thing to do”
The strategy of agreeing on a joint welfare function is already a heuristic and isn’t an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.
It’s also, to a first approximation, the strategy society takes in lots of situations; this happens whenever people form teams with a common goal. There are usually processes of re-negotiating the goal, but between these times of conflict people gain a lot of efficiency by working together and punishing deviation.
I think there are theorems to be proven, just not of the form “there is an optimal thing to do”
I meant one thing and wrote another; I just meant to say that there weren’t theorems in this post.
If the CAIS view is correct, multi-agent setups like this could be inevitable.
My point is just that “prior / equilibrium selection problem” is a subset of the “you don’t know everything about the other player” problem, which I think you agree with?
It’s also, to a first approximation, the strategy society takes in lots of situations; this happens whenever people form teams with a common goal. There are usually processes of re-negotiating the goal, but between these times of conflict people gain a lot of efficiency by working together and punishing deviation.
I’m not sure how this relates to the thing I’m saying (I’m also not sure if I understood it).
My point is just that “prior / equilibrium selection problem” is a subset of the “you don’t know everything about the other player” problem, which I think you agree with?
I see two problems: one of trying to coordinate on priors, and one of trying to deal with having not successfully coordinated. Which is easier depends on the setting, e.g. whether we’re applying it to CAIS, HRI, or a multipolar scenario. Sometimes it’s easier to coordinate on a prior beforehand, sometimes it’s easier to be robust to differing priors, and sometimes you have to go for a bit of both. I think it’s reasonable to call both of these solution techniques for the “prior / equilibrium selection problem”, but the two framings shoot for different solutions, both of which I view as necessary sometimes.
The strategy of agreeing on a joint welfare function is already a heuristic and isn’t an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.
I don’t really know what you mean by this. Specifically, I don’t know from whose perspective it isn’t optimal and under what beliefs.
A few things to point out:
The strategy of agreeing on a joint welfare function and optimizing it is an optimal strategy for some belief in infinitely iterated settings (because of the folk theorem, almost everything is an optimal strategy for some belief); see the sketch after this list.
Since we’re currently making norms for these interactions, we are currently designing these beliefs. This means that we can make it be the case that having that belief is justified in future deployments.
If we want to talk about “optimality” in terms of “equilibria selection procedures” or “coordination norms” we have to have a metric to say some outcomes are “better” than others. This is not a utility function for the agents, but for us as the norm designers. Social welfare seems good for this.
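As a concrete version of the folk-theorem point in the first item above, here is a minimal sketch (my own illustration, using standard textbook prisoner’s-dilemma payoffs and an assumed discount factor): against a grim-trigger partner who cooperates on the welfare-optimal outcome and punishes any deviation forever, cooperating forever beats defecting once whenever the discount factor is high enough.

```python
# Standard prisoner's dilemma payoffs: R = mutual cooperation, T = temptation
# to defect, P = mutual punishment. delta is an assumed discount factor chosen
# for illustration.
R, T, P = 3.0, 5.0, 1.0
delta = 0.9

# Discounted payoff from cooperating forever against a grim-trigger partner.
cooperate_forever = R / (1 - delta)

# Discounted payoff from defecting once and then being punished forever.
defect_once = T + delta * P / (1 - delta)

print(cooperate_forever)                 # 30.0
print(defect_once)                       # 14.0
print(cooperate_forever >= defect_once)  # True: no incentive to deviate
```

Lowering delta far enough flips the inequality, which is the usual caveat on folk-theorem constructions.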
The new summary looks good =) Although I second Michael Dennis’ comment below, that the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed the specification of this prior leads to a prior selection problem.
The formality of “priors / equilibria” doesn’t have any benefit in this case (there aren’t any theorems to be proven)
I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”. The former is false, e.g. there are things to prove about the construction of learning equilibria in various settings. I’m sympathetic with the latter criticism, though my own intuition is that working with the formalism will help uncover practically useful methods for promoting cooperation, and point to problems that might not be obvious otherwise. I’m trying to make progress in this direction in this paper, though I wouldn’t yet call this practical.
The one benefit I see is that it signals that “no, even if we formalize it, the problem doesn’t go away”, to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning
Yes, this is a major benefit I have in mind!
The strategy of agreeing on a joint welfare function is already a heuristic and isn’t an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality
I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem. The welfare function selects among the many equilibria (i.e. it selects one which optimizes the welfare). I wouldn’t call this a heuristic. There has to be some way to select among equilibria, and the welfare function is chosen such that the resulting equilibrium is acceptable by each of the principals’ lights.
I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem.
I think once you settle on a “simple” welfare function, it is possible that there are _no_ Nash equilibria such that the agents are optimizing that welfare function (I don’t even really know what it means to optimize the welfare function, given that you have to also punish the opponent, which isn’t an action that is useful for the welfare function).
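As a toy one-shot illustration of a related point (my own example, not from the thread): if the welfare function is simply the sum of payoffs in a one-shot prisoner’s dilemma, the welfare-maximizing action profile is not a Nash equilibrium at all, which is exactly why the repeated game and the punishments have to do the work.

```python
import itertools

# One-shot prisoner's dilemma; payoffs[(row_action, col_action)] = (row, col).
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def welfare(profile):
    row, col = payoffs[profile]
    return row + col  # joint welfare = sum of payoffs (an assumed choice)

def is_nash(profile):
    row, col = profile
    row_ok = all(payoffs[(row, col)][0] >= payoffs[(d, col)][0] for d in "CD")
    col_ok = all(payoffs[(row, col)][1] >= payoffs[(row, d)][1] for d in "CD")
    return row_ok and col_ok

best = max(itertools.product("CD", repeat=2), key=welfare)
print(best, welfare(best), is_nash(best))                    # ('C', 'C') 6 False
print(("D", "D"), welfare(("D", "D")), is_nash(("D", "D")))  # ('D', 'D') 2 True
```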
I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”.
Hmm, I meant one thing and wrote another. I meant to say “there aren’t any theorems proven in this post”.
I second Michael Dennis’ comment below, that the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed the specification of this prior leads to a prior selection problem.
Just to make sure that I was understood, I was also pointing out that “you can have a well-specified Bayesian belief over your partner” even without agreeing on a common prior, as long as you agree on a common set of possibilities or something effectively similar. This means that talking about “Bayesian agents without a common prior” is well-defined.
When there is not a common prior, this leads to an arbitrarily deep nesting of beliefs, but they are all well-defined. I can refer to “what A believes that B believes about A” without running into Russell’s Paradox. When the priors mismatch, the entire hierarchy of these beliefs might be useful to reason about, but when there is a common prior, much of the hierarchy collapses.
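A minimal sketch of this in Python (my own illustration, with made-up numbers): two players share a finite set of type profiles but hold different priors over it. Nested beliefs such as “what A believes that B believes about A’s type” are still perfectly well-defined; nothing in the computation requires a common prior.

```python
A_TYPES = ["a1", "a2"]
B_TYPES = ["b1", "b2"]

# Each player's prior is a joint distribution over (A's type, B's type);
# the two priors deliberately disagree.
prior_A = {("a1", "b1"): 0.4, ("a1", "b2"): 0.2, ("a2", "b1"): 0.1, ("a2", "b2"): 0.3}
prior_B = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25, ("a2", "b1"): 0.25, ("a2", "b2"): 0.25}

def a_belief_about_b(a_type):
    """A's posterior over B's type, given A's own type, under A's prior."""
    weights = {b: prior_A[(a_type, b)] for b in B_TYPES}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

def b_belief_about_a(b_type):
    """B's posterior over A's type, given B's own type, under B's prior."""
    weights = {a: prior_B[(a, b_type)] for a in A_TYPES}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def a_belief_about_b_belief_about_a(a_type):
    """What A believes that B believes about A: B's posterior over A's types,
    averaged under A's belief about which type B is."""
    result = {a: 0.0 for a in A_TYPES}
    for b_type, p_b in a_belief_about_b(a_type).items():
        for a, p_a in b_belief_about_a(b_type).items():
            result[a] += p_b * p_a
    return result

print(a_belief_about_b("a1"))                 # {'b1': ~0.667, 'b2': ~0.333}
print(a_belief_about_b_belief_about_a("a1"))  # {'a1': 0.5, 'a2': 0.5}
```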