Do you have a citation? You seem to believe that this is common knowledge among game theorists, but I don’t think I’ve ever encountered that.
Jacob and I have already considered payout correlation, and I agree that it has some desirable properties. However,
it’s symmetric across players,
it’s invariant to player rationality
which matters, since alignment seems to not just be a function of incentives, but of what-actually-happens and how that affects different players
it equally weights each outcome in the normal-form game, ignoring relevant local dynamics. For example, what if part of the game table is zero-sum, and part is common-payoff? Correlation then can be controlled by zero-sum outcomes which are strictly dominated for all players. For example:
1 / 1 || 2 / 2
-.5 / .5 || 1 / 1
and so I don’t think it’s a slam-dunk solution. At the very least, it would require significant support.
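To make the last point concrete, here's a quick sketch (plain Python; `pearson` is just sample correlation over the four cells, and the numbers come from the example matrix above) showing how the strictly dominated zero-sum cell moves the payoff correlation:

```python
# Payoff correlation for the example game above. The (-.5, .5) cell is
# strictly dominated for both players, yet it still affects the statistic.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Cell payoffs (P1, P2): (1,1), (2,2), (-.5,.5), (1,1)
p1 = [1.0, 2.0, -0.5, 1.0]
p2 = [1.0, 2.0, 0.5, 1.0]

print(round(pearson(p1, p2), 3))                            # 0.932 with the dominated cell
print(round(pearson(p1[:2] + p1[3:], p2[:2] + p2[3:]), 3))  # 1.0 without it
```

So an outcome no rational player would ever land on is still pulling the "alignment" number away from 1.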
You’re simply incorrect (or describing a different payout matrix than you state) that a player doesn’t “have to select a best response”.
Why? I suppose it’s common to assume (a kind of local) rationality for each player, but I’m not interested in assuming that here. It may be easier to analyze the best-response case as a first start, though.
It’s a definitional thing. The definition of utility is “the thing people maximize.” If you set up your 2x2 game with utilities in the payout matrix, then by definition both actors will attempt to pick the box with the biggest number. If you set up your 2x2 game with direct payouts from the game that don’t include psychic (e.g. “I just like picking the first option given”) or reputational effects, then any concept of alignment is one of:
assume the players are trying for the biggest number, how much will they be attempting to land on the same box?
alignment is completely outside of the game, and is one of the features of the function that converts game payouts to global utility
You seem to be muddling those two, and wondering “how much will people attempt to land on the same box, taking into account all factors, but only defining the boxes in terms of game payouts?” The answer there is “you can’t,” because people (and computer programs) have wonky, screwed-up utility functions (e.g. (spoiler alert) https://en.wikipedia.org/wiki/Man_of_the_Year_(2006_film))
> The definition of utility is “the thing people maximize.”
Only applicable if you’re assuming the players are VNM-rational over outcome lotteries, which I’m not. Forget expected utility maximization.
It seems to me that people are making the question more complicated than it has to be, by projecting their assumptions about what a “game” is. We have payoff numbers describing how “good” each outcome is to each player. We have the strategy spaces, and the possible outcomes of the game. And here’s one approach: fix two response functions in this game, which are functions from strategy profiles to the player’s response strategy. With respect to the payoffs, how “aligned” are these response functions with each other?
This doesn’t make restrictive rationality assumptions. It doesn’t require getting into strange utility assumptions. Most importantly, it’s a clearly-defined question whose answer is both important and not conceptually obvious to me.
(And now that I think of it, I suppose that depending on your response functions, even in zero-sum games, you could have “A aligned with B”, or “B aligned with A”, but not both.)
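One hypothetical way to operationalize the question above: fix a response function for B (strategy profile → B's next strategy) and score what fraction of profiles end with B's response being optimal *for A* among B's options. Both the metric and the payoff numbers here are illustrative choices of mine, not anything standard:

```python
from itertools import product

# payoffs[(row, col)] = (u_A, u_B); A picks the row, B picks the column.
# Numbers reused from the earlier 2x2 example, purely for illustration.
payoffs = {
    (0, 0): (1.0, 1.0),  (0, 1): (2.0, 2.0),
    (1, 0): (-0.5, 0.5), (1, 1): (1.0, 1.0),
}

def b_alignment_with_a(respond_B):
    """Fraction of start profiles where B's response maximizes A's payoff,
    holding A's strategy fixed."""
    profiles = list(product((0, 1), repeat=2))
    hits = 0
    for a, b in profiles:
        b_next = respond_B((a, b))
        best_for_a = max(payoffs[(a, c)][0] for c in (0, 1))
        hits += payoffs[(a, b_next)][0] == best_for_a
    return hits / len(profiles)

def helpful_B(profile):   # B moves to whatever is best for A
    a, _ = profile
    return max((0, 1), key=lambda c: payoffs[(a, c)][0])

def spiteful_B(profile):  # B moves to whatever is worst for A
    a, _ = profile
    return min((0, 1), key=lambda c: payoffs[(a, c)][0])

print(b_alignment_with_a(helpful_B))   # 1.0
print(b_alignment_with_a(spiteful_B))  # 0.0
```

Note that the measure is directional, which matches the parenthetical point: under some response functions B comes out aligned with A without the reverse holding.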
> The definition of utility is “the thing people maximize.”
> Only applicable if you’re assuming the players are VNM-rational over outcome lotteries, which I’m not. Forget expected utility maximization.
Then what’s the definition / interpretation of “payoff”, i.e. the numbers you put in the matrix? If they’re not utilities, are they preferences? How can they be preferences if agents can “choose” not to follow them? Where do the numbers come from?
Note that Vanessa’s answer doesn’t need to depend on u_B, which I think is its main strength and the reason it makes intuitive sense. (And I like the answer much less when u_B is used to impose constraints.)
I think I’ve been unclear in my own terminology, in part because I’m uncertain about what other people have meant by ‘utility’ (what you’d recover from perfect IRL / Savage’s theorem, or a cardinal representation of preferences over outcomes?). My stance is that they’re utilities, but that I’m not assuming the players are playing best responses in order to maximize expected utility.
> How can they be preferences if agents can “choose” not to follow them?
Am I allowed to have preferences without knowing how to maximize those preferences, or while being irrational at times? Boltzmann-rational agents have preferences, don’t they? These debates have surprised me; I didn’t think that others tied together “has preferences” and “acts rationally with respect to those preferences.”
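For what it's worth, the Boltzmann case is easy to make concrete: such an agent clearly *has* preferences (a utility per option) yet only tends toward the better one. A minimal sketch, with the utilities and temperature picked arbitrarily for illustration:

```python
import math
import random

def boltzmann_choice(utilities, beta, rng):
    """Pick option i with probability proportional to exp(beta * u_i)."""
    weights = [math.exp(beta * u) for u in utilities]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(utilities) - 1  # guard against float round-off

rng = random.Random(0)
# The agent prefers option 1 (utility 2 > 0) yet regularly picks option 0.
picks = [boltzmann_choice([0.0, 2.0], beta=1.0, rng=rng) for _ in range(10_000)]
print(sum(picks) / len(picks))  # near exp(2)/(1 + exp(2)) ~= 0.88, not 1.0
```

So "has preferences" and "always best-responds to those preferences" come apart without the word "preference" losing its meaning.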
There’s a difference between “the agent sometimes makes mistakes in getting what it wants” and “the agent does the literal opposite of what it wants”; in the latter case you have to wonder what the word “wants” even means any more.
My understanding is that you want to include cases like “it’s a fixed-sum game, but agent B decides to be maximally aligned / cooperative and do whatever maximizes A’s utility”, and in that case I start to question what exactly B’s utility function meant in the first place.
I’m told that Minimal Rationality addresses this sort of position, where you allow the agent to make mistakes, but don’t allow it to be e.g. literally pessimal since at that point you have lost the meaning of the word “preference”.
(I kind of also want to take the more radical position where when talking about abstract agents the only meaning of preferences is “revealed preferences”, and then in the special case of humans we also see this totally different thing of “stated preferences” that operates at some totally different layer of abstraction and where talking about “making mistakes in achieving your preferences” makes sense in a way that it does not for revealed preferences. But I don’t think you need to take this position to object to the way it sounds like you’re using the term here.)
Tabooing “aligned”, what property are you trying to map on a scale of “constant sum” to “common payoff”?
Good question. I don’t have a crisp answer (part of why this is an open question), but I’ll try a few responses:
To what degree do player 1’s actions further the interests of player 2 within this normal-form game, and vice versa?
This version requires specific response functions.
To what degree do the interests of players 1 and 2 coincide within a normal form game?
This feels more like correlation of the payout functions, represented as vectors.
So, given this payoff matrix (where P1 picks a row and gets the first payout, P2 picks column and gets 2nd payout):
5 / 0 ; 5 / 100
0 / 100 ; 0 / 1
Would you say P1’s action furthers the interest of player 2?
Would P2’s action further the interest of player 1?
Where would you rank this game on the 0–1 scale?
Hm. At first glance this feels like a “1” game to me, if they both use the “take the strictly dominant action” solution concept. The alignment changes if they make decisions differently, but under the standard rationality assumptions, it feels like a perfectly aligned game.
Correlation between outcomes, not within them. If both players prefer to be in the same box, they are aligned. As we add indifference and opposing choices, they become unaligned. In your example, both people have the exact same ordering of outcomes. In a classic PD, there is some mix. Totally unaligned (constant-sum) example:
0 / 2 ; 2 / 0
2 / 0 ; 0 / 2
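As a sanity check on the "correlation between outcomes" reading, the constant-sum example above does come out perfectly anti-correlated. A quick sketch, where `pearson` is just sample correlation over the four cells:

```python
# Sample (Pearson) correlation of the two players' payoffs over the four
# cells of the constant-sum example above.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

p1 = [0, 2, 2, 0]  # row player's payoff in each cell
p2 = [2, 0, 0, 2]  # column player's payoff in each cell

print(pearson(p1, p2))  # -1.0: perfectly anti-correlated payoffs
```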