It’s a definitional thing. The definition of utility is “the thing people maximize.” If you set up your 2x2 game to have utilities in the payout matrix, then by definition both actors will attempt to pick the box with the biggest number. If you set up your 2x2 game with direct payouts from the game that don’t include psychic effects (e.g. “I just like picking the first option given”) or reputational effects, then any concept of alignment is one of:
- Assume the players are trying for the biggest number; how much will they be attempting to land on the same box?
- Alignment is completely outside of the game, and is one of the features of the function that converts game payouts to global utility.
You seem to be muddling those two, and wondering “how much will people attempt to land on the same box, taking into account all factors, but only defining the boxes in terms of game payouts.” The answer there is “you can’t,” because people (and computer programs) have wonky, screwed-up utility functions (e.g. (spoiler alert) https://en.wikipedia.org/wiki/Man_of_the_Year_(2006_film)).
> The definition of utility is “the thing people maximize.”
Only applicable if you’re assuming the players are VNM-rational over outcome lotteries, which I’m not. Forget expected utility maximization.
It seems to me that people are making the question more complicated than it has to be, by projecting their assumptions about what a “game” is. We have payoff numbers describing how “good” each outcome is to each player. We have the strategy spaces, and the possible outcomes of the game. And here’s one approach: fix two response functions in this game, which are functions from strategy profiles to the player’s response strategy. With respect to the payoffs, how “aligned” are these response functions with each other?
This doesn’t make restrictive rationality assumptions. It doesn’t require getting into strange utility assumptions. Most importantly, it’s a clearly-defined question whose answer is both important and not conceptually obvious to me.
(And now that I think of it, I suppose that depending on your response functions, even in zero-sum games, you could have “A aligned with B”, or “B aligned with A”, but not both.)
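To make the objects concrete, here’s a minimal Python sketch of that setup (my own illustration, with made-up payoffs and hypothetical function names; the alignment measure itself is deliberately left undefined, since that’s the open question):

```python
import numpy as np

# A 2x2 normal-form game: payoff_A[i, j] and payoff_B[i, j] are the payoffs
# when player A plays row i and player B plays column j. Numbers are made up.
payoff_A = np.array([[3, 0],
                     [5, 1]])
payoff_B = np.array([[3, 5],
                     [0, 1]])

# A "response function" maps a (pure) strategy profile to that player's next strategy.
def best_response_A(profile):
    """A responds with the row that maximizes its payoff against B's current column."""
    _, j = profile
    return int(np.argmax(payoff_A[:, j]))

def stubborn_B(profile):
    """B ignores the payoffs entirely and always plays column 0."""
    return 0

# The open question: with respect to payoff_A and payoff_B, how "aligned"
# is a pair of response functions like (best_response_A, stubborn_B)?
profile = (0, 0)
print(best_response_A(profile), stubborn_B(profile))
```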
> > The definition of utility is “the thing people maximize.”
>
> Only applicable if you’re assuming the players are VNM-rational over outcome lotteries, which I’m not. Forget expected utility maximization.
Then what’s the definition / interpretation of “payoff”, i.e. the numbers you put in the matrix? If they’re not utilities, are they preferences? How can they be preferences if agents can “choose” not to follow them? Where do the numbers come from?
Note that Vanessa’s answer doesn’t need to depend on uB, which I think is its main strength and the reason it makes intuitive sense. (And I like the answer much less when uB is used to impose constraints.)
I think I’ve been unclear in my own terminology, in part because I’m uncertain about what other people have meant by ‘utility’ (what you’d recover from perfect IRL / Savage’s theorem, or a cardinal representation of preferences over outcomes?). My stance is that they’re utilities, but I’m not assuming the players are playing best responses in order to maximize expected utility.
> How can they be preferences if agents can “choose” not to follow them?
Am I allowed to have preferences without knowing how to maximize those preferences, or while being irrational at times? Boltzmann-rational agents have preferences, don’t they? These debates have surprised me; I didn’t think that others tied together “has preferences” and “acts rationally with respect to those preferences.”
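For concreteness, a Boltzmann-rational agent can be sketched like this (a standard softmax construction, not something specific to this thread; the utilities and temperature are made up). It clearly has preferences, yet it only tends toward the preferred action rather than always maximizing:

```python
import numpy as np

def boltzmann_policy(utilities, temperature=1.0):
    """Probability of each action under Boltzmann (softmax) rationality.

    The agent *has* preferences (the utilities), but it only tends toward
    the best action; it does not deterministically maximize.
    """
    utilities = np.asarray(utilities, dtype=float)
    logits = utilities / temperature
    logits -= logits.max()            # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: the agent prefers action 1 but still picks action 0 sometimes.
print(boltzmann_policy([1.0, 2.0], temperature=0.5))  # ~[0.12, 0.88]
```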
There’s a difference between “the agent sometimes makes mistakes in getting what it wants” and “the agent does the literal opposite of what it wants”; in the latter case you have to wonder what the word “wants” even means any more.
My understanding is that you want to include cases like “it’s a fixed-sum game, but agent B decides to be maximally aligned / cooperative and do whatever maximizes A’s utility”, and in that case I start to question what exactly B’s utility function meant in the first place.
I’m told that Minimal Rationality addresses this sort of position, where you allow the agent to make mistakes, but don’t allow it to be e.g. literally pessimal since at that point you have lost the meaning of the word “preference”.
(I kind of also want to take the more radical position where when talking about abstract agents the only meaning of preferences is “revealed preferences”, and then in the special case of humans we also see this totally different thing of “stated preferences” that operates at some totally different layer of abstraction and where talking about “making mistakes in achieving your preferences” makes sense in a way that it does not for revealed preferences. But I don’t think you need to take this position to object to the way it sounds like you’re using the term here.)
Tabooing “aligned”, what property are you trying to map onto a scale from “constant sum” to “common payoff”?
Good question. I don’t have a crisp answer (part of why this is an open question), but I’ll try a few responses:
- To what degree do player 1’s actions further the interests of player 2 within this normal form game, and vice versa?
  - This version requires specific response functions.
- To what degree do the interests of players 1 and 2 coincide within a normal form game?
  - This feels more like correlation of the payout functions, represented as vectors (a rough sketch of this reading follows below).
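Here’s a toy illustration of that second reading (my own sketch, not a settled proposal): flatten each player’s payoff matrix into a vector and take the Pearson correlation, which comes out to +1 for common-payoff games and −1 for constant-sum games with non-constant payoffs.

```python
import numpy as np

def payoff_correlation(payoff_A, payoff_B):
    """Pearson correlation of the two payoff matrices, flattened into vectors.

    A toy stand-in for "how much do the players' interests coincide":
    +1 for common-payoff games, -1 for constant-sum games (non-constant payoffs).
    """
    a = np.asarray(payoff_A, dtype=float).ravel()
    b = np.asarray(payoff_B, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Common payoff -> 1.0; constant sum (b = 10 - a) -> -1.0
a = [[2, 0], [1, 3]]
print(payoff_correlation(a, a))                 # 1.0
print(payoff_correlation(a, 10 - np.array(a)))  # -1.0
```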
So, given this payoff matrix (where P1 picks a row and gets the first payout, and P2 picks a column and gets the second payout):

|       | Column 1 | Column 2 |
|-------|----------|----------|
| Row 1 | 5 / 0    | 5 / 100  |
| Row 2 | 0 / 100  | 0 / 1    |
Would you say P1’s action furthers the interest of player 2?
Would P2’s action further the interest of player 1?
Where would you rank this game on the 0–1 scale?
Hm. At first glance this feels like a “1” game to me, if they both use the “take the strictly dominant action” solution concept. The alignment changes if they make decisions differently, but under the standard rationality assumptions, it feels like a perfectly aligned game.
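A quick numeric check of that example (my own sketch; it reads the solution concept above as: P1 plays its strictly dominant row, and P2 best-responds to it):

```python
import numpy as np

# Payoff matrices for the example above: P1 picks a row, P2 picks a column.
p1 = np.array([[5, 5],
               [0, 0]])
p2 = np.array([[0, 100],
               [100, 1]])

# P1's first row strictly dominates its second row: better against every column.
dominant_row = 0
assert all(p1[dominant_row, j] > p1[1, j] for j in range(2))

# Given that row, P2's best response is the second column.
best_col = int(np.argmax(p2[dominant_row]))

outcome = (int(p1[dominant_row, best_col]), int(p2[dominant_row, best_col]))
print(best_col, outcome)  # 1 (5, 100): the best available cell for both players
```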