I’m not 100% sure I am understanding your terminology. What does it mean to “play stag against (stag,stag)” or to “defect against cooperate/cooperate”?
If your opponent is not in any sense a utility-maximizer then I don’t think it makes sense to talk about your opponent’s utilities, which means that it doesn’t make sense to have a payout matrix denominated in utility, which means that we are not in the situation of my second paragraph above (“The meaning generally assumed in game theory...”).
We might be in the situation of my last-but-two paragraph (“Or maybe we’re playing a game in which...”): the payouts might be something other than utilities. Dollars, perhaps, or just numbers written on a piece of paper. In that case, all the things I said about that situation apply here. In particular, I agree that it’s then reasonable to ask “how aligned is B with A’s interests?”, but I think this question is largely decoupled from the specific game and is more about the mapping from (A’s payout, B’s payout) to (A’s utility, B’s utility).
I guess there are cases where that isn’t enough, where A’s and/or B’s utility is not a function of the payouts alone. Maybe A just likes saying the word “defect”. Maybe B likes to be seen as the sort of person who cooperates. Etc. But at this point it feels to me as if we’ve left behind most of the simplicity and elegance that we might have hoped to bring by adopting the “two-player game in normal form” formalism in the first place, and if you’re prepared to consider scenarios where A just likes choosing the top-left cell in a 2x2 array then you also need to consider ones like the ones I described earlier in this paragraph, where in fact it’s not just the 2x2 payout matrix that matters but potentially any arbitrary details about what words are used when playing the game, or who is watching, or anything else.
So if you’re trying to get to the essence of alignment by considering simple 2x2 games, I think it would be best to leave that sort of thing out of it, and in that case my feeling is that your options are (a) to treat the payouts as actual utilities (in which case, once again, I agree with Dagon and think all the alignment information is in the payout matrix), or (b) to treat them as mere utility-function-fodder, but to assume that they’re all the fodder the utility functions get (in which case, as above, I think none of the alignment information is in the payout matrix and it’s all in the payouts-to-utilities mapping), or (c) to consider some sort of iterated-game setup (in which case, I think you need to nail down what sort of iterated-game setup before asking how to get a measure of alignment out of it).
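To make option (b) concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the dollar amounts, and the three hypothetical mappings `selfish`, `altruistic` and `spiteful`. It only shows that one and the same payout matrix can yield quite different utility matrices, which is the sense in which the alignment information would sit in the payouts-to-utilities mapping rather than in the matrix.

```python
# Dollar payouts for a stag hunt. The numbers are illustrative assumptions.
# Keys are (A's action, B's action); values are (A's dollars, B's dollars).
PAYOUTS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

# Hypothetical payout-to-utility mappings: (own dollars, other's dollars) -> utility.
def selfish(own, other):
    return float(own)

def altruistic(own, other):
    return float(own + other)

def spiteful(own, other):
    return float(own - other)

def utilities(payouts, u_a, u_b):
    """Apply each player's mapping to every cell of the payout matrix."""
    return {
        cell: (u_a(a, b), u_b(b, a))
        for cell, (a, b) in payouts.items()
    }

# Same payouts, two different assumptions about B's utility function.
aligned = utilities(PAYOUTS, selfish, altruistic)
opposed = utilities(PAYOUTS, selfish, spiteful)

print(aligned[("hare", "stag")])  # B's utility counts A's dollars positively
print(opposed[("hare", "stag")])  # B's utility counts A's dollars negatively
```

In the cell where A takes the hare and B is left with nothing, the altruistic B ends up with positive utility and the spiteful B with negative utility, although the dollar payouts are identical.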
I’m not 100% sure I am understanding your terminology. What does it mean to “play stag against (stag,stag)” or to “defect against cooperate/cooperate”?
Let π_i(σ) = σ′_i be player i’s response function to strategy profile σ. Given some strategy profile (like stag/stag), player i selects a response. I mean “response” in the sense of “best response”; I don’t necessarily mean that there’s an iterated game. This captures all the relevant “outside details” for how decisions are made.
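As a rough sketch of this definition, assuming a strategy profile is a pair (player 0’s action, player 1’s action) and using made-up stag hunt payoffs, a response function is just a function from profiles to one player’s action. The names `best_response` and `always` are hypothetical helpers, not anything defined in the thread.

```python
from typing import Callable, Dict, Tuple

Profile = Tuple[str, str]          # (player 0's action, player 1's action)
Response = Callable[[Profile], str]

ACTIONS = ("stag", "hare")

# Illustrative payoffs (an assumption): profile -> (payoff to 0, payoff to 1).
PAYOFFS: Dict[Profile, Tuple[int, int]] = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

def best_response(i: int) -> Response:
    """Response function that picks player i's payoff-maximizing action,
    holding the other player's action in the profile fixed."""
    def respond(profile: Profile) -> str:
        def payoff_if(action: str) -> int:
            new = list(profile)
            new[i] = action
            return PAYOFFS[(new[0], new[1])][i]
        return max(ACTIONS, key=payoff_if)
    return respond

def always(action: str) -> Response:
    """Response function that ignores the profile entirely."""
    return lambda profile: action

pi_0 = best_response(0)
print(pi_0(("stag", "stag")))            # best response to stag/stag
print(always("hare")(("stag", "stag")))  # "playing hare against (stag, stag)"
```

On this reading, “play hare against (stag, stag)” means a response function that returns hare when handed the profile (stag, stag), whether or not hare is a best response there.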
If your opponent is not in any sense a utility-maximizer then I don’t think it makes sense to talk about your opponent’s utilities, which means that it doesn’t make sense to have a payout matrix denominated in utility
I don’t think I understand where this viewpoint is coming from. I’m not equating payoffs with VNM-utility, and I don’t think game theory usually does either. For example, the maxmin payoff solution concept does not involve VNM-rational expected utility maximization. I just identify payoffs with “how good is this outcome for the player”, without also demanding that π_i always select a best response. Maybe it’s Boltzmann rational, or maybe it just always selects certain actions (regardless of their expected payouts).
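For concreteness, here is a hedged sketch of two such non-best-response rules, using the same kind of made-up stag hunt payoffs. The inverse-temperature parameter `beta` and the function names are assumptions, and `maxmin_action` covers only the pure-strategy version of maxmin.

```python
import math

ACTIONS = ("stag", "hare")

# Illustrative payoffs (an assumption): profile -> (payoff to 0, payoff to 1).
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

def payoff(i, own_action, profile):
    """Player i's payoff if they play own_action and the other player's
    action stays as given in the profile."""
    new = list(profile)
    new[i] = own_action
    return PAYOFFS[tuple(new)][i]

def boltzmann_response(i, profile, beta=1.0):
    """Probability of each action, proportional to exp(beta * payoff).
    beta = 0 is uniformly random; large beta approaches a best response."""
    weights = {a: math.exp(beta * payoff(i, a, profile)) for a in ACTIONS}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def maxmin_action(i):
    """Pure-strategy maxmin: the action with the best worst-case payoff.
    It compares payoff numbers only and takes no expectation."""
    def worst_case(action):
        outcomes = []
        for other in ACTIONS:
            profile = [other, other]
            profile[i] = action
            outcomes.append(PAYOFFS[tuple(profile)][i])
        return min(outcomes)
    return max(ACTIONS, key=worst_case)

print(boltzmann_response(0, ("stag", "stag")))
print(maxmin_action(0))
```

Both rules read the payoff numbers as “how good is this outcome for the player”, and neither is forced to return the payoff-maximizing action.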
or (b) to treat them as mere utility-function-fodder, but to assume that they’re all the fodder the utility functions get (in which case, as above, I think none of the alignment information is in the payout matrix and it’s all in the payouts-to-utilities mapping)
There exist two payoff functions. I think I want to know how impact-aligned one player is with another: how do the player’s actual actions affect the other player (in terms of their numerical payoff values). I think (c) is closest to what I’m considering, but in terms of response functions—not actual iterated games.
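One possible reading of “how do the player’s actual actions affect the other player”, sketched in Python under loud assumptions: the payoff numbers are made up, the helper `payoff_to_other` is hypothetical, and nothing here is a measure of alignment that the discussion has settled on. It only tabulates the other player’s payoff once player i’s response has been swapped into each profile.

```python
ACTIONS = ("stag", "hare")

# Illustrative payoffs (an assumption): profile -> (payoff to A, payoff to B).
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

def payoff_to_other(i, respond, profile):
    """The other player's payoff after player i replaces their own entry
    in the profile with respond(profile)."""
    new = list(profile)
    new[i] = respond(profile)
    return PAYOFFS[tuple(new)][1 - i]

# Two hypothetical response functions for B (player index 1).
def always_stag(profile):
    return "stag"

def always_hare(profile):
    return "hare"

profiles = [(a, b) for a in ACTIONS for b in ACTIONS]

# A's payoff under each of B's response functions, profile by profile.
effect_of_stag = [payoff_to_other(1, always_stag, p) for p in profiles]
effect_of_hare = [payoff_to_other(1, always_hare, p) for p in profiles]

print(effect_of_stag)
print(effect_of_hare)
```

With these numbers, A never does worse when B always hunts stag than when B always hunts hare, which is the kind of comparison the question seems to be asking about.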
Sorry, I’m guessing this probably still isn’t clear, but this is the reply I have time to type right now and I figured I’d send it rather than nothing.
Sorry, I think I wasn’t clear about what I don’t understand. What is a “strategy profile (like stag/stag)”? So far as I can tell, the usual meaning of “strategy profile” is the same as that of “strategy”, and a strategy in a one-shot game of stag hunt looks like “stag” or “hare”, or maybe “70% stag, 30% hare”; I don’t understand what “stag/stag” means here.
----
It is absolutely standard in game theory to equate payoffs with utilities. That doesn’t mean that you have to do the same, of course, but I’m sure that’s why Dagon said what he did and it’s why when I was enumerating possible interpretations that was the first one I mentioned.
(The next several paragraphs are just giving some evidence for this; I had a look on my shelves and described what I found. Most detail is given for the one book that’s specifically about formalized 2-player game theory.)
“Two-Person Game Theory” by Rapoport, which happens to be the only book dedicated to this topic I have on my shelves, says this at the start of chapter 2 (titled “Utilities”):
So far nothing has been said about the nature of the payoffs. [...] It is even conceivable that a man playing Checkers with a child would rather lose than win. In that case a larger payoff must be assigned to his loss than to his win. [...] the game remains undefined if we do not know what payoff magnitudes are assigned by the players to the outcomes, even if the latter are specified in terms of monetary payoffs. However, this problem is bypassed by the game theoretician, who assumes that the payoffs are given.
Unfortunately, Rapoport is using the word “payoffs” to mean two different things here. I think it’s entirely clear from context, though, that his actual meaning is: you may begin by specifying monetary payoffs, but what we care about for game theory is payoffs as utilities. Here’s more from a little later in the chapter:
The given payoffs are assumed to reflect the psychological worth of the associated outcomes to the player in question.
A bit later:
When payoffs are specified on an interval scale [as opposed to an “ordinal scale” where you just say which ones are better than which other ones—gjm], they are called utilities.
and:
As has already been pointed out, these matters are not of concern to the game theoretician. His position is that if utility scales can be determined, then a theory of games can be built on a reliable foundation. If no such utility scale can be established with references to any real subjects, then game theory will not be relevant to the behaviour of people in either a normative or descriptive sense.
As I say, that’s the only book of formal game theory on my shelves. Schelling’s Strategy of Conflict has a little to say about such games, though not much and not in much detail; it looks to me as if he assumes payoffs are utilities. The following sentence is informative, though it presupposes rather than states: “But what configuration of value systems for the two participants—of the “payoffs”, in the language of game theory—makes a deterrent threat credible?” (This is from the chapter entitled “International Strategy”; in my copy it’s on page 13.)
Rapoport’s “Strategy and Conscience” isn’t a book of formal game theory, but it does discuss the topic, and it explicitly says: payoffs are utilities.
One chapter in Schelling’s “Choice and Consequence” is concerned with this sort of game theory; he says that the numbers you put in the matrix are either arbitrary things whose relative ordering is the only thing that matters, or numbers that behave like utilities in the sense that the players are trying to maximize their expectations.
The Wikipedia article on game theory says: “The payoffs of the game are generally taken to represent the utility of individual players.” (This is in the section about the use of game theory in economics and business. It does also mention applications in evolutionary biology, where the payoffs are fitnesses—which seem to me very closely analogous to utilities, in that what the evolutionary process stochastically maximizes is something like expected fitness.)
Again, I don’t claim that you have to equate payoffs with utilities; you can apply the formalism of game theory in any way you please! But I don’t think there’s any question that this is the usual way in which payoffs in a game matrix are understood.
----
It feels odd to me to focus on response functions, since as a matter of fact you never actually know the other player’s strategy. (Aside from special cases where your opponent is sufficiently deterministic and sufficiently simple that you can “read their source code” and make reliable predictions from it. There’s a bit of an LW tradition of thinking in those terms. But with the possible exception of reasoning along the lines of “X is an exact copy of me and will therefore make the same decisions as I do”, I think it’s basically never going to be relevant to real decision-making agents: the usual case is that the other player is about as complicated as you are, and you don’t have enough brainpower to understand your own brain completely.)
If you are not considering payouts to be utilities, then you need to note that knowing the other player’s payouts—which is a crucial part of playing this sort of game—doesn’t tell you anything until you also know how those payouts correspond to utilities, or to whatever else the other player might use to guide their decision-making.
(If you aren’t considering that they’re utilities but are assuming that higher is better, then for many purposes that’s enough. But, again, only if you suppose that the other player does actually act as someone would act who prefers higher payouts to lower ones.)
My feeling is that you will get most insight by adopting (what I claim to be) the standard perspective where payoffs are utilities; then, if you want to try to measure alignment, the payoff matrix is the input for your calculation. Obviously this won’t work if one or both players behave in a way not describable by any utility function, but my suspicion is that in such cases you shouldn’t necessarily expect there to be any sort of meaningful measure of how aligned the players are.