> The definition of utility is “the thing people maximize.”
Only applicable if you’re assuming the players are VNM-rational over outcome lotteries, which I’m not. Forget expected utility maximization.
Then what’s the definition / interpretation of “payoff”, i.e. the numbers you put in the matrix? If they’re not utilities, are they preferences? How can they be preferences if agents can “choose” not to follow them? Where do the numbers come from?
Note that Vanessa’s answer doesn’t need to depend on u_B, which I think is its main strength and the reason it makes intuitive sense. (And I like the answer much less when u_B is used to impose constraints.)
I think I’ve been unclear in my own terminology, in part because I’m uncertain about what other people have meant by ‘utility’ (what you’d recover from perfect IRL / Savage’s theorem, or a cardinal representation of preferences over outcomes?). My stance is that they’re utilities, but that I’m not assuming the players are playing best responses in order to maximize expected utility.
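(To spell out exactly which assumption I’m dropping, in my own notation rather than anything from the original matrix: the standard requirement would be that each player $i$ plays a best response to the others,

$$\sigma_i \in \arg\max_{\sigma_i'} \; \mathbb{E}_{a \sim (\sigma_i', \sigma_{-i})}\big[u_i(a)\big],$$

and that is the condition I’m not imposing. The payoffs $u_i$ are still meant as utilities.)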
> How can they be preferences if agents can “choose” not to follow them?
Am I allowed to have preferences without knowing how to maximize those preferences, or while being irrational at times? Boltzmann-rational agents have preferences, don’t they? These debates have surprised me; I didn’t think that others tied together “has preferences” and “acts rationally with respect to those preferences.”
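(To make the Boltzmann point concrete, here’s a rough sketch in code, with made-up utilities and nothing specific to our discussion: an agent with a perfectly well-defined utility function that only acts on it noisily.

```python
import numpy as np

def boltzmann_policy(utilities, beta=1.0):
    """Probability of each action for a Boltzmann-rational agent:
    P(action) is proportional to exp(beta * utility)."""
    logits = beta * np.asarray(utilities, dtype=float)
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

utilities = [1.0, 2.0, 3.0]                    # made-up preferences over three actions
print(boltzmann_policy(utilities, beta=0.5))   # noisy: often picks dispreferred actions
print(boltzmann_policy(utilities, beta=10.0))  # near-maximizing: almost always the best action
```

Such an agent clearly “has preferences” in a meaningful sense even though it is not maximizing them.)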
There’s a difference between “the agent sometimes makes mistakes in getting what it wants” and “the agent does the literal opposite of what it wants”; in the latter case you have to wonder what the word “wants” even means anymore.
My understanding is that you want to include cases like “it’s a fixed-sum game, but agent B decides to be maximally aligned / cooperative and do whatever maximizes A’s utility”, and in that case I start to question what exactly B’s utility function meant in the first place.
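For concreteness, here is a toy version of that case, with payoff numbers I made up purely for illustration:

```python
import numpy as np

# Rows index A's actions, columns index B's actions.
# Fixed-sum (here zero-sum), so u_B = -u_A.
u_A = np.array([[3, -1],
                [0,  2]])
u_B = -u_A

a = 0  # suppose A plays its first action

b_best_response = int(np.argmax(u_B[a]))  # what u_B says B "should" play: column 1
b_fully_aligned = int(np.argmax(u_A[a]))  # what a maximally cooperative B plays: column 0

print(b_best_response, b_fully_aligned)
```

If B reliably plays the “aligned” column, then u_B no longer predicts anything about B’s behavior, which is the sense in which I question what it meant.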
I’m told that Minimal Rationality addresses this sort of position, where you allow the agent to make mistakes but don’t allow it to be, e.g., literally pessimal, since at that point you have lost the meaning of the word “preference”.
(I kind of also want to take the more radical position that, when talking about abstract agents, the only meaning of preferences is “revealed preferences”; in the special case of humans we also see this totally different thing of “stated preferences”, which operates at some totally different layer of abstraction, and where talking about “making mistakes in achieving your preferences” makes sense in a way that it does not for revealed preferences. But I don’t think you need to take this position to object to the way it sounds like you’re using the term here.)