It helped move a bunch of my thinking away from “people are defecting” to “it’s hard to get people to coordinate around a single vision when it’s risky and their alternatives look good”.
This makes it sound like the key difference is that defecting is antisocial (or something), while stag hunts are more of an understandable tragic outcome for well-intentioned people.
I think this is a bad way of thinking about defection: PD-defection can be prosocial, and cooperation can be antisocial/harmful/destructive.
I worry this isn’t made clear enough in the OP? E.g.:
“Defect” feels like it brings in connotations that aren’t always accurate.
If “Defect” has the wrong connotations, that seems to me like a reason to pick a different label for the math, rather than switching to different math. The math of PDs and of stag hunts doesn’t know what connotations we’re ascribing, so we risk getting very confused if we start using PDs and/or stag hunts as symbols for something that only actually exists in our heads (e.g., ‘defection in Prisoner’s Dilemmas is antisocial because I’m assuming an interpretation that isn’t in the math’).

See also zulupineapple’s comment.
This isn’t to say that I think stag hunts aren’t useful to think about; but if a post about this ended up in the Review I’d want it to be very clear and precise about distinguishing a game’s connotations from its denotations, and ideally use totally different solutions for solving ‘this has the wrong connotation’ and ‘this has the wrong denotation’.
(I haven’t re-read the whole post recently, so it’s possible the post does a good job of this and I just didn’t notice while skimming.)
If “Defect” has the wrong connotations, that seems to me like a reason to pick a different label for the math, rather than switching to different math.
I think that this is often an issue of differing beliefs among the players and different weightings over player payoffs. In What Counts as Defection?, I wrote:
Informal definition. A player defects when they increase their personal payoff at the expense of the group.
I went on to formalize this as
Definition. Player $i$’s action $a \in A_i$ is a defection against strategy profile $s$ and weighting $(\alpha_j)_{j=1,\dots,n}$ if

1. Personal gain: $P_i(a, s_{-i}) > P_i(s_i, s_{-i})$
2. Social loss: $\sum_j \alpha_j P_j(a, s_{-i}) < \sum_j \alpha_j P_j(s_i, s_{-i})$
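To make the two conditions concrete, here is a minimal sketch (my own illustration with made-up Prisoner’s Dilemma numbers, not code from the post) that checks a candidate deviation against both conditions:

```python
def is_defection(payoffs, weights, player, action, profile):
    """Check both conditions for `action` by `player` against `profile`.

    `payoffs[j][p]` is player j's payoff under the joint action profile p;
    `weights[j]` is the social-contract weight alpha_j on player j's payoff.
    """
    deviated = tuple(action if j == player else move
                     for j, move in enumerate(profile))
    # Condition 1 (personal gain): the deviator's own payoff strictly rises.
    personal_gain = payoffs[player][deviated] > payoffs[player][profile]
    # Condition 2 (social loss): the weighted total payoff strictly falls.
    social_loss = (sum(w * payoffs[j][deviated] for j, w in enumerate(weights))
                   < sum(w * payoffs[j][profile] for j, w in enumerate(weights)))
    return personal_gain and social_loss

# Classic Prisoner's Dilemma, moves "C"/"D"; equal weights on both players.
payoffs = [
    {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1},  # player 0
    {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1},  # player 1
]
print(is_defection(payoffs, [1, 1], player=0, action="D", profile=("C", "C")))
# True: the deviator's payoff rises (5 > 3), but the total falls (5 < 6).
```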
Under this model, there are two potential sources of disagreement about defections:
Disagreement in beliefs. You think everyone agreed to hunt stag, I’m not so sure; I hunt rabbit, and you say I defected; I disagree. Under your beliefs (the strategy profile s you thought we’d agreed to follow), it was a defection. You thought we’d agreed on the hunt-stag profile. In fact, it was worse than a defection, because there wasn’t personal gain for me—I just sabotaged the group because I was scared (condition 2 above).
Under my beliefs, it wasn’t a defection—I thought it was quite unlikely that we would all hunt stag, and so I salvaged the situation by hunting rabbit.
Disagreement in weighting (αj). There might be an implicit social contract—if we both did half the work on a project, it would be a defection for me to take all of the credit. But if there’s no implicit agreement and we’re “just” playing a constant-sum game, that would just be me being rational. Tough luck, it’s a tough world out there in those normal-form games!
Speculation: This explains why it can sometimes feel right to defect in PD. This need not be because our “terminal values” agree with the other player’s (i.e., I’d feel bad if Bob went to jail for 10 years); rather, the rightness is likely judged by the part of our brain that helps us “be the reliable kind of person with whom one can cooperate” by making us feel bad for transgressions/defections, even against someone with orthogonal terminal values. If there’s no (implicit) contract, then that bad feeling might not pop up.
I think this explains (at least part of) why defection in stag hunt can “feel different” than defection in PD.
I’m mulling this over in the context of “How should the review even work for concepts that have continued to get written about since 2018?”. I notice that the Schelling Choice is Rabbit post relies a bit on both “What Counts as Defection” and “Most prisoner’s dilemmas are stag hunts, most stag hunts are battles of the sexes”, which both came later.

(I think the general principle of “try during the review to holistically review sequences/followups/concepts” makes sense. But I still feel confused about how to actually operationalize that such that the process is clear and outputs a coherent product.)
The True Prisoner’s Dilemma seems useful to link here:

In this case, we really do prefer the outcome (D, C) to the outcome (C, C), leaving aside the actions that produced it. We would vastly rather live in a universe where 3 billion humans were cured of their disease and no paperclips were produced, rather than sacrifice a billion human lives to produce 2 paperclips. It doesn’t seem right to cooperate, in a case like this. It doesn’t even seem fair—so great a sacrifice by us, for so little gain by the paperclip maximizer? And let us specify that the paperclip-agent experiences no pain or pleasure—it just outputs actions that steer its universe to contain more paperclips. The paperclip-agent will experience no pleasure at gaining paperclips, no hurt from losing paperclips, and no painful sense of betrayal if we betray it.
What do you do then? Do you cooperate when you really, definitely, truly and absolutely do want the highest reward you can get, and you don’t care a tiny bit by comparison about what happens to the other player? When it seems right to defect even if the other player cooperates?
That’s what the payoff matrix for the true Prisoner’s Dilemma looks like—a situation where (D, C) seems righter than (C, C).
But defecting is anti-social. You’re gaining from my loss. I tried to do something good for us both and then you basically stole some of my resources.

See The True Prisoner’s Dilemma. Suppose I’m negotiating with Hitler, and my possible payoffs look like:

I cooperate, Hitler cooperates: 5 million people die.
I cooperate, Hitler defects: 50 million people die.
I defect, Hitler cooperates: 0 people die.
I defect, Hitler defects: 8 million people die.
Obviously in this case the defect-defect equilibrium isn’t optimal; if there’s a way to get a better outcome, go for it. But equally obviously, cooperating isn’t prosocial; the cooperate-cooperate equilibrium is far from ideal, and hitting ‘cooperate’ unconditionally is the worst of all possible strategies.
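As a quick check of the dominance structure here (a minimal sketch using the numbers above, not part of the original comment):

```python
# Deaths in millions, indexed by (my_move, hitler_move); lower is better for me.
deaths = {("C", "C"): 5, ("C", "D"): 50, ("D", "C"): 0, ("D", "D"): 8}

for hitler in ("C", "D"):
    best = min(("C", "D"), key=lambda me: deaths[(me, hitler)])
    print(f"If Hitler plays {hitler}, my best reply is {best}")
# Prints 'D' both times: defecting strictly dominates cooperating, even though
# mutual defection (8M deaths) is worse than mutual cooperation (5M deaths),
# which is exactly the payoff structure of a Prisoner's Dilemma.
```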
TurnTrout’s informal definition of ‘defection’ above looks right to me, where “a player defects when they increase their personal payoff at the expense of the group.” My point is that when people didn’t pick the same choice as me, I was previously modeling it using prisoners’ dilemma, but this was inaccurate because the person wasn’t getting any personal benefit at my expense. They weren’t taking from me, they weren’t free riding. They just weren’t coordinating around my stag. (And for my own clarity I should’ve been using the stag hunt metaphor.)
I can try to respond to your (correct) points about the True Prisoners’ Dilemma, but I don’t think they’re cruxy for me. I understand that defecting and cooperating in the PD don’t straightforwardly reflect the associations of their words, but they sometimes do. Sometimes, defecting is literally tricking someone into thinking you’re cooperating and then stealing their stuff. The point I’m making about stag hunts is that this element is entirely lacking — there’s no ability for them to gain from my loss, and bringing it into the analysis is in many places muddying the waters of what’s going on. And this reflects lots of situations I’ve been in, where because of the game theoretic language I was using I was causing myself to believe there was some level of betrayal I needed to model, where there largely wasn’t.
They weren’t taking from me, they weren’t free riding. They just weren’t coordinating around my stag.
I’d like to note that with respect to my formal definition, defection always exists in stag hunts (if the social contract cares about everyone’s utility equally); see Theorem 6 of “What Counts as Defection?”.
What’s happening when $P(\text{Stag}_1) < \frac{1}{2}$ is: Player 2 thinks it’s less than 50% probable that P1 hunts stag; if P2 hunted stag, expected total payoff would go up, but expected P2-payoff would go down (since some of the time, P1 is hunting hares while P2 waits alone near the stags); therefore, P2 is tempted to hunt hare, which would be classed as a defection.
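To see the mechanism, here is a worked example with an assumed payoff matrix (a standard stag hunt, not necessarily the exact numbers from the post):

```python
# Assumed stag-hunt payoffs: (P1 move, P2 move) -> (P1 payoff, P2 payoff).
payoffs = {("S", "S"): (2, 2), ("S", "H"): (0, 1),
           ("H", "S"): (1, 0), ("H", "H"): (1, 1)}

p = 0.4  # P2's credence that P1 hunts stag (below 1/2)

def expected(p2_move):
    """Return (P2's expected payoff, expected total payoff) for P2's move."""
    # Average the payoff vector over P1 hunting stag (prob p) or hare (1 - p).
    ev = [p * stag + (1 - p) * hare
          for stag, hare in zip(payoffs[("S", p2_move)], payoffs[("H", p2_move)])]
    return ev[1], sum(ev)

for move in ("S", "H"):
    own, total = expected(move)
    print(f"P2 hunts {move}: E[P2 payoff] = {own:.2f}, E[total] = {total:.2f}")
# P2 hunts S: E[P2 payoff] = 0.80, E[total] = 2.20
# P2 hunts H: E[P2 payoff] = 1.00, E[total] = 1.60
```

Hunting hare raises P2’s expected payoff (1.00 > 0.80) but lowers the expected total (1.60 < 2.20), so with equal weights both conditions of the defection definition above are met.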
If that’s a stag hunt then I don’t know what a stag hunt is. I would expect a stag hunt to have (2,0) in the bottom left corner and (0,2) in the top right, precisely showing that player two gets no advantage from hunting hare if player one hunts stag (and vice versa).
Also note that you can indeed make that entry (2,0) by subtracting 1 from every payoff in the game. The same arguments still hold.

T >= P covers both the case where you’re indifferent as to whether or not they hunt hare when you do (the =) and the case where you’re better off as the only hare hunter (the >); so long as R > T, both cases have the important feature that you want to hunt stag if they will hunt stag, and you want to hunt hare if they won’t hunt stag.
The two cases (T>P and T=P) end up being the same because if you succeed at tricking them into hunting stag while you hunt hare (because T>P, say), then you would have done even better by actually collaborating with them on hunting stag (because R>T).

I see, thx.
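As a tiny check of the subtract-a-constant point from this exchange (my own sketch, with assumed payoff numbers):

```python
# Row player's stag-hunt payoffs: R (both stag), S (stag alone),
# T (hare while the other hunts stag), P (both hare). Assumed numbers.
R, S, T, P = 3, 0, 2, 2

def best_response(r, s, t, p, other_hunts_stag):
    """Row player's best move given the column player's move."""
    stag_payoff, hare_payoff = (r, t) if other_hunts_stag else (s, p)
    return "stag" if stag_payoff > hare_payoff else "hare"

for shift in (0, -1):  # original payoffs, then every payoff shifted by -1
    shifted = [x + shift for x in (R, S, T, P)]
    print(shift, [best_response(*shifted, other) for other in (True, False)])
# Both lines print ['stag', 'hare']: shifting every payoff by a constant
# leaves best responses, and hence the game's strategic structure, unchanged.
```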
I think this is indeed a failure of the post (the concern you raise was on my mind recently as I thought through more recent posts on game theory).
I think TurnTrout successfully gets at some of the details of what’s actually going on. I’m not 100% sure how to rewrite the post to make its primary point clearly and accurately without getting bogged down in lots of detailed caveats, but I agree there is some problem to solve here if the post were to pass review.
Maybe provide an example of a prisoner’s dilemma with Clippy, and a stag hunt with Clippy, to distinguish those cases from ‘games with other humans whose values aren’t awful by my lights’? This could also better clarify for the reader what the actual content of the games is.