The continued misuse of the Prisoner’s Dilemma
Related to: The True Prisoner’s Dilemma, Newcomb’s Problem and Regret of Rationality
In The True Prisoner’s Dilemma, Eliezer Yudkowsky pointed out a critical problem with the way the Prisoner’s Dilemma is taught: the distinction between utility and avoided-jail-time is not made clear. The payoff matrix is supposed to represent the former, even as its numerical values happen to coincidentally match the latter. And worse, people don’t naturally assign utility as per the standard payoff matrix: their compassion for the friend in the “accomplice” role means they wouldn’t feel quite so good about a “successful” backstabbing, nor quite so bad about being backstabbed. (“Hey, at least I didn’t rat out a friend.”)
For that reason, you rarely encounter a true Prisoner’s Dilemma, even an iterated one. The above complications prevent real-world payoff matrices from working out that way.
Which brings us to another unfortunate example of this misunderstanding being taught.
On the New York Times’s “Freakonomics” blog, Professor Daniel Hamermesh gleefully recounts an experiment he recently performed (one he says he runs often) on students in his intro economics course, which is basically the same as the Prisoner’s Dilemma (henceforth, PD).
Now, before going further, let me make clear that Hamermesh is no small player. Just take a look at the accolades and accomplishments listed on his Wikipedia page or his university CV. This is a lesson from a professor at the top of his field, so it’s only with hesitation that I proceed further to allege that he’s Doing It Wrong.
Hamermesh’s variant of the PD is to pick eight students and auction off a $20 bill to them, with the money split evenly across the winners if there are multiple highest bids. Here, cooperation corresponds to adhering to a conspiracy where everyone agrees to make the same low bid and thus a big profit. Defecting corresponds to breaking the agreement and making a slightly higher bid so you can take everything for yourself. If the others continue to cooperate, their bid is lower and they get nothing.
Here is how Hamermesh describes the result (italics mine, bold in the original):
Today seven of the students stuck to the collusive agreement, and each bid $.01. They figured they would split the $20 eight ways, netting $2.49 each. Ashley, bless her heart, broke the agreement, bid $0.05, and collected $19.95. The other 7 students booed her, but I got the class to join me in applauding her, as she was the only one who understood the game.
The game? Which game? There’s more than one game going on here! There’s the neat little well-defined, artificial setup that Professor Hamermesh has laid out. On top of that, there’s the game we better know as “life”, in which the later consequences of superficially PD-like scenarios cause us to assign different utilities to successful backstabbing (defecting when others cooperate). There’s also the game of becoming the high-status professor’s Wunderkind. And while Ashley (whose name he bolded for some reason) may have won the narrow, artificial game, she also told everyone there that, “Trusting me isn’t such a good idea.” In other words, the kind of consequence we normally worry about in our everyday lives.
For this reason, I left the following comment:
No, she learned how to game a very narrow instance of that type of scenario, and got lucky that someone else didn’t bid $0.06.
Try that kind of thing in real life, and you’ll get the social equivalent of a horse’s head in your bed.
Incidentally, how many friends did Ashley make out of this event?
I probably came off as more “anticapitalist” or “collectivist” than I really am, but the point is important: betraying your partners has long-term consequences which aren’t apparent when you only look at the narrow version of this game.
Hamermesh’s point was actually to show the difficulty of collusion in a free market. However, to the extent that markets can pose barriers to collusion, it’s certainly not because going back on your word will consistently work out in just the right way as to divert a huge amount of utility to yourself—which happens to be the very reason Ashley “succeeded” (with the professor’s applause) in this scenario. Rather, it’s because the incentives for making such agreements fundamentally change; you are still better off maintaining a good reputation.
Ultimately, the students learned the wrong lesson from an unrealistic game.
EDIT: as suggested I have turned most of this comment, somewhat expanded, into a post.
The entire point of experiential learning, which is what you set up to happen when you have students play a game—as opposed to telling them about a game—is that there is no “right” or “wrong” lesson to be taken from it.
...see linked post for fuller argument...
If Hamermesh is to be faulted for something, it is for (apparently) imposing on the students his own conclusions from a given outcome, as opposed to letting the students figure out for themselves what the outcome means.
Agree completely. I wonder why your comment isn’t upvoted to +10. Applauding the defector in PD is a weird thing to do for a professor anyway.
Possibly related is Nassim Nicholas Taleb’s concept of the ludic fallacy: “the person who is assuming a tightly-constrained game will emerge as the loser”.
I’m considering a top-level post (my first) on experiential games and a little background on how they might be worthwhile for LWers—there have been a few reports of experiences, such as the estimation/calibration game at one meetup, but I’m feeling that a little detail on the constructivist approach and practical advice on how to set up such games might be useful.
I use experiential games quite a bit; one that I remember fondly from a few years ago was adapted from Dietrich Doerner’s “The Logic of Failure”—the one where you are to control a thermostat. Doerner’s account of people actually playing the game is enlightening, with many irrational reactions on display. But reading about it is one thing, and actually playing the game quite another, so I got a group of colleagues together and we gave it a try. By the reports of all involved, it was one of the most effective learning experiences they’d had.
An experiential learning game focusing on the basics of Bayesian reasoning might be a valuable design goal for this community—one I’d definitely have an interest in playing.
By all means write it, this stuff sounds very interesting.
Possibly related are the PCT demo games mentioned on LW before. I imagine a Bayesian learning game to be similar in spirit (better implement it in Flash rather than Java, though). Also tangentially related are the cognitive testing games.
Wait, you mean he let them conspire and they didn’t set up explicit [monetary] penalties for breaking the agreement? Everybody fails.
I have been in Ashley’s situation—roped in to play a similar parlour game to demonstrate game theory in action.
In my case it was in a work setting: part of a two-day brainstorming / team-building boondoggle.
In my game there were five tables, each with eight people, all playing the same iterated game.
In four out of the five tables every single person cooperated in every single iteration—including the first and last. On the fifth table they got confused about the rules.
The reason for the behaviour was clear—the purpose of the game was to demonstrate that cooperation increased the total size of the pot (the game was structured that way). In a workplace setting the prize was to win the approbation of the trainers and managers, by demonstrating that we were team players, and certainly NOT to be the asshole who cheated his tablemates and walked off with $50.
On the fifth table they managed to confuse themselves such that on the first iteration two of them unwittingly defected. Their table therefore ended up with the least money, but the two individuals of course ended up the richest in the room—they were hideously embarrassed.
I was left wondering what amount of money it would have taken to change behaviour. Would people defect if there was $1000 at stake? In that setting, I think still not. $10,000? $100,000 ?
Practical game-theory experiments would be quite expensive to run, I think.
Pretending to not understand the game and acting embarrassed in order to defect without social consequences seems like a pretty good strategy to me.
I’m reminded of a real-world similar example: World of Warcraft loot ninjas.
Background: when a good item drops in a dungeon, each group member is presented with two buttons, a die icon (“need”) and a pile-of-gold icon (“greed”). If one or more people click “need”, the server rolls a random 100-sided die for each player who clicked “need”, and the player with the highest roll wins the item. If no one in the group clicked “need”, then the server rolls dice for everyone in the group. Usually players enter dungeons in the hopes of obtaining items that directly improve their combat effectiveness, but many items can also be sold at the in-game auction house, sometimes for a substantial amount of gold, so that a character can still benefit indirectly even if the item itself has no immediate worth.
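The roll mechanics described above can be sketched as follows; the exact server behavior is an assumption reconstructed from this description:

```python
import random

# A rough model of the "need before greed" roll: if anyone selects "need",
# only those players roll a 100-sided die; otherwise everyone rolls.

def loot_roll(choices, rng=random):
    """choices maps player name -> 'need' or 'greed'; returns the winner."""
    needers = [p for p, c in choices.items() if c == 'need']
    eligible = needers if needers else list(choices)
    rolls = {p: rng.randint(1, 100) for p in eligible}
    return max(rolls, key=rolls.get)

# A lone "need" click wins automatically against four "greed" rolls,
# which is exactly the loot ninja's edge.
party = {'ninja': 'need', 'a': 'greed', 'b': 'greed', 'c': 'greed', 'd': 'greed'}
print(loot_roll(party))  # 'ninja'
```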
As you can imagine, “pick-up groups” (i.e. four random strangers you might never party with again) often suffer from loot ninjas: people who intentionally click on the “need” button to vastly improve their odds of obtaining items, even when the item holds no direct value for themselves but does hold direct value for another party member.
And, indeed, a common loot ninja strategy is to feign ignorance of the “need versus greed” loot roll system (which, to be fair, has legitimately confusing icons) and to use every other possible trick to elicit sympathy, such as feigning bad spelling and grammar, for as long as possible before being booted from the party and forcibly expelled from the dungeon.
Alternately, they learned more about finding the balance between maintaining peer alliances and gaining the favour of the ruler (guessing the teacher’s password). This is the very essence of the courtier. Even if the students don’t fully comprehend that message I am confident that their intuitions are lapping it up.
On a note more explicitly related to economics they gained insight into using anti-competitive practices needed to get ahead (in this case public shaming) while avoiding crossing the line that triggers adverse social sanction (professorial intervention.)
Whichever way you look at it, Ashley won this game. Of course, many other students with less aggressive or less aware professors have lost by taking the same actions. Including some in classes which I have attended!
I think it’s odd that he would say that only Ashley understood the game, not because she may actually be the loser in the wider scheme of things, but because the relevance of the Prisoner’s Dilemma is that it is actually supposed to be a dilemma. His saying only her action showed understanding suggests he doesn’t think it’s a real dilemma at all. He thinks it’s a question with an answer: defect.
It isn’t the Prisoner’s Dilemma, and Hamermesh did not describe it as such. It is similar to the Prisoner’s Dilemma only inasmuch as, well, it has to do with game theory and people could cooperate. The title of this post is a misuse of ‘Prisoner’s Dilemma’.
It is completely standard to refer to a wide class of problems as PD. This example is much closer than most examples.
It is a completely standard mistake to refer to just about anything game theoretic as ‘Prisoner’s Dilemma’. In this instance, there are several elements that are neither newcomblike nor Prisoner’s Dilemmaish. When one adds all the necessary assumptions and limitations to this problem to make the decision one particular agent faces analogous to a Prisoner’s Dilemma one does not find that $0.05 is equivalent to ‘defect’. The judgement required to reach that decision requires far more insight than a defection. When Hamermesh said Ashley understood the game he was not saying “Ashley chose to defect which is the correct response to the Prisoner’s not-dilemma”.
Mind you, Neil makes a good point. He just happens to be making false claims about what a Professor believes because he has been fed a false premise. I don’t like being misrepresented and I particularly don’t like it when this misrepresentation makes me look naive. If we go around saying things that are not true out of negligence then this is what we can expect to happen.
It doesn’t need to be. The mapping to the PD here is that defection is continuous rather than binary. It generalizes the concept of defection in the canonical PD so that you can choose a level of defection, and the most “defective” (!) person, if they aren’t equal, diverts utility to him/herself at the expense of the other players.
Just like how in the standard PD, a defection when the other player doesn’t will divert utility to yourself.
In the PD increasing defection level from 0 to 1 never lowers utility. In this game increasing what you call the continuous measure of defection always lowers utility except when your defection is the largest.
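This is easy to check numerically. A small sketch under the assumed auction rules, holding the seven rivals fixed at $0.01:

```python
# Your net payoff as a function of your own bid, with seven rivals locked
# at $0.01 (rules assumed as in the post: the highest bidders split the
# $20 evenly after paying their own bids).

def my_payoff(my_bid, others=(0.01,) * 7, prize=20.00):
    bids = list(others) + [my_bid]
    high = max(bids)
    if my_bid < high:
        return 0.0
    winners = sum(1 for b in bids if b == high)
    return round(prize / winners - my_bid, 2)

for bid in (0.01, 0.02, 0.05, 0.06):
    print(f"bid ${bid:.2f} -> net ${my_payoff(bid):.2f}")
# The payoff jumps once you become the sole highest bidder, then falls a
# penny for every extra penny of "defection": 2.49, 19.98, 19.95, 19.94.
```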
There’s a deeper similarity to the PD and I explained it in the original post.
We cannot draw conclusions about whether Hamerish believes the Prisoner’s Dilemma is a dilemma just because one element of the game he described is the potential for collusion.
Since a bid’s winningness is contingent on other bids you can’t use winning as a proxy for understanding. If they all thought and acted like Ashley and broke the pact with 5 cent bids would they all have got a round of applause for their great insight in bidding 5 cents?
“Win” isn’t an answer. It’s like somebody asking “Where’s the Central Station?” and getting the answer “Just find it.”
No, it’s like saying “Alison found the Central Station! Well done!”
As a group, they’d get more money appointing just one person to bid $0.01, and splitting it after the fact.
The rules of the game forbid that.
You mean as a group they would have gotten the exact same amount of money as they in fact did.
Maybe we should call this the “socialist fallacy”—confusing a group’s total benefit with the “equality” of the outcome for the group’s members.
No, I don’t think that’s what he means. There’s an ambiguity in what is meant by “joint bid” in:
If seven people bid $0.01, does the prof take $0.01 for his $20, or does he take $0.07?
I noticed early on that the problem was ambiguous in this respect. Fortunately, it doesn’t matter for the point being made about the gains from cooperation and defection: all you need is that it’s possible to share in larger gains by cooperating unless someone defects, plus the professor’s reaction to what happened.
No, that’s not what he means. From the Freakonomics blog post:
So as a group, if seven people bid $0.01, they split $19.93, whereas under JamesAndrix’s scheme they’d net $19.99.
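The arithmetic behind the two figures, assuming each tied winner pays their own bid:

```python
# Seven tied winners each paying their own penny bid, versus one
# designated bidder paying a single penny.
prize = 20.00
joint = prize - 7 * 0.01   # seven colluders tie at $0.01: split $19.93
single = prize - 1 * 0.01  # one appointed bidder at $0.01: nets $19.99
print(round(joint, 2), round(single, 2))  # 19.93 19.99
```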
You’re calling out the professor for not addressing the larger game of “life” but this post itself seems to be denying Ashley the opportunity to play the larger game of life at all.
For example, Ashley’s demonstration also surely had some gross, if not necessarily net, benefit to her reputation—she showed everyone she is clever, that she can get the approval of the professor, etc.
Ashley may have left class and spent $17.45 neutralizing those hits to her reputation (distribute the money after class, buy everyone a beer, etc.). She would have netted $0.01 more doing this than cooperating.
You’re letting Ashley accrue costs in the larger game, but not letting her accrue benefit in the larger game, which doesn’t seem fair to Ashley or Hamermesh.
Did you miss the part about it being a 500-student class?
I meant the 7 people she “beat” at the game. Besides, have some faith in Ashley’s ability to find a really, really good price-to-performance ratio on reputational gains!
But if she redistributed enough to her friends to make up for what she took, that would have been the original agreement they had made in the first place!
My entire point was that I think it’s possible for Ashley to use her gains from defecting in the PD to more than offset her real-life reputational costs.
Do you disagree with my statement that it is possible to do that?
Yes, I do disagree, because any later redistribution to them will either be a) less than what they would have gotten in the original deal, or b) the same as what it would have been if she had just stuck to the agreement.
Plus, undoing damage to your reputation is much harder than doing the damage.
This professor, who has no doubt debated game theory with many other professors and countless students making all kinds of objections, gets three paragraphs in this article to make a point. Based on this, you figure that the very simple objection that you’re making is news to him?
One thing that concerns me about LW is that it often seems to operate in a vacuum, disconnected from mainstream discourse.
Yes, like I said, given Hamermesh’s credentials, I didn’t want to jump to any hasty conclusions.
However, professional game theorists do in fact get deceived by the supposed textbook correctness of their conclusions. That’s why I linked the previous Regret of Rationality post, which goes over why being “reasonable” and winning so sharply diverge. It’s also part of why no one ever wins the “guess a third of the average guess” game by guessing zero, despite its correctness proof.
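A quick illustration of that last point: against non-equilibrium opponents, the game-theoretically “correct” guess of zero loses. The opponents’ guesses here are a made-up assumption (uniform over 0–100):

```python
import random

# Against non-equilibrium opponents, guessing the Nash answer of zero
# loses. Opponent guesses are an assumed toy distribution; the target is
# a third of the overall average guess.

random.seed(0)
others = [random.uniform(0, 100) for _ in range(99)]

def distance_from_target(my_guess):
    all_guesses = others + [my_guess]
    target = sum(all_guesses) / len(all_guesses) / 3
    return abs(my_guess - target)

# Zero sits roughly 16 away from the target; a guess near a third of the
# average (about 17) lands far closer, and would win.
print(distance_from_target(0) > distance_from_target(17))  # True
```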
If Hamermesh did have some understanding of the issues I raised, it would have taken him very little—even within the bounds of three paragraphs—to make it clear. Just a simple “But Ashley may not get invited to many parties after this” would have sufficed.
But not only did Hamermesh not make such an acknowledgement, you can see from his tone that he quite clearly believes there is “a” correct way to play for that scenario, irrespective of what metagames it might be embedded in.
The fact that students booed doesn’t seem to have registered as a relevant piece of evidence to him, in its significance to other games that might be going on.
Finally, the only reason Ashley walked away with any money at all is because she happened to get lucky that someone else didn’t defect with $0.06, or $0.07, or …, which she had no way of knowing wouldn’t happen. So what’s the skill he’s rewarding here?
As soon as he said “she was the only one who understood the game,” I wondered whether he really understood the game, broadly construed.
Especially if we imagine some reasonable distribution of other players’ actions, breaking from cooperation only benefits the player when 1) you bid the highest and therefore receive the money, or 2) everybody (or a majority) breaks cooperation and you don’t look bad for doing so.
Even then, 1) is only good if the money outweighs the social costs. Heck, there might be future financial costs, if they play cooperation games in the future in that class!
Btw, they bold every name on the Freakonomics blog, at least the first time it’s mentioned in a particular post.
Oh. Oops. Guess I laid it on a little too thick there :-P
Don’t worry, I still voted you up, even after such an egregious error ;)
This is one of the purest examples I have seen in a while of argument from authority, congratulations!
If he’s really on top of the situation, why did he say the equilibrium was $17.50? Obviously this isn’t an equilibrium, since anybody wins by defecting to $17.51. The equilibria are $19.99 and $20.00.
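A brute-force best-response check makes the point; the rules here (highest bidders split the $20 after paying their own bids, penny-grid bids) are assumptions taken from the post:

```python
# Best-response check on a penny grid, in integer cents to avoid float
# trouble. Rules assumed: highest bidders split the prize after paying
# their own bids; non-winners pay and get nothing... er, get zero.

def payoff(my_bid, others, prize=2000):
    bids = others + [my_bid]
    high = max(bids)
    if my_bid < high:
        return 0
    return prize // bids.count(high) - my_bid

# If the other seven all bid $17.50 (1750 cents), the best reply is to
# outbid them by a penny, netting $2.49 -- so $17.50 is not an equilibrium.
others = [1750] * 7
best = max(range(0, 2001), key=lambda b: payoff(b, others))
print(best, payoff(best, others))  # 1751 249
```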
Seconded.
This is actually the real meaning of “selfishness.” It is in my own best interest to do things for the community.
The mantras of collectivists and anti-capitalists seem to either not realize or ignore the fact that greedy people aren’t really doing things in their own best interest if they are making enemies in the process.
I had my intro Ethics students play an anonymous Prisoner’s Dilemma with candy earlier this week—two one-shot, one iterated thrice. Although they didn’t know who their own partners were, I had no good way to conceal who got no candy because someone had defected to their cooperation, who walked away with ten pieces looking smug, who had to settle for two, and who got five for mutual cooperation. This didn’t appear to influence their behavior at all—actually, apart from the one star student who chose to attend that day and some of the people who managed consistent cooperation during the iteration, none of them looked like they had much of a strategy, even though most of them seemed motivated by the candy. I guess $20 is a larger payoff than the amounts of candy I was working with, but this being a game and the payoffs coming from without (i.e. they aren’t managing resources they already have, but negotiating the split of a non-player’s donation), it doesn’t seem likely that there would be too much long-term animosity over it except in choosing how to behave with future games of the same type.
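Read as a payoff matrix, the candy amounts above fit the canonical PD ordering; this reconstruction from the comment is mine:

```python
# Candy payoffs reconstructed from the comment above: (row, column) pieces
# for each pair of moves, with C = cooperate, D = defect.
PAYOFFS = {
    ('C', 'C'): (5, 5),    # mutual cooperation
    ('C', 'D'): (0, 10),   # cooperator betrayed
    ('D', 'C'): (10, 0),   # lone defector walks away smug
    ('D', 'D'): (2, 2),    # mutual defection
}

# These satisfy the standard PD ordering:
# temptation > reward > punishment > sucker's payoff.
T, R, P, S = 10, 5, 2, 0
assert T > R > P > S
```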
Do you mean that they played randomly, or that they defected without articulating why?
Some of them seemed to be playing randomly. Some of them decided that they didn’t like the game (too hard to understand, they weren’t getting enough candy, whatever) and cooperated in spite of partner defection as a way of checking out of the game. One guy didn’t even want to know what his partner had done last time during the iteration, he just defected every time—I guess that could be called a strategy, especially since he wound up with a randomly-playing partner that time.
Thanks. So they saw the game as another nuisance that the teachers thought up… As my game theory book says, “there’s really no point in playing poker except for sums of money it would really hurt to lose”.
I didn’t think that asking them all to put up cash would have gone over well, or I might have tried it. Besides, I got reimbursed for the candy and got to keep the leftovers.
I read the article, also. The description of the game was a bit short and somewhat ambiguous.
The game is designed to show participants why it is hard to maintain collusion or price fixing amongst oligopolies: secret agreements are not enough. It was a good demonstration of the difficulties of maintaining a secret deal. Far better than simply reading about it.
A number of theorists think that price fixing is a mystery, because the economics of it should make any such agreements disappear.
However, there is price fixing in the real world, and it is regularly prosecuted. So, how are the Ashleys dealt with by those groups?
Fortunately, I just finished reading “Our Customers Are the Enemy”, a study of cartels in the ’80s and ’90s, so I can tell you!
Cartels have a number of ways of dealing with defectors, but the illegal ones have the most problems.
One of the most effective methods was used by Archer Daniels Midland in its lysine cartel: it built lysine plants of grotesque overcapacity, something like 30% of the global market, but only sold part of its peak production; its threat to defectors like Sewon (the Korean manufacturer) was that if they cheated on their quotas, it would unleash a price war that would drive them into trouble (apparently Sewon was very heavily in debt from financing its expansion, and like the 2 Japanese companies, it had minimal non-lysine business) or outright bankruptcy. This is similar to what De Beers & OPEC have sometimes done, IIRC.
Another method, used in other cartels, was for the companies to share their internal financial data (whose veracity would be guaranteed by third-party auditors), pool all the revenues, and then divide them accordingly. Obviously this makes it harder to cheat as well, and reduces any incentive to.
An approximation to this would be market surveys: if the surveys showed that one cartelist’s share had risen at the expense of another’s, the offender would sell the product at cost to the damaged party (one of the lysine mechanisms).
Some cartels just hold together because the corporate managers running the cartel have a collective interest in driving up the price & their division’s profits, but not much of one to engage in price wars for market share. (Such as the multiple vitamin cartels lasting many decades.)
Others, like the big German conglomerates or Japanese zaibatsu, have been aided by government complicity or active aid.
And then there often can be legal punishments for defectors—going back to the vitamin cartel, we can read in Wikipedia:
(And the whistleblower on the lysine cartel, ironically, wound up staying in jail longer than any of the other malefactors because of his embezzlement.)
Finally, one threat in the US that can be used on a defector is that the amnesty program grants complete immunity from federal damages & prosecution to the first cartelist to come forward with good information about the cartel, and this also minimizes their civil liability too—while the other cartelists will still be vulnerable to the triple damages prescribed by anti-trust law, and the civil penalties. So if cheating gets too bad, the irked company can blow the whole thing up.
“I probably came off as more “anticapitalist” or “collectivist” than I really am”
There is most certainly nothing anti-capitalist about creating and maintaining a reputation for cooperation. Who would loan you money or send you goods without payment if you have a reputation for defecting?
And what’s the point of making money if everyone hates you?
Survival. If everyone loves you you might be able to live without money. But if everyone hates you then you need to give full payment for everything that you want from them. So then you really need money.
If everyone really hates you they won’t deal with you at all. And if they really really hate you they’ll burn you in your house.
Money is a great way to transact with strangers. Enemies and friends, it gets complicated.
Um … decrease the money supply? =-)
When you write “If the others continue to cooperate, their bid is lower and they get nothing” you imply an iterated game. It seems clear from Hamermesh’s account that players were only allowed to submit one bid.
Ashley won, but she didn’t maximize her win. The smartest thing to do would be to agree to collude, bid higher, and then divide the winnings equally anyway. Everyone gets the same payout, but only Ashley would get the satisfaction of winning. And if someone else bids higher, she’s no longer the sole defector, which is socially significant. And, of course, $20 is really not a significant enough sum to play hardball for.
Sorry for the poor phrasing. I didn’t read it as an iterated game at all. That statement should instead read, “If the others nevertheless cooperate, … ”
Should I update it? How do you do the strikeout/line-through thing?
Silas, well said. I note that Bob Murphy has linked to this post.
BTW, while I agree that Hamermesh’s experiment showed the difficulty of collusion in a free market, I doubt that was his intended “point”.
Regards,
Tom
Thanks, TokyoTom.
And now a commenter on Scott Sumner’s blog mentioned this. I’m gonna be famous! :-P (Btw, any plans for allowing more profile information so we post our email and websites if we want?)
I was saying the opposite: in his post, Hamermesh is saying that this game shows why collusion is difficult, but the game doesn’t capture the mechanism by which markets actually make collusion difficult.
There are wiki user pages. Unfortunately, they are not linked.
You aren’t saying anything here that Hamermesh isn’t well aware of. He is teaching models, and models are simplifications of the world.
He’s aware that the mechanism by which Ashley won (being a lucky liar) is not the reason markets prevent collusion?
Then why is he teaching that as a demonstration of why markets prevent collusion? Kind of a strange way to go about it, don’t you think?
The celebratory tone of the Freakonomics post is also pretty inexplicable. Why is he so happy that one student out of eight bid $0.05, when the model that he’s teaching supposedly predicts that everyone bids $17.50? Either his model is horribly wrong, or the students haven’t learned anything, or both...
Maybe this professor just doesn’t spend much effort on his blog posts. Take a look at http://freakonomics.blogs.nytimes.com/2009/09/21/why-my-students-dont-get-rebates where he uses the phrase “Pareto improvement” in a completely wrong way. Anyone who doesn’t already know what it means will be misled, and those who do will be confused.
Robin, when do you go from “using a model” to committing the ludic fallacy? I would really be interested in a post that attempts to better define where this line is.
I read the article, and thought much the same thing. Ashley may be up financially, but down socially.
Definitely—how much would you (the abstract “you”) pay to avoid the whole class seeing you as a jerk?
< -$20
This is not Prisoner’s Dilemma. The original has no reputation effects. http://en.wikipedia.org/wiki/Prisoner’s_dilemma
This was a game in a game theory class, so the teacher is trying to teach things like strategy domination, etc. In this case I believe he was applauding Ashley because she understood that a bid of $0.01 was weakly dominated by all other bids; that all other bids yield as good or better results.
Was it a bad idea for her to show herself as a “selfish git”? I don’t know; that depends on the social situation. My guess is that folks in a game theory class get that this is a game.
See Yale open course on game theory for background: http://oyc.yale.edu/economics/game-theory/
On a side note if you want to take reputation into consideration consider the iterated prisoner’s dilemma. Computer science classes commonly do this early on as a fun way of getting kids to create data structures capable of remembering who ripped them off.
In the experiment you trade $1 with your classmates. If you are both honest, you each get back $1.10. If one cheats and the other does not, the cheater gets $2. If both cheat, they each get $1. If someone cheats you, you must program your agent to cheat that person from then onward.
When the students don’t know how many trades the program will be run for, the honest traders do best.
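A minimal sketch of the exercise, with the per-round returns taken as assumptions from the description above ($1.10 each if both honest, $2 to a lone cheater and $0 to the victim, $1 each if both cheat), and a grim-trigger strategy for the honest traders:

```python
# Per-round returns assumed from the comment: H = honest, C = cheat.
RETURNS = {('H', 'H'): (1.1, 1.1), ('H', 'C'): (0.0, 2.0),
           ('C', 'H'): (2.0, 0.0), ('C', 'C'): (1.0, 1.0)}

def grim(history_other):
    """Grim trigger: trade honestly until the other side ever cheats."""
    return 'C' if 'C' in history_other else 'H'

def always_cheat(history_other):
    return 'C'

def play(strat_a, strat_b, rounds):
    """Run the iterated trading game; return each side's total (rounded)."""
    hist_a, hist_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pay_a, pay_b = RETURNS[(a, b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return round(total_a, 2), round(total_b, 2)

# With enough rounds, two honest traders end up richer than a cheater
# exploiting one of them.
print(play(grim, grim, 20))          # (22.0, 22.0)
print(play(grim, always_cheat, 20))  # (19.0, 21.0)
```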
I don’t understand—if you both do just as well both cheating as you do when you both act honestly, why is there any reason whatsoever to be “honest”?
On the topic of “utilities in the Prisoner’s Dilemma coinciding with jail time”, I quote one of my guest blog posts: http://phd.kt.pri.ee/2009/01/27/the-real-prisoner-dilemma/
Two hardened criminals are taken to interrogation in separate cells. They are offered the usual deal: If neither confesses, both get one year probation. If both confess, both do 5 years in jail. If one confesses, he goes free but the other does 10 years hard time.
Here’s what actually goes through their minds: “Okay, if neither of us confesses, we have to go back to the real world. But it’s so hard there! But if I confess, he will kill me when he gets out... so that’s bad... If both of us confess, then we can just get back to jail and continue our lives!”
Lateral thinking, people ;)