JeremyHussell comments on Simplified Poker

JeremyHussell 26 Jan 2019 17:28 UTC
1 point
8 months late. I’m coming into this cold, though I have previously read about a very similar competition to create strategies to play Rock-Paper-Scissors (RPS). First, work out all the decision points in the game, and the possible information available at each decision point. We end up with 2 binary decisions for each player, and 3 states of information at each decision point.
So my first strategy is to predict my opponent’s decisions, and calculate which of my possible decisions will give me the best result. For RPS this is pretty simple:
P(R), P(P), P(S): probabilities my opponent will play Rock, Paper, and Scissors.
V(R), V(P), V(S): expected score (value) for me playing Rock, Paper, Scissors.
V(R) = (P(R) * 0 + P(P) * −1 + P(S) * 1) / (P(R) + P(P) + P(S))
The calculation on the line above is for the general case. For the specific case of RPS, it simplifies to:
V(R) = P(S) - P(P)
V(P) = P(R) - P(S)
V(S) = P(P) - P(R)
A surprising number of competitors fail to play optimally against their opponent’s predicted actions. For example, with P(R) = 0.45, P(P) = 0.16, P(S) = 0.39, many competitors play Paper, even though the best expected value is from playing Rock. (Optimal play exploits unusually low probabilities as well as unusually high probabilities.)
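As a minimal sketch of the calculation above (the function names `rps_values`, `best_move`, and `naive_move` are mine, purely for illustration), here are the three expected values and the best response, next to the "just beat the most likely move" rule that picks Paper in the example:

```python
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x


def rps_values(p_rock, p_paper, p_scissors):
    """Expected score of each of my moves, given predicted opponent probabilities."""
    total = p_rock + p_paper + p_scissors  # normalise in case inputs do not sum to 1
    p_r, p_p, p_s = p_rock / total, p_paper / total, p_scissors / total
    return {
        "R": p_s - p_p,  # V(R) = P(S) - P(P)
        "P": p_r - p_s,  # V(P) = P(R) - P(S)
        "S": p_p - p_r,  # V(S) = P(P) - P(R)
    }


def best_move(p_rock, p_paper, p_scissors):
    """Move with the highest expected value."""
    values = rps_values(p_rock, p_paper, p_scissors)
    return max(values, key=values.get)


def naive_move(p_rock, p_paper, p_scissors):
    """Beat whichever move the opponent is most likely to play."""
    probs = {"R": p_rock, "P": p_paper, "S": p_scissors}
    return BEATS[max(probs, key=probs.get)]


values = rps_values(0.45, 0.16, 0.39)
print(values)
print("best:", best_move(0.45, 0.16, 0.39), "naive:", naive_move(0.45, 0.16, 0.39))
```

With the probabilities from the example, Rock is worth about 0.23 per game and Paper only about 0.06, because Paper loses to the fairly likely Scissors.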
In RPS there are three possible decisions, but in simplified poker all the decision points are binary, so we can use A and !A to represent both probabilities, instead of A, B, and C. I choose to represent betting and calling as direct probabilities, and checking and folding as the complementary probabilities.
A, B, C: player #1 bets with a 1, 2, or 3 respectively
D, E, F: after a check and a bet, player #1 calls with a 1, 2, 3
G, H, I: after a bet, player #2 calls with a 1, 2, 3
J, K, L: after a check, player #2 bets with a 1, 2, 3
The expected value calculations are more complicated than in RPS (among other things, you can be uncertain about the current state of the game because you don’t know which card your opponent has, and the outcome of player #1’s game sometimes depends on its own future decisions), but thanks to the binary decisions the results can be simplified almost as much as in RPS.
D(A), D(B), etc.: condition necessary to decide to do A, B, etc. Calculate V(A) and V(!A), then D(A) = V(A) > V(!A) and D(!A) = V(A) < V(!A). If they’re equal, then you play your predetermined Nash equilibrium strategy.
Player #1:
D(A) = ⁴⁄₃ > P(H) + P(I)
D(B) = 2 + P(G) + z > 3 * P(I), where z = P(L) - P(J) when 3 * P(J) > P(L) and z = 2 * P(J) when 3 * P(J) < P(L)
D(C) = P(G) + P(H) > P(J) + P(K)
D(D) = false
D(E) = 3 * P(J) > P(L)
D(F) = true
Player #2:
D(G) = false
D(H) = 3 * P(A) > P(C)
D(I) = true
D(J) = P(!B) * (2 * P(!E) - P(E)) > P(!C) * (P(F) − 2 * P(!F))
D(K) = P(!A) * P(D) > P(!C) * (3 * P(F) - 2)
D(L) = P(!A) * P(D) + P(!B) * P(E) > 0
Translated back to English:
#1 with a 1: If you predict #2 will fold often enough, then bet (bluff), otherwise check, and always fold if #2 bets.
#1 with a 2: Bet only if you predict #2 will call with a 1 and fold with a 3 often enough to outweigh what you would gain by checking (which comes from #2 bluffing with a 1 and checking with a 3). Call after #2 bets if there’s a high enough chance it’s a bluff.
#1 with a 3: Bet or check depending on whether #2 is more likely to call your bet or bet after you check. Always call if #2 bets.
#2 with a 1: If #1 bets, fold. If #1 checks and will fold often enough, then bluff, otherwise check.
#2 with a 2: If #1 bets, call if the chances of a bluff are high enough, otherwise fold. If #1 checks, check unless you predict #1 will call with a 1 and fold with a 3 often enough combined to be worth it.
#2 with a 3: If #1 bets, call. If #1 checks, bet.
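The decision rules above can be sketched in code as follows. This is only an illustration: the function names and the dictionary-of-probabilities input are my own choices, and a tie is reported as `None` to mean "fall back to your Nash equilibrium strategy". For D(K) the sketch uses 3 * P(F) - 2, the form that mirrors the right-hand side of D(J); treat that sign as an assumption.

```python
def prefer(lhs, rhs):
    """True if lhs > rhs, False if lhs < rhs, None on a tie (play Nash instead)."""
    if lhs == rhs:
        return None
    return lhs > rhs


def player1_decisions(p):
    """p maps 'G'..'L' to predicted probabilities of player #2's actions."""
    G, H, I, J, K, L = (p[k] for k in "GHIJKL")
    # z depends on whether #1 would call (3J > L) or fold after check-bet.
    # When 3J == L both expressions agree, so the tie needs no special case.
    z = L - J if 3 * J > L else 2 * J
    return {
        "A": prefer(4 / 3, H + I),        # bluff with a 1
        "B": prefer(2 + G + z, 3 * I),    # bet with a 2
        "C": prefer(G + H, J + K),        # bet with a 3
        "D": False,                       # never call with a 1
        "E": prefer(3 * J, L),            # call with a 2
        "F": True,                        # always call with a 3
    }


def player2_decisions(p):
    """p maps 'A'..'F' to predicted probabilities of player #1's actions."""
    A, B, C, D, E, F = (p[k] for k in "ABCDEF")
    return {
        "G": False,                       # never call with a 1
        "H": prefer(3 * A, C),            # call with a 2
        "I": True,                        # always call with a 3
        "J": prefer((1 - B) * (2 * (1 - E) - E), (1 - C) * (F - 2 * (1 - F))),
        "K": prefer((1 - A) * D, (1 - C) * (3 * F - 2)),  # sign is an assumption
        "L": prefer((1 - A) * D + (1 - B) * E, 0),
    }


p1 = player1_decisions({"G": 0.5, "H": 0.5, "I": 0.5, "J": 0.5, "K": 0.0, "L": 1.0})
p2 = player2_decisions({"A": 0.5, "B": 0.0, "C": 1.0, "D": 0.0, "E": 0.5, "F": 1.0})
print(p1)
print(p2)
```

Each function returns a pure (deterministic) response to one prediction, which is why only a limited number of distinct responses are possible.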
Alert readers will complain that I’ve skipped over the most interesting step: predicting what my opponent will play. This is true, but the above steps needed to be done first, because many of the interesting strategies for predicting your opponent’s play assume they’ve done the same analysis. If both players play following this strategy, and both know that the other will play following this strategy, then play settles into one of the Nash equilibria. But many players won’t play optimally, and if you can identify deviations from the Nash equilibrium quickly, then you can get a better score. If your opponent is doing the same thing, then you can fake a deviation from Nash that lowers your score a little, but causes your opponent to deviate from the Nash equilibrium in a way that you can exploit for more gain than your loss (until your opponent catches on). So I can predict you will predict I will predict you will… and it seems to go into an infinite loop of ever-higher levels of double-think.
My most important takeaway from the Rock-Paper-Scissors competition was that if there are a finite number of deterministic strategies, then the number of levels of double-think is finite too. This is much easier to see in RPS. Given a method of prediction P:
P0: assume your opponent is vulnerable to prediction by method P, play to beat it.
P1: assume your opponent thinks you will use method P0, and plays to beat it. Play to beat that.
P2: assume your opponent thinks you will use P1, and plays to beat it. Play to beat that.
But because in RPS there are only 3 possible deterministic strategies, P3 recommends you play the same way as P0!
There’s also a second stack where you assume your opponent is using P to predict you, then assuming you know that, and so on, which also ends with 3 deterministic strategies.
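To make the first stack (P0, P1, P2, ...) concrete, here is a small sketch. The function name `double_think` and the single-letter move encoding are mine; `predicted` stands for whatever move method P says the opponent will play.

```python
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x


def double_think(predicted, level):
    """Move recommended by P<level>, given that method P predicts `predicted`."""
    move = BEATS[predicted]        # P0: just beat the predicted move
    for _ in range(level):
        opponent = BEATS[move]     # opponent expects the previous level and beats it
        move = BEATS[opponent]     # we play to beat that
    return move


for level in range(4):
    print(f"P{level} given a predicted Rock:", double_think("R", level))
```

Each extra level rotates the recommendation by two steps around the Rock-Paper-Scissors cycle, so after three levels it is back where it started.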
In simplified poker, if you predict your opponent is not playing a Nash equilibrium strategy, and respond optimally yourself, then you will respond in one of 16 ways. If you assume your opponent has guessed your play and will respond optimally, then there are 8 ways for player #1 to respond, and only 4 ways for player #2 to respond. So, assuming I haven’t made a mistake, there are at most 5 levels of second guessing, 1 for responding to naive play, and at most 4 more for responding to optimal play before either you or your opponent start repeating yourselves.
So, for any method of prediction which does not involve double-thinking, you can generate all double-think strategies and reverse double-think strategies. Then you need a meta-strategy to decide which one to use on the next hand. If you do this successfully then you’ll defeat anyone who is vulnerable to one of your methods of prediction, uses one of your methods of prediction, or uses a strategy to directly defeat one of your methods of prediction.
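I haven’t said what the meta-strategy should be, so the following is only one simple possibility, shown for RPS to keep it short: score every candidate strategy by how it would have done on the hands played so far, and follow the current leader. The class name `MetaStrategy`, the candidate functions, and the scoring rule are all assumptions made for illustration.

```python
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x


def score(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1


class MetaStrategy:
    """Follow whichever candidate would have scored best so far."""

    def __init__(self, candidates):
        self.candidates = candidates               # name -> function(history) -> move
        self.totals = {name: 0 for name in candidates}
        self.last = {}                             # each candidate's latest suggestion

    def choose(self, history):
        self.last = {name: f(history) for name, f in self.candidates.items()}
        leader = max(self.totals, key=self.totals.get)
        return self.last[leader]

    def update(self, opponent_move):
        # Credit every candidate as if its suggestion had been played.
        for name, move in self.last.items():
            self.totals[name] += score(move, opponent_move)


# Hypothetical candidates: three constant strategies against an all-Rock opponent.
meta = MetaStrategy({
    "rock": lambda history: "R",
    "paper": lambda history: "P",
    "scissors": lambda history: "S",
})
history = []
for _ in range(3):
    mine = meta.choose(history)
    meta.update("R")
    history.append((mine, "R"))
final_choice = meta.choose(history)
print(final_choice, meta.totals)
```

In practice the candidates would be the double-think and reverse double-think variants of each prediction method rather than constant moves.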