I just found out about the “hot hand fallacy fallacy” (Dan Kahan, Andrew Gelman, the Miller & Sanjurjo paper) as a type of bias that more numerate people are likely more susceptible to, and for whom it’s highly counterintuitive. It’s described as a specific failure mode of the intuition used to get rid of the gambler’s fallacy.
I understand the correct statement like this. Suppose we’re flipping a fair coin.
* If you’re predicting future flips of the coin, the next flip is unaffected by the results of your previous flips, because the flips are independent. So far, so good.
* However, if you’re predicting the next flip in a finite series of flips that has already occurred, it’s actually more likely that you’ll alternate between heads and tails.
The discussion is mostly about whether a streak of a given length will end or continue; the statement above is the case of streak length 1 and probability 0.5. Another example is:
> ...we can offer the following lottery at a $5 ticket price: a fair coin will be flipped 4 times. if the relative frequency of heads on flips that immediately follow a heads is greater than 0.5 then the ticket pays $10; if the relative frequency is less than 0.5 then the ticket pays $0; if the relative frequency is exactly equal to 0.5, or if no flip is immediately preceded by a heads, then a new sequence of 4 flips is generated. While, intuitively, it seems like the expected payout of this ticket is $0, it is actually $-0.71 (see Table 1). Curiously, this betting game may be more attractive to someone who believes in the independence of coin flips, rather than someone who holds the Gambler’s fallacy.
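To see why the deck is stacked, one can enumerate the 16 equally likely outcomes of four flips and classify each ticket result. (This is only a sketch of the setup quoted above; the exact $-0.71 expectation is taken from the paper’s Table 1, whose conventions for re-drawn sequences I haven’t tried to reproduce here.)

```python
from itertools import product

win = lose = redraw = 0
for seq in product("TH", repeat=4):
    # collect the flips that immediately follow a heads within this sequence
    follows = [seq[i + 1] for i in range(3) if seq[i] == "H"]
    if not follows:
        redraw += 1  # no flip is immediately preceded by a heads
        continue
    freq = follows.count("H") / len(follows)
    if freq > 0.5:
        win += 1
    elif freq < 0.5:
        lose += 1
    else:
        redraw += 1  # relative frequency exactly 0.5: new sequence generated

print(win, lose, redraw)  # 4 win, 6 lose, 6 redraw out of 16
```

Conditional on the bet resolving at all, the ticket wins only 4 times out of 10, even though every individual flip is fair; that asymmetry is what makes the expected payout negative.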
I think this is not quite right, and it’s not-quite-right in an important way. It really isn’t true in any sense that “it’s more likely that you’ll alternate between heads and tails”. This is a Simpson’s-paradox-y thing where “the average of the averages doesn’t equal the average”.
Suppose you flip a coin four times, and you do this 16 times, and happen to get each possible outcome once: TTTT TTTH TTHT TTHH THTT THTH THHT THHH HTTT HTTH HTHT HTHH HHTT HHTH HHHT HHHH.
Question 1: in this whole sequence of events, what fraction of the time was the flip after a head another head? Answer: there were 24 flips after heads, and of these 12 were heads. So: exactly half the time, as it should be. (Clarification: we don’t count the first flip of a group of 4 as “after a head” even if the previous group ended with a head.)
Question 2: if you answer that same question for each group of four, and ignore cases where the answer is indeterminate because it involves dividing by zero, what’s the average of the results? Answer: it goes 0⁄0 0⁄0 0⁄1 1⁄1 0⁄1 0⁄1 1⁄2 2⁄2 0⁄1 0⁄1 0⁄2 1⁄2 1⁄2 1⁄2 2⁄3 3⁄3. We have to ignore the first two. The average of the rest is 17⁄42, or just over 0.4.
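Both calculations can be checked mechanically. Here is a short Python enumeration of all 16 length-4 sequences, skipping the two indeterminate 0⁄0 cases (TTTT and TTTH) exactly as above:

```python
from itertools import product
from fractions import Fraction

seqs = list(product("TH", repeat=4))  # all 16 length-4 sequences

# Question 1: pool every flip-after-heads across the whole collection.
after_heads = [s[i + 1] for s in seqs for i in range(3) if s[i] == "H"]
print(len(after_heads), after_heads.count("H"))  # 24 flips after heads, 12 heads

# Question 2: compute the frequency separately per sequence, then average.
per_seq = []
for s in seqs:
    follows = [s[i + 1] for i in range(3) if s[i] == "H"]
    if follows:  # skip the indeterminate 0/0 cases
        per_seq.append(Fraction(follows.count("H"), len(follows)))
print(sum(per_seq) / len(per_seq))  # 17/42, just over 0.4
```

The pooled answer is exactly one half; only the average of per-sequence averages drops to 17⁄42.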
What’s going on here isn’t any kind of tendency for heads and tails to alternate. It’s that an individual head or tail “counts for more” when the denominator is smaller, i.e., when there are fewer heads in the sample.
My intuition is from the six points in Kahan’s post. If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails. If we have an equal number of heads and tails left, P(HT) > P(HH) for the next two flips. After the first heads, the probability for the next two might not give P(TH) > P(TT), but relative to independence it will be biased in that direction because the first T gets used up.
Is there a mistake? I haven’t done any probability in a while.
> If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails.
No, that is not correct. Have a look at my list of 16 length-4 sequences. Exactly half of all flips-after-heads are heads, and the other half tails. Exactly half of all flips-after-tails are heads, and the other half tails.
The result of Miller and Sanjurjo is very specifically about “averages of averages”. Here’s a key quotation:
> We demonstrate that in a finite sequence generated by i.i.d. Bernoulli trials with probability of success p, the relative frequency of success on those trials that immediately follow a streak of one, or more, consecutive successes is expected to be strictly less than p
“The relative frequency [average #1] is expected [average #2] to be …”. M&S are not saying that in finite sequences of trials successes are actually rarer after streaks of success. They’re saying that if you compute their frequency separately for each of your finite sequences then the average frequency you’ll get will be lower. These are not the same thing. If, e.g., you run a large number of those finite sequences and aggregate the counts of streaks and successes-after-streaks, the effect disappears.
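That distinction can be checked with a quick Monte Carlo sketch (assuming fair coins and groups of four flips, as in the example above): aggregating the counts across sequences gives roughly 0.5, while averaging the per-sequence frequencies gives roughly 17⁄42 ≈ 0.405.

```python
import random

random.seed(0)
trials = 200_000
pooled_follows = pooled_heads = 0
per_seq_freqs = []

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(4)]  # True means heads
    follows = [flips[i + 1] for i in range(3) if flips[i]]
    if follows:
        # average #1: the relative frequency within this one sequence
        per_seq_freqs.append(sum(follows) / len(follows))
    pooled_follows += len(follows)
    pooled_heads += sum(follows)

print(pooled_heads / pooled_follows)            # ~0.5: aggregated counts, no effect
print(sum(per_seq_freqs) / len(per_seq_freqs))  # ~0.405: average of averages
```

Only the second number, the expectation over average #1, sits below one half; pooling the raw counts makes the effect disappear, as described.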
> However, if you’re predicting the next flip in a finite series of flips that has already occurred, it’s actually more likely that you’ll alternate between heads and tails.
...because heads occurring separately are on average balanced by heads occurring in long sequences; but limiting the length of the series puts a limit on the long sequences.
In other words, in infinite sequences, “heads preceded by heads” and “heads preceded by tails” would be in balance; but if you cut out a finite subsequence whose first flip was a “head preceded by a head”, the cut has reclassified it, because the preceding head now falls outside the subsequence.
Am I correct, or is there more?
I don’t think this is correct. See my reply to AstraSequi.
(But I’m not certain I’ve understood what you’re proposing, and if I haven’t then of course your analysis and mine could both be right.)
Oops, you’re right.
Using the words from my previous comment, now the trick seems to be that “heads occurring separately are on average balanced by heads occurring in long sequences”; but according to the rules of the game, you get only one point of reward for a long sequence, while you could get multiple punishments for the separately occurring heads if they appear in different series. Well, approximately.