I think this is not quite right, and it’s not-quite-right in an important way. It really isn’t true in any sense that “it’s more likely that you’ll alternate between heads and tails”. This is a Simpson’s-paradox-y thing where “the average of the averages doesn’t equal the average”.
Suppose you flip a coin four times, and you do this 16 times, and happen to get each possible outcome once: TTTT TTTH TTHT TTHH THTT THTH THHT THHH HTTT HTTH HTHT HTHH HHTT HHTH HHHT HHHH.
Question 1: in this whole sequence of events, what fraction of the time was the flip after a head another head? Answer: there were 24 flips after heads, and of these 12 were heads. So: exactly half the time, as it should be. (Clarification: we don’t count the first flip of a group of 4 as “after a head” even if the previous group ended with a head.)
Question 2: if you answer that same question for each group of four, and ignore cases where the answer is indeterminate because it involves dividing by zero, what’s the average of the results? Answer: it goes 0⁄0 0⁄0 0⁄1 1⁄1 0⁄1 0⁄1 1⁄2 2⁄2 0⁄1 0⁄1 0⁄2 1⁄2 1⁄2 1⁄2 2⁄3 3⁄3. We have to ignore the first two. The average of the rest is 17⁄42, or just over 0.4.
What’s going on here isn’t any kind of tendency for heads and tails to alternate. It’s that an individual head or tail “counts for more” when the denominator is smaller, i.e., when there are fewer heads in the sample.
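For anyone who wants to verify the counts, here is a quick brute-force enumeration in Python (just a throwaway sanity-check sketch; the enumeration order differs from the list above, but the totals are the same):

```python
from itertools import product
from fractions import Fraction

# All 16 length-4 sequences of H/T.
seqs = ["".join(s) for s in product("TH", repeat=4)]

# Question 1: pool every flip-that-follows-a-head across all sequences.
after_heads = [s[i + 1] for s in seqs for i in range(3) if s[i] == "H"]
print(len(after_heads), after_heads.count("H"))  # 24 12 -> exactly half are heads

# Question 2: compute the frequency within each sequence, then average,
# skipping the sequences where no flip follows a head (the 0/0 cases).
per_seq = []
for s in seqs:
    follows = [s[i + 1] for i in range(3) if s[i] == "H"]
    if follows:
        per_seq.append(Fraction(follows.count("H"), len(follows)))
print(sum(per_seq) / len(per_seq))  # 17/42, just over 0.4
```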
My intuition comes from the six points in Kahan’s post. If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails. If we have an equal number of heads and tails left, P(HT) > P(HH) for the next two flips. After the first heads, the probabilities for the next two flips might not give P(TH) > P(TT), but relative to independence they will be biased in that direction, because the first T gets used up.
Is there a mistake? I haven’t done any probability in a while.
If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails.
No, that is not correct. Have a look at my list of 16 length-4 sequences. Exactly half of all flips-after-heads are heads, and the other half tails. Exactly half of all flips-after-tails are heads, and the other half tails.
The result of Miller and Sanjurjo is very specifically about “averages of averages”. Here’s a key quotation:
We demonstrate that in a finite sequence generated by i.i.d. Bernoulli trials with probability of success p, the relative frequency of success on those trials that immediately follow a streak of one, or more, consecutive successes is expected to be strictly less than p
“The relative frequency [average #1] is expected [average #2] to be …”. M&S are not saying that in finite sequences of trials successes are actually rarer after streaks of success. They’re saying that if you compute their frequency separately for each of your finite sequences then the average frequency you’ll get will be lower. These are not the same thing. If, e.g., you run a large number of those finite sequences and aggregate the counts of streaks and successes-after-streaks, the effect disappears.
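If you want to see the two computations side by side, here is a rough Monte Carlo sketch (sample size and seed are arbitrary choices of mine):

```python
import random

random.seed(0)  # arbitrary seed, just for reproducibility

per_seq_freqs = []               # ingredients of the "average of averages"
pooled_after = pooled_heads = 0  # aggregated counts across all sequences

for _ in range(100_000):  # 100,000 length-4 sequences of fair-coin flips
    s = [random.choice("HT") for _ in range(4)]
    follows = [s[i + 1] for i in range(3) if s[i] == "H"]
    if follows:
        per_seq_freqs.append(follows.count("H") / len(follows))
    pooled_after += len(follows)
    pooled_heads += follows.count("H")

print(sum(per_seq_freqs) / len(per_seq_freqs))  # ~0.405: the biased "average of averages"
print(pooled_heads / pooled_after)              # ~0.5:   pooled counts, no bias
```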