My intuition is from the six points in Kahan’s post. If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails. If we have an equal number of heads and tails left, P(HT) > P(HH) for the next two flips. After the first heads, the probability for the next two might not give P(TH) > P(TT), but relative to independence it will be biased in that direction because the first T gets used up.
Is there a mistake? I haven’t done any probability in a while.
“If the next flip is heads, then the flip after is more likely to be tails, relative to if the next flip is tails.”
No, that is not correct. Have a look at my list of 16 length-4 sequences. Pooled across all 16 sequences, exactly half of all flips-after-heads are heads and the other half are tails. Likewise, exactly half of all flips-after-tails are heads and the other half are tails.
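The pooled count over the 16 length-4 sequences can be checked by brute force. This is only a minimal sketch: the function names `count_after` and `pooled_counts` are made up for illustration, and it assumes a fair coin so that all 16 sequences are equally likely.

```python
from itertools import product


def count_after(sequence, previous):
    """Return (heads, total) over the flips that immediately follow `previous`."""
    followers = [b for a, b in zip(sequence, sequence[1:]) if a == previous]
    return followers.count("H"), len(followers)


def pooled_counts(n, previous):
    """Pool the counts over every length-n sequence of H and T."""
    heads = total = 0
    for seq in product("HT", repeat=n):
        h, t = count_after(seq, previous)
        heads += h
        total += t
    return heads, total


print(pooled_counts(4, "H"))  # (12, 24): half of flips-after-heads are heads
print(pooled_counts(4, "T"))  # (12, 24): half of flips-after-tails are heads
```

Each sequence contributes its raw counts here, so a sequence with three flips-after-heads weighs three times as much as one with a single flip-after-heads.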
The result of Miller and Sanjurjo is very specifically about “averages of averages”. Here’s a key quotation:
“We demonstrate that in a finite sequence generated by i.i.d. Bernoulli trials with probability of success p, the relative frequency of success on those trials that immediately follow a streak of one, or more, consecutive successes is expected to be strictly less than p”
“The relative frequency [average #1] is expected [average #2] to be …”. M&S are not saying that in finite sequences of trials successes are actually rarer after streaks of success. They’re saying that if you compute the frequency of success-after-streak separately for each of your finite sequences and then average those per-sequence frequencies, the average you get will be lower than p. These are not the same thing. If, e.g., you run a large number of those finite sequences and aggregate the counts of streaks and successes-after-streaks before dividing, the effect disappears.
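The difference between the two kinds of averaging can be seen exactly for length-4 sequences and streaks of length one. The sketch below is illustrative only: the function names are hypothetical, the coin is assumed fair, and sequences with no flip after a head are skipped when averaging, since the per-sequence frequency is undefined there (an assumption of this sketch).

```python
from fractions import Fraction
from itertools import product


def after_heads(seq):
    """Flips that immediately follow a head in `seq`."""
    return [b for a, b in zip(seq, seq[1:]) if a == "H"]


def average_of_averages(n):
    """Average the per-sequence frequency of heads-after-heads.

    Sequences with no flip after a head are skipped (assumption of this sketch).
    """
    freqs = []
    for seq in product("HT", repeat=n):
        followers = after_heads(seq)
        if followers:
            freqs.append(Fraction(followers.count("H"), len(followers)))
    return sum(freqs, Fraction(0)) / len(freqs)


def pooled_frequency(n):
    """Aggregate the counts over all sequences first, then divide once."""
    heads = total = 0
    for seq in product("HT", repeat=n):
        followers = after_heads(seq)
        heads += followers.count("H")
        total += len(followers)
    return Fraction(heads, total)


print(average_of_averages(4))  # 17/42, about 0.405, which is below 1/2
print(pooled_frequency(4))     # 1/2, the effect disappears
```

The per-sequence average gives every sequence equal weight regardless of how many flips-after-heads it contains, which is where the gap from 1/2 comes from.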