The motivating practical problem came from this question:
“Guess the rule governing the following sequence: 11, 31, 41, 61, 71, 101, 131, …”
I cried, “Ah, the sequence is increasing!” With pride I looked in the back of the book and found the answer: “primes ending in 1.”
I’m trying to zero in on what I did wrong.
If I had instead said “the sequence is a list of numbers,” that would have been even sillier, but well in line with my previous logic.
My first attempt at explaining my mistake was to argue that “it’s an increasing sequence” was actually less plausible than the real answer, since the real answer made a much riskier claim. I think one can argue this without contradiction (the rule is either vague or specific, not both).
I think of it in terms of making a $100 bet.
So you have the sequence S: 11, 31, 41, 61, 71, 101, 131.
A is the “bet” (i.e., hypothesis) that the sequence is an increasing sequence of primes ending in 1. Very few sequences below 150 fit that description; in fact, the only primes below 150 that end in 1 are 11, 31, 41, 61, 71, 101, and 131, so S is the only seven-term sequence A allows. A’s bet is to go all in.
B is the “bet” that the sequence is increasing. But “the sequence is increasing” spreads its money around much more, so it is not a very confident bet. Why does it spread its money around more?
Suppose we introduce a second sequence, X: 14, 32, 42, 76, 96, 110, 125.
B can account for this sequence as well, whereas A cannot. So B has to at least split its betting money between the two sequences S and X, just in case either one is the answer printed in the back of the book. In reality there are an untold number of other sequences B can account for besides these two, and B has to spread its money over all of them if it wants to “win” by “correctly guessing” the answer in the back of the book. This is what makes it a bad bet: the hypothesis is too general.
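The difference in what each bet can account for is easy to check mechanically. Below is a minimal sketch; the function names fits_A and fits_B are hypothetical, and “increasing” is taken to mean strictly increasing.

```python
S = [11, 31, 41, 61, 71, 101, 131]
X = [14, 32, 42, 76, 96, 110, 125]


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small numbers like these."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def fits_B(seq: list[int]) -> bool:
    """Hypothesis B: each term is larger than the one before it."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def fits_A(seq: list[int]) -> bool:
    """Hypothesis A: an increasing sequence of primes ending in 1."""
    return fits_B(seq) and all(is_prime(n) and n % 10 == 1 for n in seq)


for name, seq in [("S", S), ("X", X)]:
    print(name, "fits A:", fits_A(seq), "| fits B:", fits_B(seq))
# S fits A: True | fits B: True
# X fits A: False | fits B: True
```

Any other increasing sequence, such as 1, 2, 3, 4, 5, 6, 7, also passes fits_B but fails fits_A, which is the sense in which B has to spread its money much more thinly.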
A simple mathematical way to compare the two “bets” is through conditional probabilities. Write Pr(S | B) for the share of B’s money that lands on S, Pr(X | B) for the share that lands on X, and so on. Each bet has exactly $100 (a total probability of 1.00) to spread over the sequences it allows:
Pr(S | B) + Pr(X | B) + Pr(?? | B) = 1.00 and Pr(S | A) + Pr(X | A) + Pr(?? | A) = 1.00
Here ?? stands for every other sequence. Pr(S | A) is already all in because the A bet only fits something that looks like S. Pr(S | B) is far less than all in, because B must also put money on X, Pr(X | B), as well as on every other increasing sequence of numbers, Pr(?? | B). This is a fancy way of saying that the strength of a hypothesis lies in what it can’t explain, not what it can; ask not what your hypothesis predicts, but what it excludes.
Going by what each bet excludes, you can see that Pr(?? | A) < Pr(?? | B), even if we don’t have hard and fast numbers for them. The number of increasing seven-number sequences below 150 is finite, but it is vastly larger than the number of increasing seven-number sequences below 150 made of primes ending in 1. Since each bet’s total is fixed at 1.00, the more B places on X and ??, the less it has left for S, which is why Pr(S | A) > Pr(S | B).
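To put rough numbers on the comparison, here is a minimal sketch that assumes each bet spreads its $100 evenly over every seven-term, strictly increasing sequence of integers from 1 to 149 that it allows. The even split and the 1-to-149 range are my assumptions, not part of the puzzle, and the variable names are hypothetical. An increasing sequence is just a choice of seven distinct numbers written in order, so the counts are binomial coefficients.

```python
from fractions import Fraction
from math import comb


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for n below 150."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


LIMIT = 150  # assumption: every term is an integer from 1 to LIMIT - 1
LENGTH = 7   # S and X both have seven terms

# Bet A allows increasing sequences of primes ending in 1. Any LENGTH of
# those primes, written in increasing order, is one allowed sequence.
primes_ending_in_1 = [n for n in range(1, LIMIT) if n % 10 == 1 and is_prime(n)]
count_A = comb(len(primes_ending_in_1), LENGTH)

# Bet B allows any strictly increasing sequence: any LENGTH distinct
# integers from 1 to LIMIT - 1, written in increasing order.
count_B = comb(LIMIT - 1, LENGTH)

# Assumption: each bet spreads its stake evenly over the sequences it allows,
# so the share landing on S is 1 / (number of allowed sequences).
share_on_S_A = Fraction(1, count_A)
share_on_S_B = Fraction(1, count_B)

print("sequences allowed by A:", count_A)
print("sequences allowed by B:", count_B)
print("dollars A puts on S:", float(100 * share_on_S_A))
print("dollars B puts on S:", float(100 * share_on_S_B))
print("ratio of shares (A / B):", share_on_S_A / share_on_S_B)
```

Under these assumptions A allows only S, while B allows C(149, 7) = 280,384,608,504 sequences, so A’s share on S is roughly 2.8 × 10^11 times B’s. In Bayesian terms this ratio is the likelihood ratio (Bayes factor) of A over B given S; the exact figure depends on the assumed range and the even split.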