I wrote in my last comment that “T2 is more likely to be flawed than is T1, because T2 only had to post-dict the second batch. This is trivial to formalize using Bayes’s theorem. Roughly speaking, it would have been harder for T1 to have been constructed in a flawed way and still have gotten its predictions for the second batch right.”
Benja Fallenstein asked for a formalization of this claim. So here goes :).
Define a method to be a map that takes in a batch of evidence and returns a theory. We make two assumptions.
ASSUMPTION 1: The theory produced by giving an input batch to a method will at least predict that input. That is, no matter how flawed a method of theory-construction is, it won’t contradict the evidence fed into it. More precisely,
p( M(B) predicts B ) = 1.
(A real account of hypothesis testing would need to be much more careful about what constitutes a “contradiction”. For example, it would need to deal with the fact that inputs aren’t absolutely reliable in the real world. But I think we can ignore these complications in this problem.)
ASSUMPTION 2: If a method M is known to be flawed, then its theories are less likely to make correct predictions of future observations. More precisely, if B2 is not contained in B1, then
p( M(B1) predicts B2 | M is flawed ) < p( M(B1) predicts B2 ).
(Outside of toy problems like this one, we would need to stipulate that B2 is not a logical consequence of B1, and so forth.)
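For concreteness, here is one way to render the setup in code. This is a hypothetical sketch of my own: a theory is modeled, crudely, as nothing more than the set of observations it predicts, and all the names (Observation, Batch, Theory, Method, predicts) are invented for illustration.

    from typing import Callable, FrozenSet

    Observation = str                   # stand-in for one experimental result
    Batch = FrozenSet[Observation]      # a batch of evidence
    Theory = FrozenSet[Observation]     # crude model: the set of observations a theory predicts
    Method = Callable[[Batch], Theory]  # a method maps a batch of evidence to a theory

    def predicts(theory: Theory, batch: Batch) -> bool:
        # Assumption 1 demands that predicts(M(B), B) holds for every
        # method M and every batch B, however flawed M may be.
        return batch <= theory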
Now, let B1 and B2 be two disjoint and nonempty sets of input data. In the problem, B1 is the set of results of the first ten experiments, and B2 is the set of results of the next ten experiments.
My claim amounted to the following. Let
P1 := p( M is flawed | M(B1) predicts B2 ),
P2 := p( M is flawed | M(B1 union B2) predicts B2 ).
Then P1 < P2.
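(To make the claim concrete with some made-up numbers: suppose p( M is flawed ) = 0.5, that a flawed method’s theory predicts B2 with probability 0.3, and that a sound method’s does so with probability 0.9. Then p( M(B1) predicts B2 ) = 0.5 * 0.3 + 0.5 * 0.9 = 0.6, so P1 = 0.3 * 0.5 / 0.6 = 0.25, while the proof below shows P2 = 0.5.)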
To prove this, note that, by Bayes’s theorem, the second quantity P2 is given by
P2 = p( M(B1 union B2) predicts B2 | M is flawed ) * p( M is flawed ) / p( M(B1 union B2) predicts B2 ).
Since B2 is contained in B1 union B2, Assumption 1 gives p( M(B1 union B2) predicts B2 ) = 1; and because p(X) = 1 implies p(X|Y) = 1 whenever p(Y) > 0, the conditional probability in the numerator equals 1 as well. The whole expression therefore reduces to
P2 = p(M is flawed).
On the other hand, the first quantity P1 is
P1 = p( M(B1) predicts B2 | M is flawed ) * p( M is flawed ) / p( M(B1) predicts B2 ).
By Assumption 2, the first factor p( M(B1) predicts B2 | M is flawed ) is strictly smaller than the denominator p( M(B1) predicts B2 ), so
P1 < p( M is flawed ).
Hence, P1 < P2, as claimed.
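Finally, a quick Monte Carlo sanity check of the algebra. This is only a sketch under assumed numbers (the same made-up parameters as in the parenthetical example above), not a model of any particular method of theory-construction:

    import random

    random.seed(0)

    PRIOR_FLAWED = 0.5      # assumed p( M is flawed )
    P_PREDICT_SOUND = 0.9   # assumed p( M(B1) predicts B2 | M is sound )
    P_PREDICT_FLAWED = 0.3  # assumed p( M(B1) predicts B2 | M is flawed );
                            # Assumption 2 needs this below the unconditional probability

    N = 200_000
    predicts_b1 = 0          # samples where M(B1) predicts B2
    flawed_and_predicts = 0  # ... and M is flawed
    flawed_total = 0         # samples where M is flawed

    for _ in range(N):
        flawed = random.random() < PRIOR_FLAWED
        flawed_total += flawed

        # M(B1) never saw B2, so whether its theory predicts B2 is chancy:
        p_hit = P_PREDICT_FLAWED if flawed else P_PREDICT_SOUND
        if random.random() < p_hit:
            predicts_b1 += 1
            flawed_and_predicts += flawed

    # M(B1 union B2) predicts B2 with probability 1 (Assumption 1, since B2 is
    # contained in B1 union B2), so conditioning on that event keeps every sample.
    P1 = flawed_and_predicts / predicts_b1
    P2 = flawed_total / N
    print(f"P1 = {P1:.3f}, P2 = {P2:.3f}")  # roughly P1 = 0.25 < P2 = 0.50

With these numbers, merely learning that M(B1) got B2 right cuts the probability of a flaw in half, exactly as the Bayes computation above says it should.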