Let’s imagine a scientist runs 500 tests. He then starts discarding tests one at a time, starting from the last, until the remaining data support some hypothesis (or he runs out of tests). Should this be treated as evidence of the same strength as if he had committed in advance to running only that many tests?
I may be wrong here because I’m tired, but I think the maths works out so that the evidence would be just as strong if he only removed tests from the end, whereas it would not be as strong if he removed tests from anywhere he liked depending on how they came out.
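To pin down the "discard from the end" procedure, here is a minimal sketch. It assumes each test is a pass (1) or fail (0), and that "supports the hypothesis" means a likelihood ratio of more than 10 in favour of a success rate of 0.7 over 0.5; the rates, the threshold and all function names are hypothetical choices made only for illustration.

```python
import math
from typing import Callable, List

# Hypothetical setup: the hypothesis says the success rate is P1 rather than P0.
P1, P0 = 0.7, 0.5

def log_likelihood_ratio(results: List[int]) -> float:
    """Log of P(results | P1) / P(results | P0) for independent pass/fail tests."""
    successes = sum(results)
    failures = len(results) - successes
    return successes * math.log(P1 / P0) + failures * math.log((1 - P1) / (1 - P0))

def supports_hypothesis(results: List[int], threshold: float = 10.0) -> bool:
    """Hypothetical criterion: the likelihood ratio favours P1 by more than `threshold`."""
    return len(results) > 0 and log_likelihood_ratio(results) > math.log(threshold)

def truncate_from_end(results: List[int],
                      supports: Callable[[List[int]], bool]) -> List[int]:
    """Drop tests from the end until the remaining prefix supports the hypothesis.

    Returns the longest supporting prefix, or an empty list if the
    scientist runs out of tests first.
    """
    for n in range(len(results), 0, -1):
        prefix = results[:n]
        if supports(prefix):
            return prefix
    return []

results = [1] * 20 + [0] * 30
kept = truncate_from_end(results, supports_hypothesis)
print(len(kept))  # 28
```

The kept prefix is exactly what a scientist who had precommitted to that many tests would have seen; whether the data-dependent choice of cut-off changes its evidential weight is the question being asked above.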
He does not discard anything that actually happened.
This is the key difference. We are evaluating the effectiveness of the drug by looking at what the drug actually did, not what it could have done.
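One way to make "what the drug actually did" concrete is the likelihood ratio. The sketch below assumes independent pass/fail trials whose success rate is either 0.7 (the drug works) or 0.5 (no effect), plus a hypothetical stopping rule, `stop_when_ahead_by_three`; all names are illustrative. Because a stopping rule can only look at outcomes already observed, it multiplies the probability of the observed sequence by 0 or 1 whatever the true success rate is, so the ratio comes out the same as under a fixed design.

```python
from fractions import Fraction
from typing import Callable, Sequence

def fixed_design_probability(seq: Sequence[int], p: Fraction) -> Fraction:
    """P(seq | p) when the number of tests was fixed at len(seq) in advance."""
    prob = Fraction(1)
    for outcome in seq:
        prob *= p if outcome else 1 - p
    return prob

def optional_stopping_probability(seq: Sequence[int], p: Fraction,
                                  should_stop: Callable[[Sequence[int]], bool]) -> Fraction:
    """P(observing exactly seq and stopping after its last test | p).

    The rule sees only outcomes already observed, so whether it fires is
    fixed by seq itself: it contributes a factor of 0 or 1 that does not
    depend on p.
    """
    for n in range(1, len(seq)):
        if should_stop(seq[:n]):
            return Fraction(0)  # the experiment would have ended earlier
    if not should_stop(seq):
        return Fraction(0)      # the experiment would have continued
    return fixed_design_probability(seq, p)

def stop_when_ahead_by_three(seq: Sequence[int]) -> bool:
    """Hypothetical rule: stop once successes exceed failures by 3."""
    successes = sum(seq)
    return successes - (len(seq) - successes) >= 3

seq = [1, 0, 1, 1, 1]
drug_works, no_effect = Fraction(7, 10), Fraction(1, 2)

ratio_fixed = (fixed_design_probability(seq, drug_works)
               / fixed_design_probability(seq, no_effect))
ratio_stopping = (optional_stopping_probability(seq, drug_works, stop_when_ahead_by_three)
                  / optional_stopping_probability(seq, no_effect, stop_when_ahead_by_three))
print(ratio_fixed, ratio_stopping)  # 7203/3125 7203/3125
```

Exact `Fraction` arithmetic is used so the two ratios can be compared for equality without rounding noise.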
I can give a much more precise mathematical proof if you want.