Yes, it’s definitely fishy.
It’s using the experimental evidence to privilege H’ (a strictly more complex hypothesis than H), and then using the same experimental evidence to support H’. That’s double-counting.
The more potentially relevant differences there are between the experiments, the worse this gets. There are usually a lot of them, which causes an exponential explosion of the hypothesis space from which H’ is being privileged: twenty independent binary differences already give about a million candidate variants of H’.
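To make that concrete, here’s a toy simulation (entirely my own illustration; the 30 candidate differences, 40 observations, correlation-based fit measure, and 5% bar are all made up, and nothing here comes from the actual experiments). There is no real moderating difference in it at all, yet if you let the data choose which candidate difference to promote to H’ and then quote the same data as support for that choice, the chosen H’ clears a nominal 5% significance bar roughly four times out of five:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy sketch: there is NO real moderating difference, yet letting the data
# pick which "difference between the experiments" to blame, and then scoring
# that same pick on the same data, usually makes it look well supported.
k, n, trials = 30, 40, 2000   # k candidate differences, n observations each

false_support = 0
for _ in range(trials):
    gap = rng.normal(0, 1, n)              # pure-noise discrepancy
    moderators = rng.normal(0, 1, (k, n))  # pure-noise candidate differences

    # Step 1: use the evidence to privilege H' (pick the best-fitting difference).
    pvals = [stats.pearsonr(m, gap)[1] for m in moderators]
    # Step 2: use the SAME evidence to "support" H' (quote its nominal p-value).
    if min(pvals) < 0.05:
        false_support += 1

print(f"Selected H' gets a nominal p < 0.05 in about {false_support/trials:.0%} of runs")
```

The selection step is doing all the work; quoting the same data afterwards adds nothing.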
What’s worse, Alice’s experiment gave only weak evidence for H against some non-H hypotheses. Since you mention a p-value, I expect H is only being compared against one other hypothesis (the null). That would make it weak evidence for H even if p < 0.0001, and it couldn’t even manage that.
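To put a rough number on how little even a respectable p-value buys against a single null, here’s a back-of-the-envelope sketch (assuming, purely for illustration, a two-sided z-test with p = 0.05; the thread doesn’t say what test was actually run):

```python
import numpy as np
from scipy import stats

# Back-of-the-envelope sketch: how much can a p-value against one null
# possibly buy H?  Assumptions (mine, not from the thread): a two-sided
# z-test with p = 0.05.
p = 0.05
z = stats.norm.isf(p / 2)      # z-score implied by that p-value, ~1.96

# No alternative, H included, can achieve a likelihood ratio over the null
# larger than that of a point alternative centred exactly at the observed z,
# which is exp(z^2 / 2):
max_lr = np.exp(z**2 / 2)      # ~7 here: modest, not overwhelming
print(f"z = {z:.2f}; best-case likelihood ratio for H over the single null ≈ {max_lr:.1f}")
```

Even granting H its most charitable possible form, that’s a likelihood ratio of about 7 over one specific null, which says nothing about how the data separates H from the other hypotheses it actually needs to beat.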
Are there no other hypotheses, of complexity comparable to or lower than H’, that match the evidence as well or better? Did the people formulating H’ even spend five minutes thinking about whether there are?