Eliezer:
The results of the two experimenters in the example are different. To begin with, the 2nd experimenter’s first result is a non-cure (otherwise he would have stopped there with a 100% success rate); at least one of the three following results is also a non-cure (otherwise he would have stopped at 75%); and so on. Also, his last result is a cure (otherwise he would have stopped one patient earlier).
The first experimenter almost certainly got a different sequence of results; otherwise he might as well have won the lottery. The odds that a series of Bernoulli trials produces a sequence x1...x100 in which no prefix x1...xN has a higher success rate than the whole sequence are really small.
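To get a feel for how small those odds are, here is a rough Monte Carlo sketch. It is only an illustration under stated assumptions: the per-patient cure probability `p=0.7`, the function names, and the reading of "higher rate" as strictly higher over proper prefixes are all my own choices, not something fixed by the example.

```python
import random


def no_prefix_beats_whole(xs):
    """True if no proper prefix has a strictly higher success rate than xs as a whole."""
    n = len(xs)
    total = sum(xs)
    cures = 0
    for k in range(1, n):  # proper prefixes x1..xk with k < n
        cures += xs[k - 1]
        # cures / k > total / n, compared in integers to avoid float rounding
        if cures * n > total * k:
            return False
    return True


def estimate_probability(p=0.7, n=100, trials=10000, seed=0):
    """Estimate the chance that n Bernoulli(p) trials have no prefix beating the whole.

    p, n, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [1 if rng.random() < p else 0 for _ in range(n)]
        if no_prefix_beats_whole(xs):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(estimate_probability())
```

Note that this checks the simplified prefix condition stated above, not the 2nd experimenter's actual stopping rule, which would also need a precise definition of “definitely greater than 60%”.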
Note that this argument gets weaker as you change the definition of “definitely greater than 60%” to require greater statistical confidence (indeed, .99 results are less sensitive to methodological biases than .95 results). But even at .99, the odds that the sequence obtained by the 1st doctor would finish exactly where the 2nd doctor would stop are well below 1/10 (I just made a quick upper-bound calculation; the actual value is even smaller).
The problem is that (1) when the results are reported in a journal, you only get the total counts, which hides the methodological trap, and (2) even if you got the full results, you most likely don’t have the computational power to discover the difference (except of course in the ~60% of reports from doctor 2 where he reports on a single patient).