To answer, here's a story about data:
One person decides on a conclusion and then tries to write the most persuasive argument for that conclusion.
Another person begins to write an argument by considering evidence, analyzing it, and then comes to a conclusion based on the analysis.
Both of those people type up their arguments and put them in your mailbox. As it happens, the two arguments are identical.
Are you telling me the first person’s argument carries the exact same weight as the second?
In other words, yes, the researcher’s private thoughts do matter, because P(observation|researcher 1) != P(observation|researcher 2) even though the observations are the same.
I think that’s the proper Bayesian objection, anyway.
But that’s the thing: P(observation|researcher 1) = P(observation|researcher 2).
The individual patient outcomes would not change depending on whether researcher 1 or researcher 2 is leading the experiment. And given the 100 results that they had, both researchers would (and did) proceed exactly the same way.
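To make that equality concrete, here is a minimal sketch of my own (not from the thread): for a fixed sequence of 70 cures and 30 failures, the probability of that exact sequence is p^70·(1−p)^30 under any stopping rule that halts at that point, so the stopping rule contributes only a constant factor and both researchers end up with the same posterior.

```python
# Sketch (illustrative, not from the original thread): the likelihood of a
# fixed sequence of 70 cures and 30 failures is p**70 * (1-p)**30 under any
# stopping rule that halts at that point, so the Bayesian posterior over the
# cure rate p is identical for both researchers.
import numpy as np

successes, failures = 70, 30
p = np.linspace(0.01, 0.99, 99)                 # candidate cure rates

likelihood = p**successes * (1 - p)**failures   # stopping rule cancels out
posterior = likelihood / likelihood.sum()       # flat prior for simplicity

print(f"posterior mode: {p[np.argmax(posterior)]:.2f}")  # -> 0.70
```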
Maybe the second researcher was one of 20 researchers using the same approach, and he is the only one with a 70% success rate—the other 19 had success rates of about 1%. We have never heard of these other researchers because, having failed to reach 60%, they are researching to this very day and are likely never to publish their results. When you have 10,000 cures out of a million patients, it’d take a nearly impossible lucky streak to get nearly a million and a half more successes without racking up well over a hundred million more failures along the way, given the likely probability of 1% and assuming you are using the same cure and not optimizing it as the research goes on (which would make it a different beast entirely).
So, if we combine the trials of all 20 researchers together, we have 70 + 19⋅10,000 = 190,070 cures out of 100 + 19⋅1,000,000 = 19,000,100 patients, giving us a success rate of 190,070/19,000,100 ≈ 1.00036%. But the fact that only our one researcher has published cherry-picks a tiny fraction of that data to get a 70% success rate.
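The arithmetic above is easy to verify (a quick sketch of my own, using the numbers from the previous two paragraphs):

```python
# Quick check of the numbers above (my own sketch).
cures    = 70 + 19 * 10_000        # 190_070
patients = 100 + 19 * 1_000_000    # 19_000_100
print(f"pooled success rate: {cures / patients:.5%}")  # ~1.00036%

# The "lucky streak" needed to drag 10,000/1,000,000 up to 60%:
# (10_000 + n) / (1_000_000 + n) = 0.6  =>  n consecutive successes
n = (0.6 * 1_000_000 - 10_000) / (1 - 0.6)
print(f"successes needed with zero further failures: {n:,.0f}")
# -> 1,475,000: the "nearly a million and a half" from the text,
#    and every failure along the way pushes the target further out.
```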
Compare this to the first researcher, who would have published after testing 100 patients no matter what—so if there were 19 more like him who got a 1% success rate, they would still have published, and a meta-analysis could show more accurate results.
This is an actual problem with science publications—journals are more likely to publish successful results than null results, effectively cherry-picking the results from the successful studies.
Maybe. But to assume any of that, you would need additional knowledge. In the real world, in an actual case, you might have checked that there are 19 other researchers who used the same approach and that they all hid their findings. Whatever that additional knowledge is that allows you to infer 19 hidden motivated researchers where only 1 is given, that is what gives you the ≈1% result.
I’m not inferring 19 more motivated researchers—that was just an example (the number 20 was picked because the standard significance threshold is 5%, which means that about one out of 20 studies of a worthless treatment will clear that bar by chance). What I do infer is an unknown number of motivated researchers.
The key assumption here is that had the motivated researcher failed to achieve the desired results, he would have kept researching without publishing, and we would not know about his research. This implies that we do not know about any motivated researchers who failed to achieve their desired results—hence we can assume an unknown number of them.
The same cannot be said about the frugal researcher. If there were more frugal researchers but they all failed, they would still have published once they reached 100 patients, and we would still have heard of them—so the fact that we don’t know about more frugal researchers really does mean there aren’t any more frugal researchers.
Note that if my assumption is wrong, and in the other Everett branch where the motivated researcher failed we would still have known about his forever-ongoing research, then there really was no difference between them, because we could assign the same meaning to the motivated researcher’s still-ongoing research as we assign to the frugal researcher’s publication of failed results.
-------
Consider a third researcher—one who’s not as ethical as the first two and plans on cherry-picking his results. But he decides he can be technically ethical if, instead of cherry-picking the results inside each study, he just cherry-picks the studies with desirable results. His plan is to test 100 patients, and if he can cure more than 60% of them, he’ll publish. Otherwise he’ll just scrap that study’s results and start a brand new study, with the same treatment, but still technically a new study.
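To see how much this “technically ethical” scheme distorts the published numbers, here is a toy simulation of my own (the 50% true cure rate is a hypothetical number, not from the thread): only trials that clear the 60% bar get published, and the published trials overstate the true cure rate considerably.

```python
# Toy simulation (my own sketch; the 50% true cure rate is hypothetical):
# run 100-patient trials and publish only those with more than 60 cures,
# scrapping the rest -- exactly the third researcher's strategy.
import random

random.seed(0)
TRUE_RATE, TRIALS = 0.5, 50_000

published = []
for _ in range(TRIALS):
    cures = sum(random.random() < TRUE_RATE for _ in range(100))
    if cures > 60:                 # "desirable" result: publish it
        published.append(cures)
    # otherwise: scrap it and start a "brand new" study

rate = sum(published) / (100 * len(published))
print(f"{len(published)} of {TRIALS} trials published, "
      f"mean published cure rate: {rate:.2f}")   # ~0.62 vs. true 0.50
```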
That third researcher is publishing results—70 cures out of 100 patients. We know about his methods and we know about these results—and that’s it. Should we just assume this is his only study, and that even though he intended to cherry-pick he happened to get these results on the first attempt, so we should treat them the same way we treat the frugal researcher’s results?
Note that the difference between the motivated researcher and the cheating researcher is that the cheating researcher has to deliberately hide his previous studies (if there are any), while the motivated researcher simply doesn’t know about his still-researching peers (if there are any). But that’s just a state of mind, and neither of them is lying about the research they did publish.
How can anyone other than the researchers themselves distinguish between them if their thoughts are private?
I understand “private thoughts” to imply that there are no other observable differences between the two researchers.
But in the Jaynes example we’re talking about, there are clear observable differences. One had announced that he would continue until he got a certain proportion of successes; the other had announced that he would stop at 100.
The key is that Jaynes gives a further piece of data: that somehow we know that “Neither would stoop to falsifying the data”. In Bayesian terms, this information, if reliable, screens out our knowledge that their plans had differed. But in real life, you’re never 100% certain that “neither would stoop to falsifying the data”, especially when there’s often more wiggle room than you’d realize about exactly which data get counted how. In that sense, a rigorous pre-announced plan may be useful evidence about whether there’s funny business going on. The reviled “frequentist” assumptions, then, can be expressed in Bayesian terms as a prior distribution that assumes that researchers cheat whenever the rules aren’t clear. That’s clearly over-pessimistic in many cases (though over-optimistic in others; some researchers cheat even when the rules ARE clear); but, like other heuristics of “significance”, it has some value in developing a “scientific consensus” that doesn’t need to be updated minute-by-minute.
In general: sure, the world is Bayesian. But that doesn’t mean that frequentist math isn’t math. Good frequentist statistics is better than bad Bayesian statistics any day, and anyone who shuts their ears or perks them up just based on a simplistic label is doing themselves a disservice.