Just a note here: the fact that a dataset yields the same likelihood function regardless of the procedure that produced it is actually NOT a trivial statement. The way I see it, it is a somewhat deep result, which follows from the optional stopping theorem and the fact that the likelihood function is bounded. Not trying to nitpick, just pointing out that there is something to think about here. Going by my initial intuitions, this was actually rather surprising: I didn't expect experimental results constructed using biased data (in the sense of a non-fixed stopping time) to end up yielding unbiased results, even with full disclosure of all data.
It's worth revising your intuitions if you found it surprising that a fixed physical act assigns the same likelihood to the data regardless of the researcher's thoughts. It is indeed possible to see the mathematical result as "obvious at a glance".
That's not quite what I meant. It is not the experimenter's thoughts that I am uncomfortable with; it is the collection of possible experimental outcomes.
I will try to illustrate with an example. Let us say that I toss a fair coin either (i) exactly two times, or (ii) until it comes up heads. In the first case, the possible outcomes are HH, HT, TH, or TT; in the second case, they are H, TH, TTH, TTTH, TTTTH, etc. It isn't obvious to me that a TH outcome has the same meaning in both cases. If, for instance, we were not talking about likelihood and instead decided to measure something else, e.g. the proportion of tosses landing heads, this wouldn't be the case: in scenario (i), the expected proportion of tosses landing heads is (1 + 0.5 + 0.5 + 0)/4 = 0.5, but in scenario (ii), it is 1/2 + (1/2)/4 + (1/3)/8 + (1/4)/16 + … = ln(2), i.e. a little under 0.7.
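(The two expectations above can be checked directly; this is just a sketch of that arithmetic, with the series for design (ii) truncated, not anything from the original discussion:)

```python
import math

# Design (i): exactly two tosses of a fair coin. Four equally likely
# outcomes; average the fraction of heads over them.
outcomes = ["HH", "HT", "TH", "TT"]
expected_i = sum(s.count("H") / len(s) for s in outcomes) / len(outcomes)

# Design (ii): toss until the first head. The outcome T...TH of length n
# occurs with probability (1/2)**n and has heads-fraction 1/n, so the
# expectation is the series sum of (1/n) * (1/2)**n = ln(2). Truncating
# at 200 terms gives far more precision than we need.
expected_ii = sum((1 / n) * 0.5 ** n for n in range(1, 201))

print(expected_i)               # 0.5
print(expected_ii, math.log(2)) # both about 0.6931
```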
The TH outcome tells you the same thing about the coin, because the coin does not know what your plans were.
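To make that concrete, here is a minimal sketch (the function name is mine, not from the discussion) of the probability of observing the exact sequence TH as a function of the coin's heads-probability p. Under design (i) TH is one of the four two-toss outcomes; under design (ii) it is the "first head on toss 2" outcome. Either way its probability is (1 − p)·p, the same function of p, which is the likelihood-equivalence claim:

```python
def sequence_prob(p, seq):
    # Probability of observing exactly this toss sequence, given
    # heads-probability p; each toss contributes an independent factor.
    prob = 1.0
    for toss in seq:
        prob *= p if toss == "H" else (1 - p)
    return prob

# The observed data is the sequence TH in both designs, and its
# probability is (1 - p) * p either way: the stopping rule never enters.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert sequence_prob(p, "TH") == (1 - p) * p
```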
I'm convinced. Having thought about this a little more, I think I see the model you are working under, and it does make a good deal of intuitive sense.
Does the publication of the result tell you the same thing, since the fact that it was published is a result of the plans?
I think in this case, we are assuming total and honest reporting of results (including publication); otherwise, we would be back to the story of filtered evidence. Therefore, the publication is not a result of the plans—it was going to happen in either case.
Thanks, I understood the mathematical point, but I was wondering if there is any practical significance, since it seems that in the real world we cannot make such an assumption, and that we should trust the results of the two researchers differently (since the one researcher likely would have published no matter what, whereas the second probably published only the experiments that came out favorably, even if he didn't publish false information). What is the practical import of this idea? In the real world, with all of people's biases, shouldn't we distinguish between the two researchers as a general heuristic for good research standards?
(If this is addressed in a different post on this site feel free to point me there since I have not read the majority of the site)
You can claim that it should have the same likelihood either way, but you have to put the discrepancy somewhere. Knowing the choice of stopping rule is evidence about the experimenter’s state of knowledge about the efficacy. You can say that it should be treated as a separate piece of evidence, or that knowing about the stopping rule should change your prior, but if you don’t bring it in somewhere, you’re ignoring critical information.
No, in practical terms it's negligible. There's a reason that double-blind trials are the gold standard: it's because doctors are as prone to cognitive biases as anyone else.
Let me put it this way: recently a pair of doctors looked at the available evidence and concluded (foolishly!) that putting fecal bacteria in the brains of brain cancer patients was such a promising experimental treatment that they did an end-run around the ethics review process—and after leaving that job under a cloud, one of them was still considered a “star free agent”. Well, perhaps so—but I think this little episode illustrates very well that a doctor’s unsupported opinion about the efficacy of his or her novel experimental treatment isn’t worth the shit s/he wants to place inside your skull.
Hold on: aren't you saying that the choice of experimental rule is VERY important (i.e. double-blind vs. not double-blind, etc.)?
If so, you are agreeing with VAuroch. You have to include the details of the experiment somewhere. The data do not speak for themselves.
Of course experimental design is very important in general. But VAuroch and I agree that when two designs give rise to the same likelihood function, the information that comes in from the data is equivalent. We disagree about the weight to give to the information that comes in from what the choice of experimental design tells us about the experimenter's prior state of knowledge.
Double-blind trials aren't the gold standard; they're the best available standard. They still fail to replicate far too often, because they don't remove bias (and I'm not just referring to publication bias). That is why, when considering how to interpret a study, you look at the history of which scientific positions the experimenter has supported in the past, and then update away from that to compensate for bias that you have good reason to think will show up in their data.
In the example, past results suggest that, even if the trial was double-blind, someone who is committed to achieving a good result for the treatment will get more favorable data than some other experimenter with no involvement.
And that’s on top of the trivial fact that someone with an interest in getting a successful trial is more likely to use a directionally-slanted stopping rule if they have doubts about the efficacy than if they are confident it will work, which is not explicitly relevant in Eliezer’s example.
I can’t say I disagree.