Obviously, if what you’re actually doing is running a set number of trials in one case and running trials until you reach significance or give up in the second case, you will come up with different results.
I don’t believe this is true. Every individual trial is a separate piece of Bayesian evidence, unrelated to the rest of the trials except that the prior you bring to each one differs. If you run until significance you will have updated to a certain probability, and if you run until you’re bored you will also have updated to a certain probability.
Sure, if you run a different number of trials, you may end up with a different probability. At worst, if you keep going until you’re bored, you may end up with results that are insignificant by the strict rules of “proof” in Science. But as long as you use Bayesian updating, neither method produces invalid results.
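This is easy to check numerically. No code is given in the thread for this claim, so what follows is a minimal sketch in Python under assumed parameters: a Beta(1, 1) prior on a coin’s bias and two hypothetical stopping rules, one fixed-length and one “stop at significance, or when bored.” Every name and threshold here is an illustrative assumption, not anything from the original discussion.

    import random

    def run_experiment(p_true, stop_rule, max_trials=1000, a=1, b=1):
        # Run Bernoulli trials, updating a Beta(a, b) posterior after each one.
        # stop_rule(a, b, n) returns True when the experimenter decides to stop.
        n = 0
        while n < max_trials:
            if random.random() < p_true:
                a += 1  # success
            else:
                b += 1  # failure
            n += 1
            if stop_rule(a, b, n):
                break
        return a, b, n

    random.seed(0)

    # Researcher 1 runs a fixed number of trials.
    fixed_n = lambda a, b, n: n >= 100
    # Researcher 2 runs until "significance": the posterior mean drifts from 0.5.
    until_sig = lambda a, b, n: n >= 10 and abs(a / (a + b) - 0.5) > 0.15

    a1, b1, n1 = run_experiment(0.5, fixed_n)
    a2, b2, n2 = run_experiment(0.5, until_sig)
    print(f"fixed-N:   Beta({a1}, {b1}) after {n1} trials")
    print(f"until-sig: Beta({a2}, {b2}) after {n2} trials")

Either way the final posterior is Beta(1 + successes, 1 + failures): the stopping rule never enters the update step, so whatever data you end up with produces the same posterior regardless of why you stopped collecting it.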
which actually seems fairly obvious in retrospect
Ding ding ding! That’s my hindsight-bias-reminder-heuristic going off. It tells me when I need to check myself for hindsight bias, and goes off on thoughts like “That seems obvious in retrospect” and “I knew that all along.” At the risk of doing your thinking for you, I’d say this is a case of hindsight bias: It wasn’t obvious beforehand, since otherwise you wouldn’t have felt the need to do the test. This means it’s not an obvious concept in the first place, and only becomes clear when you consider it more closely, which you did. Then saying that “it’s obvious in retrospect” has no value, and actually devalues the time you put in.
formatting sucks
Try this:
To make a paragraph where your indentation is preserved and no characters are treated specially, precede each line with (at least) four spaces. This is commonly used for computer program source code.
(From the Comment Formatting Help)
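For example (an illustrative snippet, not from the thread), typing these lines with four leading spaces each:

    for i in range(3):
        print(i)

should render as a literal, indentation-preserved block.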
I don’t believe this is true. Every individual trial is a separate piece of Bayesian evidence, unrelated to the rest of the trials except that the prior you bring to each one differs. If you run until significance you will have updated to a certain probability, and if you run until you’re bored you will also have updated to a certain probability.
You have to be very careful that you’re actually asking the same question in both cases. In the case I tested above, I was asking exactly the same question (my intuition insisted very strongly that I wasn’t, but that’s because I was thinking of the very similar but subtly different question below). The “fairly obvious in retrospect” refers to that particular phrasing of the problem: I would have immediately understood that the probabilities had to be equal if I had phrased it that way, but since I didn’t, the insight was a little harder-earned.
The question I was actually thinking of is as follows.
Scenario A: You run 12 trials, then check whether your odds ratio reaches significance and report your results.
Scenario B: You run trials until either your odds ratio reaches significance or you hit 12 trials, then report your results.
I think scenario A is different from scenario B, and scenario B is the one I was thinking of (it’s the “run subjects until you hit significance or run out of funding” model).
A new program confirms my intuition about the question I had actually been thinking of when I decided to test this. I agree with Eliezer that it shouldn’t matter whether the researcher runs until a certain number of trials or a certain number of positive results, but I disagree with the implication that the same dataset always gives you the same information.
The program is here; you can fiddle with the parameters if you want to look at the results yourself.
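The linked program isn’t reproduced in the comment, so here is a minimal sketch, in Python and under assumed parameters, of the kind of simulation that separates the two scenarios. The significance criterion (a likelihood odds ratio of a hypothetical 75%-heads coin against a fair coin exceeding 3) and everything else here are arbitrary stand-ins for whatever the original program used.

    import random

    def significant(heads, n, threshold=3.0):
        # Assumed criterion: the odds ratio of a 0.75-biased coin versus a
        # fair coin, given `heads` successes in `n` trials, exceeds `threshold`.
        tails = n - heads
        odds = (0.75 ** heads * 0.25 ** tails) / 0.5 ** n
        return odds > threshold

    def scenario_a(p_true, n_max=12):
        # Scenario A: run all 12 trials, then check significance exactly once.
        heads = sum(random.random() < p_true for _ in range(n_max))
        return significant(heads, n_max)

    def scenario_b(p_true, n_max=12):
        # Scenario B: check after every trial; report as soon as the odds
        # ratio reaches significance, or give up after 12 trials.
        heads = 0
        for n in range(1, n_max + 1):
            heads += random.random() < p_true
            if significant(heads, n):
                return True
        return False

    random.seed(0)
    runs = 100_000
    rate_a = sum(scenario_a(0.5) for _ in range(runs)) / runs
    rate_b = sum(scenario_b(0.5) for _ in range(runs)) / runs
    print(f"fair coin reported significant: A = {rate_a:.3f}, B = {rate_b:.3f}")

On the same fair coin, scenario B reports “significant” more often than scenario A, because checking after every trial gives it many chances to cross the threshold: the two procedures really are different, as the comment says.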
formatting sucks
Try this:
I did. It didn’t indent properly. I tried again, and it still doesn’t.
Upvoted for actually testing the theory :)