What makes the Bayesian “lose” in the cases proposed by Mayo and Simonsohn isn’t the inference; it’s the scoring rule. A Bayesian scores himself on total calibration, of which “number of times my 95% confidence interval includes the truth” is only a small part. You can generate an experiment that has a high chance (let’s say 99%) of handing a Bayesian a 20:1 likelihood ratio in favor of some hypothesis. But by conservation of expected evidence, the same experiment must then have a small chance (the remaining 1%) of generating something like a 2000:1 likelihood ratio against that same hypothesis. A frequentist could never be that sure of anything; this occasional 2000:1 confidence is the Bayesian’s reward. If you rig the rules so that something about 95% confidence intervals is the only measure of success, then the frequentist’s decision rule of accepting hypotheses at a 5% p-value wins, but it’s not his inference that magically becomes superior.
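For concreteness, here is a toy version of that bookkeeping. The outcome probabilities below are my own illustrative numbers, picked so the likelihood ratios land near the 20:1 and 2000:1 figures above; the last line checks the conservation-of-expected-evidence identity (under not-H, the expected likelihood ratio in favor of H must equal 1).

```python
# Hypothetical two-outcome experiment rigged to "confirm" H. The outcome
# probabilities are illustrative, chosen to land near the figures above.
p_A_H, p_B_H = 0.999525, 0.000475        # under H: outcome A is near-certain
p_A_not, p_B_not = 0.05, 0.95            # under not-H: outcome B dominates

print(f"LR on A: {p_A_H / p_A_not:.1f}:1 for H")        # ~20:1 for H
print(f"LR on B: {p_B_not / p_B_H:.0f}:1 against H")    # ~2000:1 against H

# Conservation of expected evidence: under not-H, the expected likelihood
# ratio in favor of H is exactly 1; the rare 2000:1 loss exactly pays for
# the near-certain 20:1 wins.
e_lr = p_A_not * (p_A_H / p_A_not) + p_B_not * (p_B_H / p_B_not)
print(f"E[LR for H | not-H] = {e_lr:.4f}")               # 1.0000
```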
Sometimes we might care about “total calibration,” I guess, but sometimes we care about being actually calibrated in the rationalist sense: we want a 95% confidence interval to mean that if we do this 100 times, the interval will contain the true value about 95 of those times.
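That frequentist reading of calibration is easy to check by simulation. A minimal sketch, assuming a normal model with known sigma (the model and all numbers here are mine, not anything from the discussion above):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n, trials = 3.0, 1.0, 25, 10_000
hits = 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, n)
    half_width = 1.96 * sigma / np.sqrt(n)    # 95% z-interval, sigma known
    m = sample.mean()
    hits += (m - half_width <= true_mu <= m + half_width)
print(f"coverage over {trials} repeats: {hits / trials:.3f}")   # ~0.95
```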
My point was that this idea that the stopping rule doesn’t matter is more complicated than calculating a Bayes factor and saying “look, the stopping rule doesn’t change the Bayes factor.”
The stopping rule won’t change the expectation of the Bayes factor.
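That claim can be made precise: under the hypothesis that actually generated the data, the running likelihood ratio in favor of any alternative is a martingale, so a bounded stopping rule cannot move its expectation off 1. A quick simulation sketch with toy numbers of my own choosing, where the stopping rule actively hunts for 3:1 evidence for H1 while the data really come from H0:

```python
import numpy as np

rng = np.random.default_rng(1)
p0, p1 = 0.5, 0.6            # P(heads) under H0 (true) and H1
max_flips, runs = 50, 20_000
bayes_factors = []
for _ in range(runs):
    bf = 1.0                                  # running Bayes factor, H1 : H0
    for _ in range(max_flips):
        heads = rng.random() < p0             # data generated by H0
        bf *= (p1 / p0) if heads else ((1 - p1) / (1 - p0))
        if bf >= 3.0:                         # optional stopping: quit while ahead
            break
    bayes_factors.append(bf)
print(f"mean BF for H1 under H0: {np.mean(bayes_factors):.3f}")   # ~1.0
```

The stopping rule changes which Bayes factors you see, and how often, but not their expectation under H0.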
If your prior is correct, then your 95% credibility interval will, in fact, be well calibrated and be correct 95% of the time. I argued at length on tumblr that most or all of the force of the stopping-rule objection to Bayes is a disguised “you have a bad prior” situation. If you’re willing to frame the question that way, you can generate similar cases without stopping rules at all. For instance, imagine there are two kinds of coins: ones that land on heads 100% of the time, and ones that land on heads 20% of the time (the rest of the flips are tails). You get one flip with the coin. Oh, one more thing: I tell you that there are a billion coins of the first kind, and only one of the second kind.
You flip the coin once. It’s easy to show that there’s an overwhelming probability of getting a 5:1 likelihood ratio in favor of the first coin. Why is this problematic?
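Writing out the arithmetic (nothing here beyond the numbers already given in the example):

```python
# The coin example above: coin 1 always lands heads, coin 2 lands heads
# 20% of the time, prior odds a billion to one for coin 1.
p_heads_1, p_heads_2 = 1.0, 0.2
prior_odds = 1e9                                  # coin 1 : coin 2

lr_heads = p_heads_1 / p_heads_2                  # 5:1 for coin 1
print(f"LR on heads: {lr_heads:.0f}:1 for coin 1")
print(f"posterior odds: {prior_odds * lr_heads:.0e} : 1")

# Before you flip, heads is essentially guaranteed, so the 5:1 boost for
# coin 1 is baked in; only a tails (ruling out coin 1 entirely) surprises.
p_coin1 = prior_odds / (prior_odds + 1)
p_heads = p_coin1 * p_heads_1 + (1 - p_coin1) * p_heads_2
print(f"prior probability of heads: {p_heads:.9f}")
```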
I can and have given a similar case for 95% credibility intervals as opposed to Bayes factors, which I’ll write out if you’re interested.