I think 2 is uncontroversial, except for one question: if you have a perfect prior, why do any experiment at all?
By perfect I mean well calibrated. I don’t see why knowing that your priors in general are well calibrated implies that more information doesn’t have positive expected utility.
The issue is that with optional stopping you bias the Bayes factor.
Only in some cases, and only with regard to someone who knows more than the Bayesian. The Bayesian himself can’t predict that the factor will be biased; the expected factor should be 1. It’s only someone who knows better that can predict this.
So let’s think of this like a frequentist who has a laboratory full of Bayesians in cages.
Before I analyse this case, can you clarify whether the hypothesis happens to be true, false, or chosen at random? Also give these Bayesians’ priors, and perhaps an example of the rule you’d use.
Again, the prior doesn’t matter, they are computing Bayes factors. We are talking about Bayes factors. Bayes factors. Prior doesn’t matter. Bayes factors. Prior.Doesn’t.Matter. Bayes factors. Prior.Doesn’t.Matter. Bayes.factor.
Let’s say the null is true, but the frequentist mastermind has devised some data generating process (let’s say he has infinite data at his disposal) that can produce evidence in favor of the competing hypothesis at a Bayes factor of 3, 99% of the time.
Again, the prior doesn’t matter, they are computing Bayes factors.
It matters here, because you said “So you might be able to create a rule that fools 99 out of the 100 Bayesians”. The probability of getting data given a certain rule depends on which hypothesis is true, and if we’re assuming the hypothesis is like the prior, then we need to know the prior to calculate those numbers.
Let’s say the null is true, but the frequentist mastermind has devised some data generating process (let’s say he has infinite data at his disposal) that can produce evidence in favor of the competing hypothesis at a Bayes factor of 3, 99% of the time.
Using either Bayesian HDI with ROPE, or a Bayes factor, the false alarm rate asymptotes at a level far less than 100% (e.g., 20-25%). In other words, using Bayesian methods, the null hypothesis is accepted when it is true, even with sequential testing of every datum, perhaps 75-80% of the time.
In fact, you can show easily that this can succeed at most 33% of the time. By definition, the Bayes factor is how likely the data is given one hypothesis, divided by how likely the data is given the other. The data in the class “results in a Bayes factor of 3 against the null” has a certain chance of happening given that the null is true, say p. This class of course contains many individual mutually exclusive sets of data, each with a far lower probability, but they sum to p. Now, the chance of this class of possible data sets happening given that the null is not true has an upper bound of 1. Each individual probability (which collectively sum to at most 1) must be 3 times as much as the corresponding probability in the group that sums to p. Ergo, p is upper bounded by 33%.
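A minimal sketch of that bound, not taken from anyone in this thread. It assumes a hypothetical point alternative (a coin with P(heads) = 0.75 against a fair-coin null) and a made-up helper `stop_probabilities`. It enumerates the reachable (heads, tails) states and computes exactly the chance that a "flip until the Bayes factor reaches 3" rule ever stops within `max_flips` flips, under each hypothesis.

```python
from collections import defaultdict


def stop_probabilities(theta_alt=0.75, theta_null=0.5, threshold=3.0, max_flips=200):
    """Exact chance of ever reaching BF >= threshold, under null and under alternative.

    All parameter values are illustrative assumptions, not values from the thread.
    """
    # live[(heads, tails)] = [P(state, not yet stopped | null), same under alternative]
    live = {(0, 0): [1.0, 1.0]}
    stopped_null = 0.0
    stopped_alt = 0.0
    for _ in range(max_flips):
        nxt = defaultdict(lambda: [0.0, 0.0])
        for (h, t), (p0, p1) in live.items():
            # One more flip: heads or tails, weighted by each hypothesis.
            nxt[(h + 1, t)][0] += p0 * theta_null
            nxt[(h + 1, t)][1] += p1 * theta_alt
            nxt[(h, t + 1)][0] += p0 * (1 - theta_null)
            nxt[(h, t + 1)][1] += p1 * (1 - theta_alt)
        live = {}
        for (h, t), (p0, p1) in nxt.items():
            # Bayes factor for the alternative over the null at this state.
            bf = (theta_alt / theta_null) ** h * ((1 - theta_alt) / (1 - theta_null)) ** t
            if bf >= threshold:
                stopped_null += p0  # the experimenter stops here and reports BF >= 3
                stopped_alt += p1
            else:
                live[(h, t)] = [p0, p1]
    return stopped_null, stopped_alt


p_null, p_alt = stop_probabilities()
print(p_null, p_alt)
```

Under these assumptions the null-side stopping probability cannot exceed 1/3, and the alternative-side probability is at least three times the null-side one, which is the inequality the argument above relies on.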
In simulation with a coin flip, I start to asymptote at around 20%, but when estimating the mean of a normal distribution (with the null being 0) with fixed variance, I keep climbing indefinitely. If you are willing to sample literally forever it seems like you’d be able to convince the Bayesian that the mean is not 0 with arbitrary Bayes factor. So for large enough N in a sample, I expect you can get a factor of 3 for 99⁄100 of the Bayesians in cages (so long as that last Bayesian is really, really sure the value is 0).
But it doesn’t change the results if we switch and say we fool 33% of the Bayesians with Bayes factor of 3. We are still fooling them.
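For anyone who wants to rerun this kind of check, here is a rough simulation sketch. It is not the simulation described above. It assumes a known variance, a Normal(0, tau^2) prior on the mean under the alternative, a point null at 0, and a check after every single observation. The names `bf10_normal_mean` and `false_alarm_rate` are made up for illustration.

```python
import math
import random


def bf10_normal_mean(xbar, n, sigma=1.0, tau=1.0):
    """Bayes factor for H1: mu ~ Normal(0, tau^2) over H0: mu = 0, with known sigma.

    The sample mean is Normal(0, sigma^2/n) under H0 and
    Normal(0, sigma^2/n + tau^2) under H1; the Bayes factor is the density ratio.
    """
    v0 = sigma ** 2 / n
    v1 = v0 + tau ** 2
    return math.sqrt(v0 / v1) * math.exp(0.5 * xbar ** 2 * (1.0 / v0 - 1.0 / v1))


def false_alarm_rate(n_runs=1000, max_n=1000, threshold=3.0, seed=0):
    """Fraction of null-true runs in which optional stopping ever reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)  # data generated with the null true
            if bf10_normal_mean(total / n, n) >= threshold:
                hits += 1  # stopped early with BF >= threshold against a true null
                break
    return hits / n_runs


print(false_alarm_rate())
```

Raising `max_n` shows whether the rate keeps climbing or levels off under these particular assumptions. A different prior on the mean gives a different Bayes factor, so the result need not match any other simulation.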
If you are willing to sample literally forever it seems like you’d be able to convince the Bayesian that the mean is not 0 with arbitrary Bayes factor.
Instead, as pointed out by Edwards et al. (1963, p. 239): “(...) if you set out to collect data until your posterior probability for a hypothesis which unknown to you is true has been reduced to .01, then 99 times out of 100 you will never make it, no matter how many data you, or your children after you, may collect (...)”.
If you can generate arbitrarily high Bayes factors, then you can reduce your posterior to .01, which means that it can only happen 1 in 100 times. You can never have a guarantee of always getting strong evidence for a false hypothesis. If you find a case that does, it will be new to me and probably change my mind.
But it doesn’t change the results if we switch and say we fool 33% of the Bayesians with Bayes factor of 3. We are still fooling them.
That doesn’t concern me. I’m not going to argue for why, I’ll just point out that if it is a problem, it has absolutely nothing to do with optional stopping. The exact same behavior (probability 1⁄3 of generating a Bayes factor of 3 in favor of a false hypothesis) shows up in the following case: a coin either always lands on heads, or lands on heads 1⁄3 of the time and tails 2⁄3 of the time. I flip the coin a single time. Let’s say the coin is the second coin. There’s a 33% chance of getting heads, which would produce a Bayes factor of 3 in favor of the 100%H coin.
If there’s something wrong with that, it’s a problem with classic Bayes, not optional stopping.
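The arithmetic of the two-coin example can be spelled out in a few lines. This only illustrates the numbers quoted above; the variable names are made up.

```python
from fractions import Fraction

# P(heads) under each candidate coin from the example.
p_heads_always = Fraction(1)      # the coin that always lands on heads
p_heads_biased = Fraction(1, 3)   # the coin that lands on heads 1/3 of the time

# Suppose the biased coin is the true one and it is flipped once.
prob_observe_heads = p_heads_biased

# Bayes factor for "always heads" over "biased" after seeing a single head.
bf_after_heads = p_heads_always / p_heads_biased

print(prob_observe_heads, bf_after_heads)
```

No stopping rule is involved here: a single fixed flip gives a Bayes factor of 3 for the false hypothesis one time in three.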
It is my thesis that every optional stopping so-called paradox can be converted into a form without optional stopping, and those will be clearer as to whether the problem is real or not.
I can check my simulation for bugs. I don’t have the referenced textbook to check the result being suggested.
It is my thesis that every optional stopping so-called paradox can be converted into a form without optional stopping, and those will be clearer as to whether the problem is real or not.
The first part of this is trivially true. Replace the original distribution with the sampling distribution from the stopped problem, and it’s no longer a stopped problem, it’s normal pulls from that sampling distribution.
I’m not sure it’s clearer; I think it is not. Your “remapped” problem makes it look like it’s a result of low data volume and not a problem of how the sampling distribution was actually constructed.
Replace the original distribution with the sampling distribution from the stopped problem, and it’s no longer a stopped problem, it’s normal pulls from that sampling distribution.
How would this affect a frequentist?
I’m not sure it’s clearer; I think it is not. Your “remapped” problem makes it look like it’s a result of low data volume and not a problem of how the sampling distribution was actually constructed.
I’m giving low data because those are the simplest kinds of cases to think of. If you had lots of data with the same distribution/likelihood, it would be the same. I leave it as an exercise to find a case with lots of data and the same underlying distribution …
I was mainly trying to convince you that nothing’s actually wrong with having 33% false positive rate in contrived cases.
It doesn’t; the frequentist is already measuring with the sampling distribution. That is how frequentism works.
I was mainly trying to convince you that nothing’s actually wrong with having 33% false positive rate in contrived cases.
I mean it’s not “wrong”, but if you care about false positive rates and there is a method that has a 5% false positive rate, wouldn’t you want to use that instead?
Fooling 99 out of 100 Bayesians at a Bayes factor of 3 is impossible. http://doingbayesiandataanalysis.blogspot.com/2013/11/optional-stopping-in-data-collection-p.html goes through the math.
I think the 33% bound is problem dependent.
No, there’s a limit on how far sampling forever can push the Bayes factor as well. See http://www.ejwagenmakers.com/2007/StoppingRuleAppendix.pdf
You can see http://projecteuclid.org/euclid.aoms/1177704038, which proves the result.
If for some reason low false positive rates were important, sure. If it’s important enough to give up consistency.