And the point in the paper I linked has nothing to do with the prior; it’s about the Bayes factor, which is independent of the prior.
Let me put it differently. Yes, your chance of getting a Bayes factor of >3 is 1.8% with data peeking, as opposed to 1% without; but your chance of getting a higher factor also goes down, because you stop as soon as you reach 3. Your expected Bayes factor is necessarily 1 when weighted over your prior; you expect to find evidence for neither side. Changing the exact distribution of your results won’t change that.
Should that say, rather, that its expected log is zero? A factor of n being as likely as a factor of 1/n.
My original response to this was wrong and has been deleted
I don’t think this has anything to do with logs; rather, it’s about the difference between probabilities and odds. Specifically, the Bayes factor works on the odds scale, but the proof of conservation of expected evidence is on the regular probability scale.
If you consider the posterior under all possible outcomes of the experiment, the ratio of the posterior probability to the prior probability will on average be 1 (when weighted by the probability of the outcome under your prior). However, the ratio of the posterior probability to the prior probability is not the same thing as the Bayes factor.
If you multiply the Bayes factor by the prior odds, then transform the resulting quantity (i.e. the posterior odds) from the odds scale to a probability, and then divide by the prior probability, the result will on average be 1.
However, this is convoluted and doesn’t seem like a property that gives any additional insight into the Bayes factor.
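A quick numeric sketch of the distinction (a toy two-hypothesis, two-outcome model of my own, not taken from any linked proof):

```python
# Toy model (my own example numbers): hypotheses H1, H2, binary outcome.
p_h1 = 1 / 3                     # prior P(H1)
lik_h1 = {"a": 0.9, "b": 0.1}    # P(outcome | H1)
lik_h2 = {"a": 0.2, "b": 0.8}    # P(outcome | H2)

# Marginal (prior-predictive) probability of each outcome.
marg = {x: p_h1 * lik_h1[x] + (1 - p_h1) * lik_h2[x] for x in "ab"}

# Posterior P(H1 | outcome) by Bayes' rule.
post = {x: p_h1 * lik_h1[x] / marg[x] for x in "ab"}

# The expected posterior equals the prior, so the expected
# posterior/prior ratio is 1 -- conservation of expected evidence.
exp_post = sum(marg[x] * post[x] for x in "ab")
exp_ratio = exp_post / p_h1
print(exp_post, exp_ratio)       # ~1/3 and ~1.0

# The Bayes factor P(x|H1)/P(x|H2) is a different quantity, and its
# expectation under the marginal is not 1 in general.
exp_bf = sum(marg[x] * lik_h1[x] / lik_h2[x] for x in "ab")
print(exp_bf)                    # ~2.02 here, not 1
```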
That’s probably a better way of putting it. I’m trying to intuitively capture the idea of “no expected evidence”, you can frame that in multiple ways.
Huh? E[X] = 1 and E[log(X)] = 0 are two very different claims; which one are you actually claiming?
Also, what is the expectation with respect to? Your prior or the data distribution or something else?
I’m claiming the second. I was framing it in my mind as “on average, the factor will be 1”, but on further thought the kind of “average” required is an average of logs. I should probably use logs in the future for statements like that.
The prior.
This seems wrong, then. Imagine you have two hypotheses on which you place equal probability, but then you will see an observation that definitively selects one of the two as correct. E[p(x)] = 1⁄2 both before and after the observation, but E[log p(x)] is −1 before vs. −∞ after.
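The gap between the two averages in this example can be made concrete (a sketch; log base 2 to match the −1 above):

```python
import math

# Equal priors, and an observation that decides the question outright:
# the posterior on the first hypothesis is 1 or 0, each with chance 1/2.
posteriors = [1.0, 0.0]
probs = [0.5, 0.5]

exp_post = sum(q * p for p, q in zip(posteriors, probs))
print(exp_post)            # 0.5 -- the expected posterior equals the prior

# E[log2 p] was log2(1/2) = -1 before; afterwards log2(0) drags it to -inf.
def log2_or_neg_inf(p):
    return math.log2(p) if p > 0 else float("-inf")

exp_log_post = sum(q * log2_or_neg_inf(p) for p, q in zip(posteriors, probs))
print(exp_log_post)        # -inf
```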
In that case, your Bayes factor will be either 2⁄0 or 0⁄2.
Log of the first is infinity, log of the second is negative infinity.
The average of those two numbers is (insert handwave here) zero.
(If you use the formula for log of divisions, this actually works).
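The handwave can be tightened by giving the disfavored hypothesis a small posterior e instead of exactly 0 (a sketch of my own):

```python
import math

# With equal priors, each outcome has probability 1/2 and the Bayes
# factors are (1-e)/e and e/(1-e): exact reciprocals, so the logs
# cancel for every e, not just in the limit e -> 0.
e = 1e-6
bf1 = (1 - e) / e
bf2 = e / (1 - e)
exp_log_bf = 0.5 * math.log(bf1) + 0.5 * math.log(bf2)
print(exp_log_bf)          # ~0 (up to floating point)
```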
Replace 1⁄2 and 1⁄2 in the prior with 1⁄3 and 2⁄3, and I don’t think you can make them cancel anymore.
I think we need to use actual limits then, instead of handwaving infinities. So let’s say the posterior for the unfavored hypothesis is e → 0 (and is the same for both sides). The Bayes factor when the first hypothesis is confirmed is then (1−e)·3/(3e/2), which simplifies to 2/e − 2 (http://www.wolframalpha.com/input/?i=%281-e%29*3%2F%283e%2F2%29). The Bayes factor when the second is confirmed is 3e/((1−e)·3/2), which again simplifies to 2e/(1−e) (http://www.wolframalpha.com/input/?i=3e%2F%28%281-e%293%2F2%29).
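Those two simplifications check out numerically (a spot check at e = 0.01):

```python
# Spot check of the two Bayes-factor simplifications at e = 0.01.
e = 0.01
bf1 = (1 - e) * 3 / (3 * e / 2)    # first hypothesis confirmed
bf2 = 3 * e / ((1 - e) * 3 / 2)    # second hypothesis confirmed
print(bf1, 2 / e - 2)              # both ~198
print(bf2, 2 * e / (1 - e))        # both ~0.0202
```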
Now, let me digress and derive the probability of finding evidence for each hypothesis; it’s almost but not quite 1/3 : 2/3. There’s a prior of 1⁄3 that the first hypothesis is true; by conservation of expected evidence, this must equal the probability-weighted expectation of the posteriors. So if we call x the chance of finding evidence for hypothesis one, then x·(1−e) + (1−x)·e must equal 1⁄3. Solving for x (http://www.wolframalpha.com/input/?i=x*%281-e%29%2B%281-x%29*e%3D1%2F3+solve+for+x) gives
x = (1 − 3e)/(3 − 6e),
which, as a sanity check, does in fact head towards 1⁄3 as e goes towards 0. The corresponding probability of finding evidence for the second hypothesis is 1 − x = (2 − 3e)/(3 − 6e).
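The solved value x = (1 − 3e)/(3 − 6e) can be checked directly against the conservation constraint (quick sketch):

```python
# Verify x = (1 - 3e)/(3 - 6e) against x*(1-e) + (1-x)*e = 1/3,
# and watch it approach 1/3 as e shrinks.
for e in (0.1, 0.01, 0.001):
    x = (1 - 3 * e) / (3 - 6 * e)
    assert abs(x * (1 - e) + (1 - x) * e - 1 / 3) < 1e-12
    print(e, x)                    # x -> 1/3 as e -> 0
```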
Getting back to expected logs of Bayes factors: the chance of getting a Bayes factor of 2/e − 2 is (1 − 3e)/(3 − 6e), while the chance of getting 2e/(1 − e) is (2 − 3e)/(3 − 6e).
Log of the first, times its probability, plus log of the second, times its probability, is not zero (http://www.wolframalpha.com/input/?i=log+%282%2Fx+-+2%29*+%281-3+x%29%2F%283-6+x%29%2Blog%28%282x%29%2F%281-x%29%29*+%282-3+x%29%2F%283-6+x%29%2Cx%3D.01).
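Reproducing that computation (the same expression as the WolframAlpha link, at e = 0.01):

```python
import math

# Expected log Bayes factor at e = 0.01: p1*log(BF1) + p2*log(BF2).
e = 0.01
p1 = (1 - 3 * e) / (3 - 6 * e)    # chance of evidence for hypothesis one
p2 = (2 - 3 * e) / (3 - 6 * e)    # chance of evidence for hypothesis two
exp_log_bf = p1 * math.log(2 / e - 2) + p2 * math.log(2 * e / (1 - e))
print(exp_log_bf)                  # ~ -0.87, clearly not zero
```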
Hm. I’ll need to think this over, this wasn’t what I expected. Either I made some mistake, or am misunderstanding something here. Let me think on this for a bit.
Hopefully I’ll update this soon with an answer.
I think it’s not going to work out. The expected posterior is equal to the prior, but the expected log Bayes factor will have the form p·log(K1) + (1−p)·log(K2), which for general p is just a mess. Only when p = 1/2 does it simplify to (1/2)·log(K1·K2), and when p = 1/2, K2 = 1/K1, so the whole thing is zero.
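A quick illustration of why only p = 1/2 rescues it (K1 = 50 is an arbitrary number of my own):

```python
import math

# p*log(K1) + (1-p)*log(K2): zero when p = 1/2 and K2 = 1/K1,
# but nonzero for an asymmetric prior even with reciprocal factors.
def expected_log_bf(p, k1, k2):
    return p * math.log(k1) + (1 - p) * math.log(k2)

k1 = 50.0                                    # arbitrary example factor
print(expected_log_bf(0.5, k1, 1 / k1))      # ~0
print(expected_log_bf(1 / 3, k1, 1 / k1))    # -(1/3)*log(50), ~ -1.30
```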
Okay, so I think I worked out where my failed intuition came from. The Bayes factor is the ratio posterior/prior for hypothesis A, divided by the same ratio for hypothesis B. The numerator is expected to be 1 (the expected posterior over the prior is one, and factoring out the constant prior keeps that expectation at one), and so is the denominator (same argument); but the expected ratio of two quantities that each have expectation one is not itself always one. So my brain turned “numerator and denominator each one in expectation” into “ratio one in expectation”.
Your expected Bayes factor is necessarily 1 weighted over your prior; you expect to find evidence for neither side.
I think this claim is correct on the natural scale, except it should be weighted over the probability of the data, not over the prior. The margin of this comment is too small to contain the proof, so I’ll put a PDF in my public Dropbox folder at https://www.dropbox.com/s/vmom25u9ic7redu/Proof.pdf?dl=0
(I am slightly out of my depth here; I am not a mathematician or a Bayesian theorist, so I reserve the right to delete this comment if someone spots a flaw.)
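For what it’s worth, one version of the “weighted over the probability of the data” claim that does check out numerically (my reading of it, not a reproduction of the linked PDF): if the Bayes factor is P(x|H1)/P(x|H2), its expectation over data generated under H2 is exactly 1.

```python
# Hedged check (my reading of the claim, not the linked PDF): the
# Bayes factor P(x|H1)/P(x|H2) averages to 1 when weighted by the
# probability of the data under H2, since the terms telescope:
# sum_x P(x|H2) * P(x|H1)/P(x|H2) = sum_x P(x|H1) = 1.
lik_h1 = {"a": 0.9, "b": 0.1}   # toy likelihoods, my own numbers
lik_h2 = {"a": 0.2, "b": 0.8}
exp_bf_under_h2 = sum(lik_h2[x] * lik_h1[x] / lik_h2[x] for x in "ab")
print(exp_bf_under_h2)           # ~1
```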