Why is it consistent that assigning a probability of 99% to one half of a binary proposition that turns out false is much better than assigning a probability of 1% to the opposite half that turns out true?
I think there’s some confusion. Coscott said these three facts:
Let f(x) be the output if the question is true, and let g(x) be the output if the question is false.
f(x)=g(1-x)
f(x)=log(x)
In consequence, g(x)=log(1-x). So if x=0.99 and the question is false, the output is g(x)=log(1-x)=log(0.01). Or if x=0.01 and the question is true, the output is f(x)=log(x)=log(0.01). So the symmetry that you desire is true.
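A quick numerical check of that symmetry, as a minimal Python sketch. The names `f` and `g` follow the definitions above; reading `log` as the natural logarithm is an assumption, since the thread does not state the base (any base gives the same equality).

```python
import math

def f(x):
    """Score when the question turns out true, given stated probability x."""
    return math.log(x)

def g(x):
    """Score when the question turns out false; follows from f(x) = g(1 - x)."""
    return math.log(1 - x)

# 99% on a question that turns out false vs. 1% on a question that turns out true
score_99_false = g(0.99)
score_01_true = f(0.01)
print(math.isclose(score_99_false, score_01_true))
```

Both cases score log(0.01), up to floating-point rounding in `1 - 0.99`.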
But that doesn’t output 1 for estimates of 100%, 0 for estimates of 50%, and -inf (or even −1) for estimates of 0%, or even something that can be normalized to either of those triples.
I didn’t do that. I only set 1 to 100% and 0 to 50%. 0% is still negative infinity.
That’s the math error.
There’s no math error.
Here’s the “normalized” version: f(x)=1+log2(x), g(x)=1+log2(1-x) (i.e. scale f and g by 1/log(2) and add 1).
Now f(1)=1, f(.5)=0, f(0)=-Inf ; g(1)=-Inf, g(.5)=0, g(0)=1.
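Those six values can be reproduced with a short Python sketch. The names `f_norm` and `g_norm` are hypothetical labels for the normalized f and g, and the explicit handling of x = 0 is an assumption made here because `math.log2(0)` raises an error rather than returning negative infinity.

```python
import math

def f_norm(x):
    """Normalized score if the question is true: 1 + log2(x)."""
    if x == 0:
        return -math.inf  # math.log2(0) would raise ValueError
    return 1 + math.log2(x)

def g_norm(x):
    """Normalized score if the question is false: 1 + log2(1 - x)."""
    return f_norm(1 - x)

for x in (1, 0.5, 0):
    print(x, f_norm(x), g_norm(x))
```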
Ok?
Huh. I thought that wasn’t a Bayesian score (not maximized by estimating correctly), but doing the math, the maximum is at the right point for 1/4, 1/100, 3/4, 99/100, and 1/2.
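That check can be repeated numerically. The sketch below assumes the normalized score 1+log2 from this thread and a true probability p, and searches a grid of stated estimates x for the one that maximizes the expected score p·f(x) + (1−p)·g(x). The helper names `expected_score` and `best_estimate`, and the grid resolution of 1/1000, are illustrative choices, not anything from the thread.

```python
import math

def expected_score(p, x):
    """Expected normalized log score when the truth has probability p and x is stated."""
    return p * (1 + math.log2(x)) + (1 - p) * (1 + math.log2(1 - x))

def best_estimate(p, steps=1000):
    """Grid search over x = k/steps (endpoints excluded) for the highest expected score."""
    best_k = max(range(1, steps), key=lambda k: expected_score(p, k / steps))
    return best_k / steps

for p in (0.25, 0.01, 0.75, 0.99, 0.5):
    print(p, best_estimate(p))
```

On this grid the maximizing estimate coincides with p for each of the five probabilities mentioned, which is what it means for the score to reward estimating correctly.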