I am making the simple observation that the median error is less than one because the mean squared error is one.
That isn’t a “simple” observation.
Consider an error which is 0.5 22% of the time, 1.1 78% of the time. The squared errors are 0.25 and 1.21. The median error is 1.1 > 1. (The mean squared error is 1)
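A quick numerical check of this two-point example (an illustrative sketch):

```python
import numpy as np

# Two-point error distribution: |error| = 0.5 w.p. 0.22 and 1.1 w.p. 0.78
probs  = np.array([0.22, 0.78])
errors = np.array([0.5, 1.1])

mse = (probs * errors**2).sum()
print(mse)  # 0.9988, i.e. ~1: the mean squared error
# 78% of the mass sits at 1.1, so the median |error| is 1.1 > 1
```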
Yes, you are right, but under the assumption that the errors are normally distributed, I am right:
If:
p ∼ Bern(0.78)
σ = p × N(0, 1.1) + (1 − p) × N(0, 0.5)
Then the median of σ² is ≈0.37, which is much less than 1.
proof:
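A Monte Carlo sketch of that computation (a simulation under the mixture above, not a closed-form proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Draw from the mixture: N(0, 1.1) w.p. 0.78, N(0, 0.5) w.p. 0.22
p = rng.random(n) < 0.78
errors = np.where(p, rng.normal(0, 1.1, n), rng.normal(0, 0.5, n))

print(np.mean(errors**2))    # ≈ 1.0  (mean squared error)
print(np.median(errors**2))  # ≈ 0.37 (median squared error)
```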
Under what assumption?
1/ You aren’t “[assuming] the errors are normally distributed” in what you’ve written above (since a mixture of two normals isn’t normal).
2/ If your assumption is X ∼ N(0, 1), then yes, I agree the median of X² is ~0.45 (although

```python
>>> from scipy import stats
>>> stats.chi2.ppf(0.5, df=1)
0.454936
```

would have been an easier way to illustrate your point). I think this is actually the assumption you’re making. [Which is a horrible assumption, because if it were true, you would already be perfectly calibrated].
3/ I guess your new claim is “[assuming] the errors are a mixture of normal distributions, centered at 0”, which, okay, fine, that’s probably true; I don’t care enough to check because it seems a bad assumption to make.
More importantly, there’s a more fundamental problem with your post. You can’t just take some numbers from my post and then put them in a different model and think that’s in some sense equivalent. It’s quite frankly bizarre. The equivalent model would be something like:
p ∼ Bern(0.78)
σ ∼ p ⋅ N(1.1, ε) + (1 − p) ⋅ N(0.5, ε)
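For concreteness, a small sampler for this model (my own illustration; ε is assumed to be some small positive number such as 0.01):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 1_000_000, 0.01

# sigma itself is random: near 1.1 w.p. 0.78, near 0.5 w.p. 0.22
p = rng.random(n) < 0.78
sigma = np.where(p, rng.normal(1.1, eps, n), rng.normal(0.5, eps, n))
errors = rng.normal(0.0, 1.0, n) * sigma  # each error drawn with its own sigma

print(np.mean(errors**2))  # ≈ 1.0
```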
Our ability to talk past each other is impressive :)
I think this is actually the assumption you’re making. [Which is a horrible assumption, because if it were true, you would already be perfectly calibrated].

Yes, this is almost the assumption I am making. The general point of this post is to assume that all your predictions follow a Normal distribution, with μ as guessed and with a σ that is different from what you guessed, and then use X² to get a point estimate for the counterfactual σ you should have used. And as you point out, if the (counterfactual) σ = 1, then the point estimate suggests you are well calibrated.
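A minimal sketch of that point estimate (my own illustration; the exact form is an assumption, reading σ̂_z as the root-mean-square of the standardized errors z = (outcome − μ)/σ_guessed):

```python
import numpy as np

def sigma_hat_z(mu, sigma_guessed, outcomes):
    # Standardized errors: ~ N(0, 1) if the guessed sigmas were calibrated
    z = (np.asarray(outcomes) - mu) / sigma_guessed
    # Root-mean-square of z: the X²-based point estimate of the factor
    # by which the guessed sigmas should have been scaled (1.0 = calibrated)
    return np.sqrt(np.mean(z ** 2))
```

If the guessed σs were calibrated, z ∼ N(0, 1) and σ̂_z ≈ 1; a value above 1 suggests the guessed σs were too small.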
In the post, the counterfactual σ is σ̂_z.