I am extremely interested in these sorts of questions myself (message me if you would want to chat more about them). In terms of the relation between accuracy and calibration, I think you might be able to see some of this relation from Open Philanthropy’s report on the quality of their predictions. In footnote 10, I believe they decompose Brier score into a term for miscalibration, a term for resolution, and a term for entropy.
Also, would you be able to explain a bit how it would be possible for someone who is perfectly calibrated at predicting rain to predict rain at 90% probability but the Bayes factor based on that information to not by 9? To me it seems like for someone to be perfectly calibrated at the 90% confidence level the ratio of it having rained to it not having rained whenever they predict 90% rain has to be 9:1 so P(say rain 90% | rain) = 90% and P(say rain 90% | no rain)=10%?
Hey, thanks for the answer and sorry for my very late response. In particular thanks for the link to the OpenPhil report, very interesting! To your question—I now changed my mind again and tentatively think that you are right. Here’s how I think about it now, but I still feel unsure whether I made a reasoning error somewhere:
There’s some distribution of your probabilistic judgments that shows how frequently you report a given probability in a proposition that turned out to be true. It might show e.g. that for true propositions you report 90% probability in 10% of all your probability judgements. This might be the case even if you are perfectly calibrated as long as, for false propositions, you report 90% in (10/9) % of all your probability judgements. Then, it would still be the case that 90% of your 90% probability judgements turn out to be true—and hence you are perfectly calibrated at 90%.
So, given these assumptions, what would the Bayes factor for your 90% judgement in “rain today” be? P(you give rain 90%|rain) should be 10% since I’m sort of randomly sampling your 90% judgement from the distribution where your 90% judgement occurs 10% of the time. For the same reason, P(you give rain 90%|no rain) = 10⁄9 %. Therefore, the Bayes factor is 10%/(10/9)% = 9.
I suspect that my explanation is overly complicated feel free to point out more elegant ones :)
I am extremely interested in these sorts of questions myself (message me if you would want to chat more about them). In terms of the relation between accuracy and calibration, I think you might be able to see some of this relation from Open Philanthropy’s report on the quality of their predictions. In footnote 10, I believe they decompose Brier score into a term for miscalibration, a term for resolution, and a term for entropy.
Also, would you be able to explain a bit how it would be possible for someone who is perfectly calibrated at predicting rain to predict rain at 90% probability but the Bayes factor based on that information to not by 9? To me it seems like for someone to be perfectly calibrated at the 90% confidence level the ratio of it having rained to it not having rained whenever they predict 90% rain has to be 9:1 so P(say rain 90% | rain) = 90% and P(say rain 90% | no rain)=10%?
Hey, thanks for the answer and sorry for my very late response. In particular thanks for the link to the OpenPhil report, very interesting! To your question—I now changed my mind again and tentatively think that you are right. Here’s how I think about it now, but I still feel unsure whether I made a reasoning error somewhere:
There’s some distribution of your probabilistic judgments that shows how frequently you report a given probability in a proposition that turned out to be true. It might show e.g. that for true propositions you report 90% probability in 10% of all your probability judgements. This might be the case even if you are perfectly calibrated as long as, for false propositions, you report 90% in (10/9) % of all your probability judgements. Then, it would still be the case that 90% of your 90% probability judgements turn out to be true—and hence you are perfectly calibrated at 90%.
So, given these assumptions, what would the Bayes factor for your 90% judgement in “rain today” be?
P(you give rain 90%|rain) should be 10% since I’m sort of randomly sampling your 90% judgement from the distribution where your 90% judgement occurs 10% of the time. For the same reason, P(you give rain 90%|no rain) = 10⁄9 %. Therefore, the Bayes factor is 10%/(10/9)% = 9.
I suspect that my explanation is overly complicated feel free to point out more elegant ones :)