Mathematical System For Calibration
I am working on an article titled “You Can Gain Information Through Psychoanalysing Others”. Its central thesis is that, given the probability someone assigns to a proposition and their calibration, you can calculate a Bayesian probability estimate for the truth of that proposition.
For the article, I would need a rigorously mathematically defined system for calculating calibration given someone’s past prediction history. I thought of developing one myself, but realised it would be more prudent to inquire if one has already been invented to avoid reinventing the wheel.
Thanks in advance for your cooperation. :)
# Disclaimer
I am chronically afflicted with a serious and invariably fatal epistemic disease known as narcissist bias (this is a misnomer, as it refers to a broad family of biases). No cure is known yet for narcissist bias, and I’m currently working on cataloguing and documenting the disease in full using myself as a test case. This disease affects how I present and articulate my points (especially in written text), such that I assign a Pr of > 0.8 that somebody would find this post condescending, self-aggrandising, grandiose or otherwise deluded. This seems to be a problem with all my writing, and a cost of living with the condition I guess. I apologise in advance for any offence received, and inform that I do not intend to offend anyone or otherwise hurt their sensibilities.
There’s no need to have your disclaimer on your posts where you talk about your narcissism. It distracts from your post and doesn’t help.
What do you mean by knowing someone’s calibration? If it’s summarized in a single score over many kinds of predictions, then I’m not sure your idea can work. For example, imagine Bob is perfectly calibrated when predicting earthquakes, but overconfident when predicting meteors. That makes him overconfident on average, but when he predicts an earthquake, you shouldn’t assume that he’s overconfident and update accordingly.
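To make the Bob example concrete, here is a minimal Python sketch. The history is made up, and the "gap" measure (mean stated probability minus observed frequency) is just one crude summary I am assuming for illustration, not an established calibration score.

```python
from collections import defaultdict

def calibration_gap(records):
    """Mean stated probability minus observed frequency.

    records: list of (stated_probability, outcome) with outcome in {0, 1}.
    A positive gap suggests overconfidence on these records.
    """
    stated = sum(p for p, _ in records) / len(records)
    observed = sum(outcome for _, outcome in records) / len(records)
    return stated - observed

def gaps_by_topic(history):
    """history: list of (topic, stated_probability, outcome)."""
    by_topic = defaultdict(list)
    for topic, p, outcome in history:
        by_topic[topic].append((p, outcome))
    return {topic: calibration_gap(recs) for topic, recs in by_topic.items()}

# Hypothetical history: Bob always says 0.8.
# Earthquakes happen 8 times out of 10, meteors only 5 times out of 10.
history = (
    [("earthquake", 0.8, 1)] * 8 + [("earthquake", 0.8, 0)] * 2
    + [("meteor", 0.8, 1)] * 5 + [("meteor", 0.8, 0)] * 5
)

pooled = calibration_gap([(p, o) for _, p, o in history])
per_topic = gaps_by_topic(history)
```

Here the pooled gap is about 0.15, but it splits into roughly 0 for earthquakes and 0.3 for meteors, so correcting Bob's earthquake predictions by the pooled figure would be a mistake.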
I put a question about how to measure Calibration on statistics StackExchange: https://stats.stackexchange.com/questions/253443/how-do-i-choose-the-best-metric-to-measure-my-calibration
That’s absolutely straightforward Bayes updating.
Not exactly.
(1) What is the family of calibration curves you’re updating on? These are functions from stated probabilities to ‘true’ probabilities, so the class of possible functions is quite large. Do we want a parametric family? A non-parametric family? We would like something which is mathematically convenient, looks as much like typical calibration curves as possible, but which has a good ability to fit anomalous curves as well when those come up.
(2) What is the prior over this family of curves? It may not matter too much if we plan on using a lot of data, but if we want to estimate people’s calibration quickly, it would be nice to have a decent prior. This suggests a hierarchical Bayesian approach (where we estimate a good prior distribution via a higher-order prior).
(3) As mentioned by cousin_it, we would actually want to estimate different calibration curves for different topics. This suggests adding at least one more level to the hierarchical Bayesian model, so that we can simultaneously estimate the general distribution of calibration curves in the population, the all-subject calibration curve for an individual, and the single-subject calibration curve for an individual. At this point, one might prefer to shut one’s eyes and ignore the complexity of the problem.
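As one possible way to write points (1) to (3) down, here is a sketch that assumes a two-parameter family in which the true log-odds are linear in the stated log-odds. The family, the normal priors, and all symbols ($a$, $b$, $\mu$, $\Sigma$) are my assumptions for illustration, and a richer or non-parametric family could be substituted.

```latex
\begin{align*}
y \mid p &\sim \operatorname{Bernoulli}(q)
  && \text{outcome, given stated probability } p \\
\operatorname{logit}(q) &= a_{i,t} + b_{i,t}\,\operatorname{logit}(p)
  && \text{curve for person } i \text{ on topic } t \\
(a_{i,t},\, b_{i,t}) &\sim \mathcal{N}\big((a_i,\, b_i),\ \Sigma_{\text{topic}}\big)
  && \text{topic curves vary around the person's overall curve} \\
(a_i,\, b_i) &\sim \mathcal{N}\big((\mu_a,\, \mu_b),\ \Sigma_{\text{pop}}\big)
  && \text{people vary around the population curve}
\end{align*}
```

Perfect calibration corresponds to $a = 0$, $b = 1$; a slope $b < 1$ means the true log-odds are less extreme than the stated ones, which is overconfidence.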
Sure. Any practical implementation will have to figure out all the practical details including the ones that you mention. But those are implementation issues for something that is still straightforward Bayes, at least for a single individual. If you have a history of predictions and know the actual outcomes, you can even just plot the empirical calibration curve without any estimation involved.
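A minimal sketch of the single-individual case, assuming predictions are grouped into equal-width bins and each bin gets an independent Beta(1, 1) prior. The bin count, the prior, and the function names are assumptions of mine, not an established method.

```python
def empirical_calibration(history, n_bins=10, prior_a=1.0, prior_b=1.0):
    """Empirical calibration curve with a Beta posterior per bin.

    history: iterable of (stated_probability, outcome), outcome in {0, 1}.
    Returns {bin_index: {"n", "hits", "frequency", "posterior_mean"}}.
    """
    counts = {}
    for p, outcome in history:
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        n, hits = counts.get(b, (0, 0))
        counts[b] = (n + 1, hits + outcome)
    curve = {}
    for b, (n, hits) in sorted(counts.items()):
        curve[b] = {
            "n": n,
            "hits": hits,
            "frequency": hits / n,  # raw point on the empirical curve
            # Beta-Binomial update: posterior mean of the bin's true frequency
            "posterior_mean": (prior_a + hits) / (prior_a + prior_b + n),
        }
    return curve

def adjusted_probability(curve, p, n_bins=10, prior_a=1.0, prior_b=1.0):
    """Estimate for a new claim stated at probability p."""
    b = min(int(p * n_bins), n_bins - 1)
    if b not in curve:
        # No data in this bin: fall back to the prior mean (an assumption).
        return prior_a / (prior_a + prior_b)
    return curve[b]["posterior_mean"]

# Hypothetical record: ten claims stated at 0.8, six of which came true.
history = [(0.8, 1)] * 6 + [(0.8, 0)] * 4
curve = empirical_calibration(history)
estimate = adjusted_probability(curve, 0.8)  # (1 + 6) / (2 + 10) = 7/12
```

With little data the posterior mean stays close to the prior, which is where the choice of prior raised above starts to matter.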
Now, if you have multiple people involved, things become more interesting and probably call for something like Gelman’s favourite multilevel/hierarchical models. But that’s beyond what OP asked for—he wanted a “rigorously mathematically defined system” and that’s plain-vanilla Bayes.
While I don’t have my notes in front of me, I do recall from the decision analysis class I recently took that log score is related to the weight one would give to one forecaster among several when combining forecasts. Unfortunately it does not appear that the professor uploaded the slides on ensemble forecasting, so I can’t provide any more right now. I am emailing the professor. Thought this would help in the meantime.
I can see that I misremembered the lecture. Seems to be an application of Bayes as Lumifer suggested for the basic approach. Other more complex approaches were also discussed.
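For what it is worth, here is a small sketch of one way log score and Bayes can line up when combining forecasters. It assumes each forecaster is treated as a hypothesis whose stated probabilities are the true ones, so the posterior weight is proportional to the prior times the exponentiated cumulative log score. This is my own illustration and not a reconstruction of the lecture; the function names are hypothetical, and probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def log_score(forecasts, outcomes):
    """Sum of log probabilities assigned to what actually happened."""
    total = 0.0
    for p, y in zip(forecasts, outcomes):
        total += math.log(p if y == 1 else 1.0 - p)
    return total

def posterior_weights(forecaster_probs, outcomes, prior=None):
    """Bayesian weights over forecasters.

    forecaster_probs: {name: [stated probability for each event]}
    outcomes: [0 or 1 for each event]
    prior: optional {name: prior weight}; uniform if omitted.
    """
    names = list(forecaster_probs)
    if prior is None:
        prior = {name: 1.0 / len(names) for name in names}
    log_post = {
        name: math.log(prior[name]) + log_score(forecaster_probs[name], outcomes)
        for name in names
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    unnorm = {name: math.exp(v - top) for name, v in log_post.items()}
    z = sum(unnorm.values())
    return {name: u / z for name, u in unnorm.items()}

# Hypothetical example: both events happened.
weights = posterior_weights({"a": [0.9, 0.9], "b": [0.5, 0.5]}, [1, 1])
```

In this example forecaster "a" ends up with weight 0.81 / (0.81 + 0.25), since the likelihoods are 0.9 × 0.9 and 0.5 × 0.5 under a uniform prior.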