I disagree with that characterisation of our disagreement; I think it's far more fundamental than that.
1. I think you misrepresent the nature of forecasting (in its generality) versus modelling in some specifics.
2. I think your methodology is needlessly complicated.
3. I propose what I think is a better methodology.
To expand on 1: I think (although I'm not certain, because I find your writing somewhat convoluted and unclear) that you're making an implicit assumption that the error distribution is consistent from forecast to forecast; namely, that your errors when forecasting COVID deaths and Biden's vote share come from some similar process. This doesn't really mirror my experience in forecasting. I think this model makes much more sense when looking at a single model which produces lots of forecasts. For example, if I had a model forecasting COVID deaths each week, and after 5-10 weeks I noticed that it was under- or over-confident, then this sort of approach to tweaking my model might make sense.
To expand on 2: I've read your article a few times and I still don't fully understand what you're getting at. As far as I can tell, you're proposing a model for adjusting your forecasts based on their historic performance. Having a specific model for doing this seems to miss the point of what forecasting in the real world is like. I've never created a forecast and gone, "hmm… usually when I forecast things at 20% they happen 15% of the time, so I'm adjusting this forecast down" (which I think is what you're advocating). It's more likely a notion of, "I am often over/under-confident; when I create this model, is there some source of variance I am missing or over-estimating?" Setting concrete rules for this doesn't make much sense to me.
Yes, I do think it's much simpler for people to look at a list of percentiles of things happening, plot them, and then ask "am I generally over-confident or under-confident?" I think it's generally much easier for people to reason about percentiles than about standard deviations. (Yes, I know 68-95-99.7, but I don't know without thinking quite hard what 1.4 sd or 0.5 sd means.) I think leaning too heavily on the math tends to make people make some pretty obvious mistakes.
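(For concreteness, the translation between the two is just the normal CDF; here is a quick illustrative sketch, assuming SciPy is to hand, of what those sd values mean as percentiles.)

```python
# Illustrative only: converting "x standard deviations" into the percentile
# it corresponds to under a normal distribution.
from scipy.stats import norm

for z in [0.5, 1.0, 1.4, 2.0]:
    # norm.cdf(z) = probability that a standard normal falls below z
    print(f"{z:.1f} sd -> {norm.cdf(z):.0%}")
# e.g. 0.5 sd is roughly the 69th percentile, 1.4 sd roughly the 92nd.
```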
I am sorry if I have strawmanned you, and I think your above post is generally correct. I think we are coming from two different worlds.
You are coming from Metaculus, where people make a lot of predictions. There, having 50+ predictions is the norm, and thus checking your outcomes against a U(0, 1) gives a lot of intuitive evidence of calibration.
I come from a world where people want to improve in all kinds of ways, and one of them is prediction. Few people write more than 20 predictions down a year, and when they do, they more or less ALWAYS make dichotomous predictions. I expect many of my readers to be terrible at predicting, just like myself.
You are reading a post with the message "raise the sanity waterline from 2% to 5% of your level" and asking "why is this better than making 600 predictions and looking at their inverse CDF?", and the answer is: it's not, but it's still relevant, because most people do not make 600 predictions and do not know what an inverse CDF is. I am even explaining what a normal distribution is because I do not expect my audience to know...
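(For readers who do not know what that check is: you ask what percentile each outcome landed at under your forecast for it, and see whether those percentiles look uniform. A rough sketch with made-up numbers, assuming NumPy/SciPy:)

```python
# Rough sketch of the "lots of predictions vs U(0, 1)" calibration check:
# feed each realised outcome through that forecast's own CDF and see whether
# the resulting values look uniform. All numbers below are made up.
import numpy as np
from scipy.stats import norm, kstest

# (predicted mean, predicted sd, realised outcome) for each forecast
forecasts = [
    (100.0, 20.0, 130.0),
    (0.52, 0.02, 0.513),
    (40.0, 10.0, 35.0),
]

# Probability integral transform: the percentile each outcome landed at.
pit = np.array([norm.cdf(outcome, loc=mu, scale=sd) for mu, sd, outcome in forecasts])

print(pit)
# With many forecasts (50+, 600, ...) you can eyeball a histogram of these,
# or run a uniformity test against U(0, 1):
print(kstest(pit, "uniform"))
```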
You are absolutely correct that they probably do not share an error distribution. But I am trying to get people from knowing 1 distribution to knowing 2.
Scott Alexander does a "when I predict this, it really means that" check every year for his binary predictions. This gives him an intuitive feel for "I should adjust my odds up/down by x". I am trying to do the same for normal-distribution predictions, so people can check their predictions.
I agree your methodology is superior :). All I propose is that people sometimes make continuous predictions, and if they want to start doing that and track how much they suck, then I give them instructions for quickly getting a number for how well it is going.
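(To make "quickly getting a number" concrete: one simple version of such a check, a sketch with made-up numbers and not necessarily the exact recipe from my post, is to standardise each miss by the uncertainty you stated and look at the spread.)

```python
# One simple way to turn a handful of normal-distribution predictions into a
# single number (a sketch; the predictions below are made up).
import numpy as np

# (predicted mean, predicted sd, what actually happened) for each prediction
predictions = [
    (50.0, 10.0, 62.0),
    (3.0, 1.5, 2.1),
    (200.0, 40.0, 150.0),
]

# How many of *your own* stated standard deviations each outcome was off by.
z = np.array([(actual - mu) / sd for mu, sd, actual in predictions])

# Well-calibrated predictions give these a spread of roughly 1.
# Clearly above 1: your intervals are too narrow (over-confident).
# Clearly below 1: your intervals are too wide (under-confident).
print("spread of standardised errors:", z.std(ddof=1))
```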
If you’re making ~20 predictions a year, you shouldn’t be doing any funky math to analyse your forecasts. Just go through each one after the fact and decide whether or not the forecast was sensible with the benefit of hindsight.
"I am even explaining what a normal distribution is because I do not expect my audience to know..."
I think this is exactly my point: if someone doesn't know what a normal distribution is, maybe they should be looking at their forecasts in a fuzzier way rather than trying to back-fit some model to them.
"All I propose is that people sometimes make continuous predictions, and if they want to start doing that and track how much they suck, then I give them instructions for quickly getting a number for how well it is going."
I disagree that’s all you propose. As I said in an earlier comment, I’m broadly in favour of people making continuous forecasts as they convey more information. You paired your article with what I believe is broadly bad advise around analysing those forecasts. (Especially if we’re talking about a sample of ~20 forecasts)
I would love to have you as a reviewer of my second post, where I will try to justify why I think this approach is better. You can even super dislike it before I publish, if you still feel that way once I present my strongest arguments, or maybe you will convince me that I am wrong, so I don't publish part 2 and make a partial retraction of this post :). There is a decent chance you are right, as you are the stronger predictor of the two of us :)
I’d be happy to.
I upvoted all the comments in this thread: constructive criticism, a constructive response to it, and, in the end, even an agreement to review each other's work!