Users of that forecasting system may care about this tail. They may be willing to pay for improvements in the aggregate distributional forecast so that it better models an enlightened ideal. If it were quickly realized that 99.99% of the distribution was uniform, then any subsidies for information should go to those who did a good job improving the remaining 0.01% tail. It’s possible that some pretty big changes to this tail could be figured out.
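One rough way to operationalize “subsidies go to those who improved the tail” (my sketch, not something spelled out in the thread): pay each forecaster in proportion to their leave-one-out marginal improvement of the aggregate’s log score on tail outcomes. All names, the equal-weight mixture aggregate, and the leave-one-out payment rule here are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def tail_log_score(pdf, outcomes, tail_cutoff):
    """Average log score of a density, counting only outcomes that landed in the tail."""
    tail = outcomes[outcomes > tail_cutoff]
    return np.mean(np.log(pdf(tail))) if len(tail) else 0.0

def marginal_tail_subsidies(pdfs, outcomes, tail_cutoff, budget):
    """Split a subsidy budget in proportion to each forecaster's
    leave-one-out improvement of the aggregate's tail log score."""
    def mixture_pdf(excluded=None):
        kept = [f for i, f in enumerate(pdfs) if i != excluded]
        return lambda x: sum(f(x) for f in kept) / len(kept)

    base = tail_log_score(mixture_pdf(), outcomes, tail_cutoff)
    # How much does the aggregate's tail score drop without forecaster i?
    losses = np.array([base - tail_log_score(mixture_pdf(i), outcomes, tail_cutoff)
                       for i in range(len(pdfs))])
    gains = np.clip(losses, 0.0, None)  # only reward positive contributions
    return budget * gains / gains.sum() if gains.sum() > 0 else np.zeros(len(pdfs))

# Example: three forecasters, one of whom models the right tail much better.
pdfs = [norm(0, 1).pdf, norm(0, 1).pdf, norm(0, 3).pdf]
outcomes = np.array([0.2, -0.5, 4.1])  # one rare tail event realized
print(marginal_tail_subsidies(pdfs, outcomes, tail_cutoff=3.0, budget=100.0))
```

In this toy case almost the entire budget goes to the wide-tailed forecaster, since dropping them is what hurts the aggregate’s tail score.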
I’m really interested in this type of scheme because it would also solve a big problem in futarchy and futarchy-like setups that use prediction polling: namely, the inability to score counterfactual conditionals (which is most of the forecasting you’d be doing in a futarchy-like setup).
One thing you could do instead of scoring people against expert assessments is to score them against the final aggregated and extremized distribution.
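To make that concrete for the binary case: pool everyone’s probabilities, extremize the pool, and grade each individual forecast as if the extremized aggregate were the true probability. A minimal sketch, assuming standard log-odds extremization; the function names and the alpha value are illustrative, not fixed choices:

```python
import numpy as np

def extremize(p, alpha=2.5):
    """Push a pooled probability away from 0.5 in log-odds space.
    alpha > 1 extremizes; the value is a tuning parameter."""
    log_odds = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-alpha * log_odds))

def score_against_aggregate(p_forecast, all_forecasts, alpha=2.5):
    """Expected log score of one forecast, treating the extremized
    mean of all forecasts as if it were the true event probability."""
    p_star = extremize(np.mean(all_forecasts), alpha)
    return p_star * np.log(p_forecast) + (1 - p_star) * np.log(1 - p_forecast)

# Example: a forecaster at 0.7 scored against the pool [0.6, 0.7, 0.65, 0.8].
print(score_against_aggregate(0.7, np.array([0.6, 0.7, 0.65, 0.8])))
```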
One issue with any framework like this is that general calibration may be very different from calibration at the tails. Whatever scoring rule you’re using to determine the calibration of experts or of the aggregate has the same issue: long-tail events rarely happen.
Another solution to this problem (though it doesn’t solve the counterfactual-conditional problem) is to create tailored scoring rules that provide extra rewards for events at the tails. If an event at the tails is a million times less likely to happen than events at the center, but you care about it equally, then provide a million times the reward for accuracy near the tail in the event that it happens. There’s prior work on tailoring scoring rules to different utility functions here: https://www.evernote.com/l/AAhVczys0ddF3qbfGk_s4KLweJm0kUloG7k/
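As a toy illustration of that kind of tailoring (my sketch, not taken from the linked work): multiply the log score by a weight roughly inverse to how likely the realized outcome’s tail was under some reference distribution, so a million-to-one event carries about a million times the stakes. Note that naive reweighting like this isn’t automatically a proper scoring rule; the linked work treats the design question more carefully.

```python
import numpy as np
from scipy.stats import norm

def tail_weight(outcome, reference=norm(0, 1)):
    """Weight ~ 1 / P(tail beyond the outcome) under a reference
    distribution, so rarer tail events carry proportionally bigger stakes."""
    tail_prob = min(reference.cdf(outcome), reference.sf(outcome))
    return 1.0 / max(tail_prob, 1e-12)

def tail_weighted_log_score(forecast_pdf, outcome):
    """Log score of the forecast at the realized outcome, scaled by the
    tail weight of that outcome."""
    return tail_weight(outcome) * np.log(forecast_pdf(outcome))

# Example: a wide forecast N(0, 2) scored on a 4-sigma tail event.
print(tail_weighted_log_score(norm(0, 2).pdf, 4.0))
```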
> One thing you could do instead of scoring people against expert assessments is to score them against the final aggregated and extremized distribution.
I think that an efficient use of expert assessments would be for them to see the aggregate and then adjust it as necessary, but try not to do much original research. I just wrote a more recent shortform post about this.
> One issue with any framework like this is that general calibration may be very different from calibration at the tails.
I think that we can get calibration to be as good as experts can figure out, and that could be enough to be really useful.
Good points!
Also, thanks for the link, that’s pretty neat.