Congratulations! I wish we could have collaborated while I was in school, but I don’t think we were researching at the same time. I haven’t read your actual papers, so feel free to respond to any of my comments with “you should check out the paper.”
For chapter 4: From the high-level summary here, it sounds like you’re offloading the task of aggregation to the forecasters themselves. It’s odd to me that you’re describing this as arbitrage. Also, I have frequently seen the scoring rule used with some intermediary function to determine monetary rewards. For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters. One would imagine that this would eliminate the incentive to report the aggregate as your own answer, but there’s a reason we (the researcher/platform/website) aggregate individual forecasts! It’s actually just more accurate under typical conditions. In theory, an individual forecaster could improve that aggregate by forming their own independent forecast before seeing the work of others and then aggregating, but in practice the impact of an individual forecast is quite small. I’ll have to read about QA pooling; it’s surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
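For anyone unfamiliar with the “scoring rule plus intermediary function” setup, here’s a rough Python sketch of the general shape. The Brier score as the scoring rule, the linear percentile-to-payout mapping, and the `bonus_pool` parameter are all hypothetical choices for illustration, not the actual reward rules used in the IARPA work.

```python
def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


def percentile_rank(scores, i):
    """Fraction of the *other* forecasters whose score is strictly worse (higher) than forecaster i's."""
    others = [s for j, s in enumerate(scores) if j != i]
    return sum(s > scores[i] for s in others) / len(others)


def payout(percentile, bonus_pool=100.0):
    """Hypothetical intermediary function: pay linearly in percentile, not in raw score."""
    return bonus_pool * percentile


# Hypothetical forecasts on a single question that resolved "yes" (outcome = 1).
forecasts = [0.9, 0.7, 0.6, 0.2]
scores = [brier_score(f, 1) for f in forecasts]

for i, f in enumerate(forecasts):
    pct = percentile_rank(scores, i)
    print(f"forecast {f:.1f}: Brier {scores[i]:.2f}, percentile {pct:.2f}, payout {payout(pct):.2f}")
```

The point of the intermediary step is that money depends only on your rank among forecasters, so the incentives can differ from those of the underlying proper scoring rule.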
For chapter 7: It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts. For example, if Alice and Bob are forecasting “Will AI cause human extinction in the next 100 years?”, they might both individually forecast ~0% for different reasons. Alice believes it is impossible for AI to get powerful enough to cause human extinction, but that if it were that powerful it would kill us all. Bob believes it’s extremely likely that such an AI will be built, but that any agent smart enough to be that powerful would necessarily be morally upstanding. Any reasonable aggregation strategy will put the aggregate at ~0% because each individual forecast is ~0%, but if they were to communicate with one another they would likely arrive at a much higher number, since each of them accepts the premise the other one rejects. I suspect that you address this in the assumptions of the model in the actual paper.
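To make the Alice/Bob example concrete, here’s a quick Python sketch with made-up numbers. Splitting each forecast into P(AI becomes capable) × P(extinction | capable), and the specific values, are my own illustrative assumptions, not anything from the thesis. Two standard pooling methods (the linear pool and the log-odds pool) both return about 0.1%, because both reported forecasts are about 0.1%. Pooling each component separately and then multiplying is only a crude stand-in for Alice and Bob actually talking it through, but it lands near 25%.

```python
import math


def linear_pool(probs):
    """Equal-weight arithmetic mean of probabilities."""
    return sum(probs) / len(probs)


def log_odds_pool(probs):
    """Equal-weight mean of log-odds (geometric mean of odds), mapped back to a probability."""
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-mean_log_odds))


# Hypothetical component beliefs: (P(AI becomes capable), P(extinction | capable)).
alice = (0.001, 0.99)  # doubts capability, expects catastrophe if capable
bob = (0.99, 0.001)    # expects capability, doubts catastrophe

# What each forecaster actually reports: the product of their two components.
alice_report = alice[0] * alice[1]
bob_report = bob[0] * bob[1]

# Pooling the reported forecasts: both inputs are ~0.1%, so the pool is too.
print(f"linear pool of reports:   {linear_pool([alice_report, bob_report]):.5f}")
print(f"log-odds pool of reports: {log_odds_pool([alice_report, bob_report]):.5f}")

# Crude stand-in for communication: pool each component first, then multiply.
p_capable = linear_pool([alice[0], bob[0]])
p_extinct_given_capable = linear_pool([alice[1], bob[1]])
print(f"pooling components first: {p_capable * p_extinct_given_capable:.5f}")
```

Any pooling rule that returns the common value when all inputs agree will behave like the first two lines here, which is the heart of the worry: the information that would change the answer lives in the components, not in the reported numbers.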
Congrats again! I enjoyed your high-level summary and might come back for a more detailed read of your papers.
I’m pretty sure that this is incorrect compared to healthcare more broadly, although the best I can come up with is this meta-analysis: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0226361&type=printable
Which has this to say: