Forecast’s midpoint Brier score (measured at the midpoint between a question’s launch and resolution dates) across all closed Forecasts over the past few months is 0.204, a bit better than Good Judgment’s published result of 0.227 for prediction markets.
The relative difficulty of the questions is probably important here, and the comparison “a bit better than Good Judgment” is probably misleading. In particular, I’d expect Good Judgment to have questions with longer time horizons (which are harder to forecast), if only because your platform is so young.
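For concreteness, here’s a minimal sketch (in Python, with made-up data) of what a “midpoint Brier score” like the one quoted above presumably amounts to: for each resolved binary question, take the forecast probability at the temporal midpoint between launch and resolution, and average the squared error against the 0/1 outcome. The field names and the “latest probability at or before the midpoint” rule are my assumptions; the post doesn’t spell out the exact averaging.

```python
from datetime import datetime

# Hypothetical questions: launch/resolution dates, a (timestamp, community probability)
# history, and a binary outcome. Real Forecast data would of course look different.
questions = [
    {
        "launch": datetime(2020, 7, 1),
        "resolution": datetime(2020, 9, 1),
        "history": [(datetime(2020, 7, 1), 0.50),
                    (datetime(2020, 8, 1), 0.70),
                    (datetime(2020, 8, 20), 0.85)],
        "outcome": 1,
    },
    {
        "launch": datetime(2020, 7, 15),
        "resolution": datetime(2020, 8, 15),
        "history": [(datetime(2020, 7, 15), 0.40),
                    (datetime(2020, 8, 1), 0.30)],
        "outcome": 0,
    },
]

def midpoint_probability(q):
    """Latest recorded probability at or before the launch/resolution midpoint."""
    midpoint = q["launch"] + (q["resolution"] - q["launch"]) / 2
    at_or_before = [p for t, p in q["history"] if t <= midpoint]
    return at_or_before[-1] if at_or_before else q["history"][0][1]

def midpoint_brier(questions):
    """Mean squared error of the midpoint probabilities against the 0/1 outcomes."""
    scores = [(midpoint_probability(q) - q["outcome"]) ** 2 for q in questions]
    return sum(scores) / len(scores)

print(f"midpoint Brier score: {midpoint_brier(questions):.3f}")  # 0.125 on this toy data
```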
Our first priority is to build something that’s really fun for people who want to engage in rational debate about the future.
How are you defining “really fun” as distinct from “addictive”?
Since June, the Forecast community has made more than 50,000 forecasts on a few hundred questions—and they’re actually reasonably accurate.
50,000 forecasts isn’t that many, maybe 30x the number of forecasts I’ve made, but if you scale this up to Facebook scale, I’d imagine you might be able to train a halfway decent ML system. I’d be keen to see a firm and binding ethical commitment that handles this eventuality before you accumulate the data, but I don’t know how that would look in the context of Facebook’s corporate structure and ethics track record.
Hi there! A few comments.

Re: the comparison to Good Judgment: good point. I added an update in the text and edited the wording, since we didn’t study relative question difficulty or time horizon.
Re: really fun vs addictive: we hope using Forecast brings people joy, helps them think and write more rationally, and helps improve conversations between people on contentious topics. At our current scale, this is something we mostly measure qualitatively. If/as we scale, we hope to understand how people experience Forecast more rigorously.
Re: scale and data: appreciate the feedback. This is something that is of utmost concern for us if/as we scale.