Overall agree that progress was very surprising and I’ll be thinking about how it affects my big picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.
For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.
I’m not a superforecaster (TM) though I think some now use the phrase to describe any forecasters with good ~generalist track records?
While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for the MATH 2022 forecast and posted the following prediction, for which the true answer of 50.3% was even more of an outlier than Hypermind’s aggregate:
[image]
The image in the post is for a different question; my prediction for MATH is shown below, though it’s not really more flattering. I do think my prediction was quite poor.
I didn’t run up against the maximum standard deviation here, but I probably would have given more weight to larger values if I had been able to forecast a mixture of components, as on Metaculus. The resolution of 50.3% would very likely (90%) still have been above my 95th percentile though.
Hypermind’s interface has some limitations that prevent outputting arbitrary probability distributions. In particular, in some cases there is an artificial limit on the possible standard deviations, which could lead credible intervals to be too narrow.
I think that without this limit, the MMLU resolution might (40% for my forecast) have fallen inside the 90% credible interval, at least for my forecast and perhaps for the crowd’s.
In my notes on the MMLU forecast I wrote “Why is the max SD so low???”
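A minimal sketch of the two effects discussed above: a cap on the standard deviation narrows the 90% credible interval, and a mixture of components can push the 95th percentile out compared with a single component. All numbers here are made up for illustration; none come from the actual forecasts or from Hypermind's real limits, and the normal shape is itself an assumption.

```python
from statistics import NormalDist


def normal_interval(mean, sd, mass=0.90):
    """Central credible interval of a normal distribution."""
    dist = NormalDist(mean, sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


def mixture_cdf(x, components):
    """CDF of a mixture given as (weight, mean, sd) tuples."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in components)


def mixture_quantile(p, components, lo=-1000.0, hi=1000.0, iters=200):
    """Quantile of the mixture, found by bisection on its CDF."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mixture_cdf(mid, components) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical forecast: mean 50, with the SD the forecaster wanted (10)
# versus an SD forced down by a hypothetical interface cap (5).
capped = normal_interval(mean=50.0, sd=5.0)
uncapped = normal_interval(mean=50.0, sd=10.0)

# Hypothetical mixture: most weight on the narrow component, some weight
# on a wider component centred on larger values.
mixture = [(0.8, 50.0, 5.0), (0.2, 60.0, 15.0)]
q95_single = NormalDist(50.0, 5.0).inv_cdf(0.95)
q95_mixture = mixture_quantile(0.95, mixture)

print("90% interval, capped SD:  ", capped)
print("90% interval, uncapped SD:", uncapped)
print("95th percentile, single component:", q95_single)
print("95th percentile, mixture:         ", q95_mixture)
```

The bisection is used only because a mixture of normals has no closed-form quantile; any root finder would do.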