I agree it’s a high bar, but note that
a few particularly enthusiastic (and smart) humans still perform at roughly this level (depending on how you measure performance), so you wouldn’t want the bar to be much lower, and
we only acknowledged that this is a fairly reasonable definition of superhuman performance; it is the authors of these papers who claimed that their models were (roughly) on par with, or better than, the crowd forecast.
We deliberately chose not to go into much detail on what constitutes human-level or superhuman forecasting ability. We have a lot of opinions on this as well, but that is a topic for another post, so as not to derail the discussion of what we think matters most here.
I think it is fair to say that Metaculus’ crowd forecast is not what would naively be thought of as a crowd average; the recency weighting does a lot of work. So a general claim that an individual AI forecaster (at, say, the 80th percentile of ability) is better than the human crowd is reasonable, unless it is made specifically in the context of a Metaculus-type weighted forecast.