The wording here makes me worry we’re Goodharting on quantity of predictions. And the best way to predict the community prediction is to (of course) wait for others to predict first, then match them...
If the user is interested in getting into the top ranks, this strategy won’t be anything like enough. And if not, but they want to maximize their score, the scoring system is still incentive compatible—they are better off reporting their true estimate on any given question. And for the worst (but still self-aware) predictors, this should be the metaculus prediction anyways—so they can still come away with a positive number of points, but not many. Anything much worse than that, yes, people could have negative overall scores—which, if they’ve predicted on a decent number of questions, is pretty strong evidence that they really suck at forecasting.
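To make the incentive-compatibility claim above concrete, here is a minimal sketch assuming a plain logarithmic score on a binary question. The actual Metaculus points formula has more terms, so treat this as an illustration of a proper scoring rule rather than the site's real scoring; the belief of 0.7 and the function name are made up for the example.

```python
import math


def expected_log_score(report: float, belief: float) -> float:
    """Expected log score of reporting `report` when the forecaster's
    true probability for "yes" is `belief` (plain log score, an assumption)."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)


belief = 0.7  # hypothetical true estimate
candidates = [i / 100 for i in range(1, 100)]  # reports from 1% to 99%

# Pick the report with the highest expected score under the forecaster's own belief.
best_report = max(candidates, key=lambda r: expected_log_score(r, belief))
print(best_report)
```

Under these assumptions the best report coincides with the true belief, which is what "incentive compatible" means here: shading the estimate toward or away from the community can only lower the expected score.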
I think this isn’t true empirically for a reasonable interpretation of top ranks. For example, I’m ranked 5th on questions that have resolved in the past 3 months due to predicting on almost every question.
Looking at my track record, for questions resolved in the last 3 months, evaluated at all times, here’s how my log score looks compared to the community:
Binary questions (N=19): me: -.072 vs. community: -.045
Continuous questions (N=20): me: 2.35 vs. community: 2.33
So if anything, I’ve done a bit worse than the community overall, and am in 5th by virtue of predicting on all questions. It’s likely that the predictors significantly in front of me are that far ahead in part due to having predicted on (a) questions that have resolved recently but closed before I was active and (b) a longer portion of the lifespan for questions that were open before I became active.
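A rough sketch of why volume can dominate the ranking, assuming the leaderboard simply sums points across questions. All names and numbers below are invented for illustration and are not taken from my track record or from Metaculus.

```python
from dataclasses import dataclass


@dataclass
class Forecaster:
    name: str
    avg_points_per_question: float  # hypothetical average points per question
    questions_predicted: int

    @property
    def total_points(self) -> float:
        # Assumption: leaderboard rank is driven by the sum over questions.
        return self.avg_points_per_question * self.questions_predicted


forecasters = [
    Forecaster("matches_community_everywhere", 10.0, 100),
    Forecaster("skilled_but_selective", 25.0, 30),
]

# Rank by total points, highest first.
leaderboard = sorted(forecasters, key=lambda f: f.total_points, reverse=True)
for f in leaderboard:
    print(f.name, f.total_points)
```

In this toy setup the forecaster who merely tracks the community on every question outranks the more accurate but selective one, which is the pattern I am describing.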
Edit:
I discovered that the question set changes when I evaluate at “resolve time” and filter for the past 3 months; I'm not sure exactly why. Numbers at resolve time:
Binary questions (N=102): me: .598 vs. community: .566
Continuous questions (N=92): me: 2.95 vs. community: 2.86
I think this weakens my case substantially, though I still think a bot that just predicts the community as soon as it becomes visible and updates every day would currently be at least top 10.
I agree that the possibility of negative overall scores should make the site somewhat less welcoming to newcomers, but I’m curious to what extent. I have seen plenty of people with worse-than-median Brier scores continue to predict on GJO rather than become demoralized and quit (disclaimer: survivorship bias).
I think you get more points for earlier predictions.
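A minimal sketch of how that could work, assuming points are averaged over the time a question is open and nothing accrues before a forecaster's first prediction. I have not checked this against the actual scoring rules, and the function and numbers are hypothetical.

```python
def time_averaged_points(points_if_held: float, entry_fraction: float) -> float:
    """Points earned when a forecast is first made after `entry_fraction` of the
    question's open period has elapsed, assuming zero points accrue before entry."""
    if not 0.0 <= entry_fraction <= 1.0:
        raise ValueError("entry_fraction must be between 0 and 1")
    return points_if_held * (1.0 - entry_fraction)


# Same forecast quality, different entry times (hypothetical values).
early = time_averaged_points(100.0, 0.1)  # joins after 10% of the open period
late = time_averaged_points(100.0, 0.8)   # joins after 80% of the open period
print(early, late)
```

Under that assumption, an identical forecast earns more the earlier it is entered, simply because it stands for a larger share of the question's lifespan.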