So one should interpret the points as a measure of how useful you’ve been to the overall predictions on the platform, and not how good you should be expected to be on a specific question, right?

Not really. Overall usefulness is more like covariance with the overall prediction—are you contributing distinct ideas and models? That would be very hard to measure, while making the points incentive compatible is not nearly as hard.

How well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction on the same set of questions. Users can see this information on their own page. But it’s not a useful figure unless you’re asking about relative performance, which, as an outsider interpreting predictions, you shouldn’t care about—because what you want is the aggregated prediction.
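To make that comparison concrete, here is a minimal sketch of the computation, with made-up numbers (the data and function name are illustrative, not Metaculus's actual implementation):

```python
# Brier score = mean squared error between a probability forecast and
# the binary outcome (0 or 1); lower is better.

def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical resolved questions, for illustration only.
user_probs      = [0.7, 0.2, 0.9, 0.4]  # one forecaster's predictions
aggregate_probs = [0.6, 0.3, 0.8, 0.5]  # the community prediction
outcomes        = [1,   0,   1,   0]    # how each question resolved

user_score = brier(user_probs, outcomes)       # 0.075
agg_score  = brier(aggregate_probs, outcomes)  # 0.135

# A lower score than the aggregate's, on the same question set, is the
# relative-performance signal described above.
print(user_score, agg_score)
```

The key point is that both scores must be computed over the same set of questions; Brier scores from different question pools aren't comparable, since question difficulty varies.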