Yeah, I don’t know the answer here, but I will also say that one flaw of the Brier score is that it’s not even clear these sorts of differences will be all that meaningful. What you actually want to know is how much more information one group gives over the other groups, and how much credence we should assign to each group (acting as if each were a hypothesis in a Bayes update) given their predictions on the data we have. And for that, you can just run the Bayes update.
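To make "just run the Bayes update" concrete, here is a minimal sketch, assuming binary questions, probabilities strictly between 0 and 1, and a uniform prior over groups unless you pass one in. The function name `posterior_over_groups` and the group labels are made up for illustration.

```python
import math

def posterior_over_groups(predictions, outcomes, prior=None):
    """Treat each group as a hypothesis and return its posterior credence.

    predictions: dict mapping group name -> list of probabilities that each
                 question resolves "yes" (assumed strictly between 0 and 1).
    outcomes:    list of 0/1 resolutions, in the same order.
    prior:       optional dict of prior credences (assumed uniform if omitted).
    """
    names = list(predictions)
    if prior is None:
        prior = {name: 1.0 / len(names) for name in names}

    log_post = {}
    for name in names:
        # Log-likelihood of the observed outcomes under this group's forecasts.
        log_lik = sum(
            math.log(p if y == 1 else 1.0 - p)
            for p, y in zip(predictions[name], outcomes)
        )
        log_post[name] = math.log(prior[name]) + log_lik

    # Normalize in log space to avoid underflow with many questions.
    top = max(log_post.values())
    unnorm = {name: math.exp(v - top) for name, v in log_post.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}

# Hypothetical data: two groups, two questions that both resolved "yes".
example = posterior_over_groups(
    {"group_a": [0.9, 0.8], "group_b": [0.5, 0.5]},
    [1, 1],
)
```

The resulting credences are exactly the "in what proportion should I listen to each group" weights, under the assumption that one of the groups is the right model.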
The Brier score was chosen for forecasters, as far as I can tell, because it’s more fun than scoring yourself with a log score (equivalent to the Bayes update thing). It’s less sensitive to horribly bad predictions, and it has a bounded “how bad can you be”. It’s also easier to explain and think about, and has a different incentive landscape for those trying to maximize their scores, which may be useful if you’re trying to elicit good predictions.
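A quick sketch of the boundedness point, assuming a single binary question that resolved "yes" and the usual per-question definitions (squared error for Brier, negative log probability for the log score). The helper names `brier` and `log_loss` are just for illustration.

```python
import math

def brier(p, outcome):
    # Squared error between the forecast and the 0/1 outcome; never exceeds 1.
    return (p - outcome) ** 2

def log_loss(p, outcome):
    # Negative log of the probability assigned to what actually happened;
    # grows without bound as that probability approaches 0.
    return -math.log(p if outcome == 1 else 1.0 - p)

# The event happened, and the forecaster was increasingly confident it would not.
for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p={p}: brier={brier(p, 1):.4f}, log loss={log_loss(p, 1):.4f}")
```

The Brier penalty flattens out just below 1 as the forecast gets worse, while the log penalty keeps climbing, which is the sense in which Brier is less sensitive to horribly bad predictions.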
But if you’re trying to determine who you should listen to (i.e. in what proportion you should update your model given so-and-so says such-and-such), you can’t do better than a Bayes update (given the constraints), so just use that!