I have no idea how this “Bayesian Judge” thing that uses probability estimates directly would even work.
Here’s an article on Bayesian aggregation of forecasts. Essentially, you look at past forecasts to get P(Bob: “rain”|rain) and P(Bob: “rain”|~rain). (You can just elicit those expert likelihoods, but if you want this to be a formula rather than a person, you need them to be the data you’re looking for instead of just suggestive of the data you’re looking for.) From those two likelihoods plus the base rate P(rain), Bayes’ rule lets you calibrate Bob, that is, find P(rain|Bob: “rain”) and P(rain|Bob: “~rain”). When you also have data on past predictions from Alice, Charlie, and David, you can condition on all of their forecasts jointly and get a more sophisticated estimate than any individual expert gives you. The joint version is generally able to notice things like “when Alice and Bob agree, they’re both wrong,” which you couldn’t find by just computing individual calibrations.
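To make that concrete, here is a minimal sketch in Python. It is not taken from the linked article; the history is made up, the function names are hypothetical, and the add-one smoothing is just one reasonable choice for small counts. It estimates each forecaster's calibration by counting past days, then does the same thing conditioned on both forecasters at once.

```python
# Hypothetical history: (alice_said_rain, bob_said_rain, it_rained).
# The data are invented so that the two forecasters interact.
HISTORY = (
    [(True, True, False)] * 2
    + [(True, False, True)] * 3
    + [(False, True, True)] * 3
    + [(False, False, False)] * 2
)


def individual_posterior(history, expert, said_rain):
    """Estimate P(rain | one expert's statement) by counting past days."""
    matching = [row for row in history if row[expert] == said_rain]
    return sum(row[-1] for row in matching) / len(matching)


def joint_posterior(history, statements, smoothing=1.0):
    """Estimate P(rain | every expert's statement at once).

    Add-one smoothing (an assumption of this sketch) keeps the estimate
    away from exactly 0 or 1 when a combination was seen only a few times.
    """
    matching = [row for row in history if row[:-1] == tuple(statements)]
    rained = sum(row[-1] for row in matching)
    return (rained + smoothing) / (len(matching) + 2 * smoothing)


alice_alone = individual_posterior(HISTORY, 0, True)   # 3 of 5 days
bob_alone = individual_posterior(HISTORY, 1, True)     # 3 of 5 days
both_say_rain = joint_posterior(HISTORY, (True, True))
only_alice = joint_posterior(HISTORY, (True, False))
```

In this toy history each forecaster looks uninformative alone (a "rain" call from either one leaves you at the 60% base rate), but the joint estimate drops when they both say "rain" and rises when they disagree. Individual calibration can't see that pattern. The catch is that the joint table grows exponentially with the number of forecasters, so with more than a handful you would need some model of the dependence rather than raw counts.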
That is, this thing you’ve been talking about is a procedure that’s already been worked out and that I’ve personally performed. It’s typically only done for forecasters of mutually exclusive possibilities and is inappropriate for decision-makers for reasons I’ve already mentioned.
neat!