I find the beginning of this post somewhat strange, and I’m not sure your post proves what you claim it does. You start out discussing what appears to be a combination of two forecasts, but present it as Bayesian updating. Recall that Bayes’ theorem says p(θ∣x) = p(x∣θ)p(θ)/p(x). To use this theorem, you need both an x (your data / evidence) and a θ (your parameter). Using “posterior ∝ prior × likelihood” (with priors p1, p2, p3 and likelihoods e1, e2, e3), you’re talking as if your expert’s likelihood equals p(x∣θ) – but is that true in any sense? A likelihood isn’t just something you multiply with your prior; it is a conditional pmf or pdf with a different outcome than your prior.
I can see two interpretations of what you’re doing at the beginning of your post:
You’re combining two forecasts. That is, with θ∈{A,B,C} being the outcome, you have your own pmf p1(θ) and the expert’s e = p2(θ), then combine them using p(θ) ∝ p1(θ)p2(θ). That’s fair enough, but I suppose p(θ) ∝ √(p1(θ)p2(θ)), or maybe p(θ) ∝ p1(θ)^q p2(θ)^(1−q) for some q∈[0,1], would be a better way to do it.
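As a side note, the difference between these pooling rules is easy to see numerically. Here is a small sketch; the function names and the numbers are mine, purely for illustration:

```python
# Two ways to pool a pair of pmfs over the same outcomes:
#   product pooling:     p(θ) ∝ p1(θ) p2(θ)
#   logarithmic pooling: p(θ) ∝ p1(θ)^q p2(θ)^(1−q), q in [0, 1]
# (q = 0.5 gives the √(p1 p2) rule mentioned above).

def product_pool(p1, p2):
    """Multiply two pmfs pointwise and renormalize."""
    raw = [a * b for a, b in zip(p1, p2)]
    total = sum(raw)
    return [r / total for r in raw]

def log_pool(p1, p2, q=0.5):
    """Weighted geometric pooling; symmetric in p1, p2 when q = 0.5."""
    raw = [a**q * b**(1 - q) for a, b in zip(p1, p2)]
    total = sum(raw)
    return [r / total for r in raw]

# Your forecast and the expert's forecast over outcomes A, B, C:
mine = [0.5, 0.3, 0.2]
expert = [0.2, 0.3, 0.5]

print(product_pool(mine, expert))
print(log_pool(mine, expert, q=0.5))
```

One property worth noting: the logarithmic pool with q = 0.5 treats the two forecasters symmetrically, while q lets you weight one forecaster over the other.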
It might be possible to interpret your calculations as a proper application of Bayes’ rule, but that requires stretching it. Suppose θ is your subjective probability vector for the outcomes A,B,C and x is the subjective probability vector for the event supplied by an expert (the value of x is unknown to us). To use Bayes’ rule, we will have to say that the evidence vector e=p(x∣θ), the probability of observing an expert judgment of x given that θ is true. I’m not sure we ever observe such quantities directly, and it is pretty clear from your post that you’re talking about e=p2(θ) in the sense used above, not p(x∣θ).
Assuming interpretation 1, the rest of your calculations are not that interesting, as you’re using a method of knowledge pooling no one advocates.
Assuming interpretation 2, the rest of your calculations are probably incorrect. I don’t think there is a unique way to go from p(x∣θ) to, let’s say, p(x∗∣θ∗), where x∗ is the expert’s probability vector over {A, Aᶜ} and θ∗ your probability vector over {A, Aᶜ}.
To use this theorem, you need both an x (your data / evidence), and a θ (your parameter).
Parameters are abstractions we use to simplify modelling. What we actually care about is the probability of unknown events given past observations.
You start out discussing what appears to be a combination of two forecasts
To clarify: this is not what I wanted to discuss. The expert is reporting how you should update your priors given the evidence, and remaining agnostic on what the priors should be.
A likelihood isn’t just something you multiply with your prior, it is a conditional pmf or pdf with a different outcome than your prior.
The whole point of Bayesianism is that it offers a precise, quantitative answer to how you should update your priors given some evidence—and that is multiplying by the likelihoods.
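For concreteness, here is a minimal sketch of that update rule, with a made-up prior and a made-up reported likelihood vector:

```python
# "posterior ∝ prior × likelihood": multiply elementwise, renormalize.

def update(prior, likelihood):
    """Multiply a prior pmf by a likelihood vector and renormalize."""
    raw = [p * l for p, l in zip(prior, likelihood)]
    total = sum(raw)
    return [r / total for r in raw]

# The expert reports the likelihood vector e = (p(z|A), p(z|B), p(z|C))
# for some evidence z; you supply your own prior over A, B, C.
prior = [0.5, 0.3, 0.2]
e = [0.9, 0.3, 0.1]

posterior = update(prior, e)
print(posterior)  # ≈ [0.804, 0.161, 0.036]
```

Note that the expert only needs to report e; the prior is entirely yours.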
This is why it is often recommended in the social sciences and elsewhere to report your likelihoods.
I’m not sure we ever observe [the evidence vector] directly
I agree this is not common in judgemental forecasting, where the whole updating process is very illegible. I think it holds for most Bayesian-leaning scientific reporting.
it is pretty clear from your post that you’re talking about e=p2(θ) in the sense used above, not p(x∣θ).
I am not; I am talking about evidence = likelihood vectors.
One way to think about this is that the expert is just informing us about how we should update our beliefs. “Given that the pandemic broke out in Wuhan, your subjective probability of a lab leak should increase, and it should increase by this amount”. But the final probability depends on your prior beliefs, which the expert cannot possibly know.
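A small numeric sketch of this point (all numbers invented): the same reported likelihood vector moves two different priors to very different posteriors.

```python
# Same likelihood vector, two priors: the update direction and strength are
# fixed by the expert, but the final probability still depends on the prior.

def update(prior, likelihood):
    """Multiply a prior pmf by a likelihood vector and renormalize."""
    raw = [p * l for p, l in zip(prior, likelihood)]
    total = sum(raw)
    return [r / total for r in raw]

# Likelihood vector over (lab leak, natural origin), reported by the expert:
likelihood = [0.8, 0.2]

skeptic = update([0.05, 0.95], likelihood)   # low prior on a lab leak
agnostic = update([0.50, 0.50], likelihood)  # uniform prior

print(skeptic[0])   # ≈ 0.174 — increased from 0.05, but still low
print(agnostic[0])  # ≈ 0.8   — increased from 0.50
```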
I don’t think there is a unique way to go from p(x∣θ) to, let’s say, p(x∗∣θ∗), where x∗ is the expert’s probability vector over {A, Aᶜ} and θ∗ your probability vector over {A, Aᶜ}.
Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and necessarily loses some information.
Okay, thanks for the clarification! Let’s see if I understand your setup correctly. Suppose we have the probability measures pE and p1, where pE is the probability measure of the expert. Moreover, we have an outcome x∈{A,B,C}.
In your post, you use p1(x∣z) ∝ pE(z∣x)p1(x), where z is an unknown outcome known only to the expert. To use Bayes’ rule, we must make the assumption that p1(z∣x) = pE(z∣x). This assumption doesn’t sound right to me, but I suppose some strange assumption is necessary for this simple framework. In this model, I agree with your calculations.
Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and necessarily loses some information.
I’m not sure. When we’re looking directly at the probability of an event x (instead of the probability of the probability of an event), things get much simpler than I thought.
Let’s see what happens to the likelihood when you aggregate from the expert’s point of view. Letting x∈{A,B,C}, we need to calculate the expert’s likelihoods pE(z∣A) and pE(z∣B∪C). In this case,

pE(z∣B∪C) = (pE(z∣B)pE(B) + pE(z∣C)pE(C)) / (pE(B) + pE(C)),
which is essentially your calculations, but from the expert’s point of view. The likelihood pE(z∣B∪C) depends on pE(B) and pE(C), the priors of the expert, which are unknown to you. That shouldn’t come as a surprise, as he needs to use his own priors in order to combine the probabilities of the events B and C.
But the calculations are exactly the same from your point of view, leading to
p1(z∣B∪C) = (pE(z∣B)p1(B) + pE(z∣C)p1(C)) / (p1(B) + p1(C))
Now, suppose we want to ensure in general that pE(z∣B∪C) = p1(z∣B∪C), which is what I believe you want to do, and which seems pretty natural, at least since we’re allowed to assume that pE(z∣x) = p1(z∣x) for all simple events x. To ensure this, we will probably have to require that your priors are the same as the expert’s. In other words, your joint distributions are equal: p1(z,x) = pE(z,x).
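A quick numeric check of this (the numbers are invented): with shared likelihoods on the simple events but different priors, the two coarse-grained likelihoods disagree.

```python
# The coarse-grained likelihood p(z | B ∪ C) mixes p(z|B) and p(z|C) using
# prior weights, so it depends on whose prior is used — the expert's
# aggregated likelihood need not match yours unless the priors agree.

def coarse_likelihood(lik_B, lik_C, prior_B, prior_C):
    """p(z | B ∪ C) = (p(z|B)p(B) + p(z|C)p(C)) / (p(B) + p(C))."""
    return (lik_B * prior_B + lik_C * prior_C) / (prior_B + prior_C)

# Shared simple-event likelihoods p(z|B), p(z|C), assumed equal for both:
lik_B, lik_C = 0.3, 0.1

expert_version = coarse_likelihood(lik_B, lik_C, prior_B=0.1, prior_C=0.6)
your_version = coarse_likelihood(lik_B, lik_C, prior_B=0.4, prior_C=0.3)

print(expert_version)  # ≈ 0.129
print(your_version)    # ≈ 0.214
```

Only when the prior weights on B and C are proportional between the two agents do the coarse-grained likelihoods coincide.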
Do you agree with this summary?