Okay, thanks for the clarification! Let’s see if I understand your setup correctly. Suppose we have the probability measures $p_E$ and $p_1$, where $p_E$ is the probability measure of the expert. Moreover, we have an outcome $x \in \{A, B, C\}$.
In your post, you use $p_1(x \mid z) \propto p_E(z \mid x)\, p_1(x)$, where $z$ is an unknown outcome known only to the expert. To use Bayes’ rule, we must make the assumption that $p_1(z \mid x) = p_E(z \mid x)$. This assumption doesn’t sound right to me, but I suppose some assumption of this kind is necessary for such a simple framework. In this model, I agree with your calculations.
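Just to spell out the update I have in mind (nothing beyond the assumption above), normalizing over the simple outcomes gives
$$p_1(x \mid z) = \frac{p_E(z \mid x)\, p_1(x)}{p_E(z \mid A)\, p_1(A) + p_E(z \mid B)\, p_1(B) + p_E(z \mid C)\, p_1(C)}.$$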
Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and necessarily loses some information.
I’m not sure. When we’re looking directly at the probability of an event $x$ (instead of the probability of the probability of an event), things get much simpler than I thought.
Let’s see what happens to the likelihood when you aggregate from the expert’s point of view. Letting $x \in \{A, B, C\}$, we need to calculate the expert’s likelihoods $p_E(z \mid A)$ and $p_E(z \mid B \cup C)$. In this case,
$$p_E(z \mid B \cup C) = \frac{p_E(z \mid B)\, p_E(B) + p_E(z \mid C)\, p_E(C)}{p_E(B) + p_E(C)},$$
which is essentially your calculation, but from the expert’s point of view. The likelihood $p_E(z \mid B \cup C)$ depends on $p_E(B \cup C)$, the prior of the expert, which is unknown to you. That shouldn’t come as a surprise, as he needs to use his prior in order to combine the probabilities of the events $B$ and $C$.
But the calculations are exactly the same from your point of view, leading to
$$p_1(z \mid B \cup C) = \frac{p_E(z \mid B)\, p_1(B) + p_E(z \mid C)\, p_1(C)}{p_1(B) + p_1(C)}.$$
Now, suppose we want to ensure that $p_E(z \mid B \cup C) = p_1(z \mid B \cup C)$ in general. This is what I believe you want to do, and it seems pretty natural, at least since we’re allowed to assume that $p_E(z \mid x) = p_1(z \mid x)$ for all simple events $x$. To ensure this, we will probably have to require that your priors are the same as the expert’s. In other words, your joint distributions are equal: $p_1(z, x) = p_E(z, x)$.
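A quick sketch of why I believe this (my reasoning, not something from your post): both $p_E(z \mid B \cup C)$ and $p_1(z \mid B \cup C)$ are mixtures of the same two likelihoods, so equality for every $z$ (assuming $p_E(z \mid B) \neq p_E(z \mid C)$) forces the mixture weights to agree,
$$\frac{p_E(B)}{p_E(B) + p_E(C)} = \frac{p_1(B)}{p_1(B) + p_1(C)}.$$
Requiring this for every compound event forces $p_1(x) = p_E(x)$ on the simple events, which together with $p_1(z \mid x) = p_E(z \mid x)$ gives $p_1(z, x) = p_E(z, x)$.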
Do you agree with this summary?