I find the beginning of this post somewhat strange, and I’m not sure your post proves what you claim it does. You start out discussing what appears to be a combination of two forecasts, but present it as Bayesian updating. Recall that Bayes’ theorem says p(θ∣x) = p(x∣θ)p(θ)/p(x). To use this theorem, you need both an x (your data / evidence) and a θ (your parameter). Using “posterior ∝ prior × likelihood” (with priors p1, p2, p3 and likelihoods e1, e2, e3), you’re talking as if your expert’s likelihood equals p(x∣θ) – but is that true in any sense? A likelihood isn’t just something you multiply with your prior; it is a conditional pmf or pdf with a different outcome than your prior.
I can see two interpretations of what you’re doing at the beginning of your post:
You’re combining two forecasts. That is, with θ∈{A,B,C} being the outcome, you have your own pmf p1(θ) and the expert’s e = p2(θ), then combine them using p(θ) ∝ p1(θ)p2(θ). That’s fair enough, but I suppose p(θ) ∝ √(p1(θ)p2(θ)), or maybe p(θ) ∝ p1(θ)^q p2(θ)^(1−q) for some q∈[0,1], would be a better way to do it.
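As a side note, the difference between these pooling rules is easy to see numerically. Here is a small sketch; the function names and the numbers are mine, purely for illustration:

```python
# Two ways to pool a pair of pmfs over the same outcomes:
#   product pooling:     p(θ) ∝ p1(θ) p2(θ)
#   logarithmic pooling: p(θ) ∝ p1(θ)^q p2(θ)^(1−q), q in [0, 1]
# (q = 0.5 gives the √(p1 p2) rule mentioned above).

def product_pool(p1, p2):
    """Multiply two pmfs pointwise and renormalize."""
    raw = [a * b for a, b in zip(p1, p2)]
    total = sum(raw)
    return [r / total for r in raw]

def log_pool(p1, p2, q=0.5):
    """Weighted geometric pooling; symmetric in p1, p2 when q = 0.5."""
    raw = [a**q * b**(1 - q) for a, b in zip(p1, p2)]
    total = sum(raw)
    return [r / total for r in raw]

# Your forecast and the expert's forecast over outcomes A, B, C:
mine = [0.5, 0.3, 0.2]
expert = [0.2, 0.3, 0.5]

print(product_pool(mine, expert))
print(log_pool(mine, expert, q=0.5))
```

One property worth noting: the logarithmic pool with q = 0.5 treats the two forecasters symmetrically, while q lets you weight one forecaster over the other.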
It might be possible to interpret your calculations as a proper application of Bayes’ rule, but that requires stretching it. Suppose θ is your subjective probability vector for the outcomes A,B,C and x is the subjective probability vector for the event supplied by an expert (the value of x is unknown to us). To use Bayes’ rule, we will have to say that the evidence vector e=p(x∣θ), the probability of observing an expert judgment of x given that θ is true. I’m not sure we ever observe such quantities directly, and it is pretty clear from your post that you’re talking about e=p2(θ) in the sense used above, not p(x∣θ).
Assuming interpretation 1, the rest of your calculations are not that interesting, as you’re using a method of knowledge pooling no one advocates.
Assuming interpretation 2, the rest of your calculations are probably incorrect. I don’t think there is a unique way to go from p(x∣θ) to, let’s say, p(x∗∣θ∗), where x∗ is the expert’s probability vector over {A, Aᶜ} and θ∗ your probability vector over {A, Aᶜ}.
To use this theorem, you need both an x (your data / evidence), and a θ (your parameter).
Parameters are abstractions we use to simplify modelling. What we actually care about is the probability of unknown events given past observations.
You start out discussing what appears to be a combination of two forecasts
To clarify: this is not what I wanted to discuss. The expert is reporting how you should update your priors given the evidence, and remaining agnostic on what the priors should be.
A likelihood isn’t just something you multiply with your prior, it is a conditional pmf or pdf with a different outcome than your prior.
The whole point of Bayesianism is that it offers a precise, quantitative answer to how you should update your priors given some evidence—and that is multiplying by the likelihoods.
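For concreteness, here is a minimal sketch of that update rule, with a made-up prior and a made-up reported likelihood vector:

```python
# "posterior ∝ prior × likelihood": multiply elementwise, renormalize.

def update(prior, likelihood):
    """Multiply a prior pmf by a likelihood vector and renormalize."""
    raw = [p * l for p, l in zip(prior, likelihood)]
    total = sum(raw)
    return [r / total for r in raw]

# The expert reports the likelihood vector e = (p(z|A), p(z|B), p(z|C))
# for some evidence z; you supply your own prior over A, B, C.
prior = [0.5, 0.3, 0.2]
e = [0.9, 0.3, 0.1]

posterior = update(prior, e)
print(posterior)  # ≈ [0.804, 0.161, 0.036]
```

Note that the expert only needs to report e; the prior is entirely yours.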
This is why it is often recommended in the social sciences and elsewhere to report your likelihoods.
I’m not sure we ever observe [the evidence vector] directly
I agree this is not common in judgemental forecasting, where the whole updating process is very illegible. I think it holds for most Bayesian-leaning scientific reporting.
it is pretty clear from your post that you’re talking about e=p2(θ) in the sense used above, not p(x∣θ).
I am not; I am talking about evidence = likelihood vectors.
One way to think about this is that the expert is just informing us about how we should update our beliefs. “Given that the pandemic broke out in Wuhan, your subjective probability of a lab leak should increase, and it should increase by this amount”. But the final probability depends on your prior beliefs, which the expert cannot possibly know.
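A small numeric sketch of this point (all numbers invented): the same reported likelihood vector moves two different priors to very different posteriors.

```python
# Same likelihood vector, two priors: the update direction and strength are
# fixed by the expert, but the final probability still depends on the prior.

def update(prior, likelihood):
    """Multiply a prior pmf by a likelihood vector and renormalize."""
    raw = [p * l for p, l in zip(prior, likelihood)]
    total = sum(raw)
    return [r / total for r in raw]

# Likelihood vector over (lab leak, natural origin), reported by the expert:
likelihood = [0.8, 0.2]

skeptic = update([0.05, 0.95], likelihood)   # low prior on a lab leak
agnostic = update([0.50, 0.50], likelihood)  # uniform prior

print(skeptic[0])   # ≈ 0.174 — increased from 0.05, but still low
print(agnostic[0])  # ≈ 0.8   — increased from 0.50
```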
I don’t think there is a unique way to go from p(x∣θ) to, let’s say, p(x∗∣θ∗), where x∗ is the expert’s probability vector over {A, Aᶜ} and θ∗ your probability vector over {A, Aᶜ}.
Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and necessarily loses some information.
Okay, thanks for the clarification! Let’s see if I understand your setup correctly. Suppose we have the probability measures pE and p1, where pE is the probability measure of the expert. Moreover, we have an outcome x∈{A,B,C}.
In your post, you use p1(x∣z) ∝ pE(z∣x)p1(x), where z is an unknown outcome known only to the expert. To use Bayes’ rule, we must make the assumption that p1(z∣x) = pE(z∣x). This assumption doesn’t sound right to me, but I suppose some strange assumption is necessary for this simple framework. In this model, I agree with your calculations.
Yes! If I am understanding this right, I think this gets to the crux of the post. The compression is lossy, and necessarily loses some information.
I’m not sure. When we’re looking directly at the probability of an event x (instead of the probability of the probability of an event), things get much simpler than I thought.
Let’s see what happens to the likelihood when you aggregate from the expert’s point of view. Letting x∈{A,B,C}, we need to calculate the expert’s likelihoods pE(z∣A) and pE(z∣B∪C). In this case,

pE(z∣B∪C) = (pE(z∣B)pE(B) + pE(z∣C)pE(C)) / (pE(B) + pE(C)),
which is essentially your calculations, but from the expert’s point of view. The likelihood pE(z∣B∪C) depends on pE(B) and pE(C), the priors of the expert, which are unknown to you. That shouldn’t come as a surprise, as he needs to use his own priors in order to combine the probabilities of the events B and C.
But the calculations are exactly the same from your point of view, leading to
p1(z∣B∪C) = (pE(z∣B)p1(B) + pE(z∣C)p1(C)) / (p1(B) + p1(C))
Now, suppose we want to ensure in general that pE(z∣B∪C) = p1(z∣B∪C), which is what I believe you want to do, and which seems pretty natural, at least since we’re allowed to assume that pE(z∣x) = p1(z∣x) for all simple events x. To ensure this, we will probably have to require that your priors are the same as the expert’s. In other words, your joint distributions are equal: p1(z,x) = pE(z,x).
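A quick numeric check of this (the numbers are invented): with shared likelihoods on the simple events but different priors, the two coarse-grained likelihoods disagree.

```python
# The coarse-grained likelihood p(z | B ∪ C) mixes p(z|B) and p(z|C) using
# prior weights, so it depends on whose prior is used — the expert's
# aggregated likelihood need not match yours unless the priors agree.

def coarse_likelihood(lik_B, lik_C, prior_B, prior_C):
    """p(z | B ∪ C) = (p(z|B)p(B) + p(z|C)p(C)) / (p(B) + p(C))."""
    return (lik_B * prior_B + lik_C * prior_C) / (prior_B + prior_C)

# Shared simple-event likelihoods p(z|B), p(z|C), assumed equal for both:
lik_B, lik_C = 0.3, 0.1

expert_version = coarse_likelihood(lik_B, lik_C, prior_B=0.1, prior_C=0.6)
your_version = coarse_likelihood(lik_B, lik_C, prior_B=0.4, prior_C=0.3)

print(expert_version)  # ≈ 0.129
print(your_version)    # ≈ 0.214
```

Only when the prior weights on B and C are proportional between the two agents do the coarse-grained likelihoods coincide.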
Do you agree with this summary?