I believe, mathematically, your claim can be expressed as:
P(H|D) = argmax_θ P(θ|D)
where θ is the “probability” parameter of the Bernoulli distribution, H represents the proposition that heads occurs, and D represents our data. The left side of this equation is the plausibility based on knowledge and the right side is Professor Jaynes’ ‘estimate of the probability’. How can we prove this statement?
Edit:
LaTeX is being a nuisance as usual :) The right side of the equation is the argmax with respect to theta of P(theta | data).
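For concreteness, here is the standard conjugate setup the discussion below relies on (the Beta(α, β) prior and the counts, h heads out of n flips, are illustrative assumptions rather than anything stated above):

P(θ|D) ∝ θ^h (1−θ)^(n−h) · θ^(α−1) (1−θ)^(β−1) = θ^(α+h−1) (1−θ)^(β+n−h−1)

so θ|D ~ Beta(α+h, β+n−h), with mode (α+h−1)/(α+β+n−2) and mean (α+h)/(α+β+n).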
I think argmax is not the way to go: the beta posterior from a binomial likelihood is only symmetric when the coin is fair. If you want a point estimate, the mean of the distribution is better; it will always be closer to 50/50 than the mode and is thus more conservative. With argmax you are essentially ignoring all the uncertainty in theta and thus overestimating the probability.
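A quick sketch of that claim in Python (the uniform Beta(1, 1) prior and the data, 7 heads in 10 flips, are made-up numbers for illustration):

# Posterior for theta after h heads in n flips, under a uniform Beta(1, 1) prior.
h, n = 7, 10                   # hypothetical data: 7 heads in 10 flips
a, b = 1 + h, 1 + (n - h)      # posterior is Beta(8, 4)
mean = a / (a + b)             # 8/12 ≈ 0.667
mode = (a - 1) / (a + b - 2)   # 7/10 = 0.700
print(mean, mode)              # the mean sits closer to 0.5 than the mode

The gap shrinks as n grows (both estimates converge to h/n), but for small samples the argmax is systematically the more extreme of the two.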
What is the theoretical justification for taking the mean? Argmax feels more intuitive to me because it is literally “the most plausible value of theta”. In either case, whether we use argmax or the mean, can we prove that it is equal to P(H|D)?
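For the second question there is a short standard argument (a sketch, using only the sum and product rules, and assuming the next flip is independent of D given θ) that P(H|D) is exactly the posterior mean rather than the argmax:

P(H|D) = ∫₀¹ P(H|θ, D) · p(θ|D) dθ = ∫₀¹ θ · p(θ|D) dθ = E[θ|D]

so the argmax coincides with P(H|D) only in the special case where the posterior’s mode happens to equal its mean.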
If I have a distribution of 2 kids and a professional boxer, and a random one is going to hit me, then argmax tells me that I will always be hit by a kid. Sure, if you draw from the distribution only once, argmax will beat the mean in 2/3 of the cases, but it is much worse at answering what will happen if I draw 9 hits (argmax = nothing, mean = 3 hits from a boxer).
This distribution is skewed, like the beta distribution, and is therefore better summarized by the mean than the mode.
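A tiny simulation of that example (a sketch; the nine-draw setup is from the comment above, the seed is arbitrary):

import random

random.seed(0)
population = ["kid", "kid", "boxer"]   # 2 kids, 1 professional boxer

# Mode-based prediction for 9 draws: 0 boxer hits.
# Mean-based prediction: 9 * (1/3) = 3 boxer hits.
boxer_hits = sum(random.choice(population) == "boxer" for _ in range(9))
print(boxer_hits)                      # typically near 3, almost never 0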
In Bayesian statistics, argmax on sigma will often lead to sigma = 0 if you assume that sigma follows an exponential distribution; it will thus lead you to assume that there is no variance in your sample.
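One concrete way to see that collapse (a hypothetical hierarchical model of my choosing, not one from the thread): take y_j ~ N(θ_j, 1) and θ_j ~ N(0, σ²) for j = 1…J, with σ ~ Exponential(λ). The joint posterior

p(θ, σ|y) ∝ e^(−λσ) · ∏_j N(y_j|θ_j, 1) · N(θ_j|0, σ²)

contains the factor ∏_j N(θ_j|0, σ²) ∝ σ^(−J) exp(−Σ_j θ_j²/2σ²), which diverges as σ → 0 with every θ_j → 0 while the other factors stay bounded. So the joint argmax is pinned at σ = 0 no matter what the data say, whereas the posterior mean of σ has no such degeneracy.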
The variance is also lower around the mean than around the mode, if that counts as a theoretical justification :)
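Spelling that out: for any point estimate c,

E[(θ − c)²|D] = Var(θ|D) + (E[θ|D] − c)²

which is minimized exactly at c = E[θ|D]. So the posterior mean is the optimal point estimate under squared-error loss, while the argmax is optimal only under 0–1 loss.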
That’s my take as well. “estimating the probability” really means “calculating the plausibility based on this knowledge”.