Does each likelihood distribution have a unique conjugate prior? It doesn’t seem immediately obvious that they do, but people say things like “The conjugate prior for the Bernoulli distribution is the beta distribution.”
No, in general there are many conjugate priors for a given likelihood, if for no other reason than that any weighted mixture of conjugate priors is also a conjugate prior.
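To make that concrete (a standard worked example in my own notation, not from the original comment): take a Bernoulli likelihood and a two-component mixture of betas as the prior,

$$\pi(\theta) = w \,\mathrm{Beta}(\theta; a_1, b_1) + (1 - w)\,\mathrm{Beta}(\theta; a_2, b_2).$$

After one observation x in {0, 1}, each component updates as usual and only the weights change:

$$\pi(\theta \mid x) = w'\,\mathrm{Beta}(\theta; a_1 + x, b_1 + 1 - x) + (1 - w')\,\mathrm{Beta}(\theta; a_2 + x, b_2 + 1 - x),$$

with $w' \propto w\, B(a_1 + x, b_1 + 1 - x)/B(a_1, b_1)$ (B is the beta function). So beta mixtures are closed under updating, i.e., conjugate, even though they form a much bigger family than the betas alone.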
What about the converse: does a conjugate prior exist for each likelihood (assume “nice” families of probability measures, with a Radon-Nikodym derivative w.r.t. counting measure or Lebesgue measure if you like)? I think probably not (with a fairly high degree of certainty), but I don’t think I’ve ever seen a proof of it.
The existence of a conjugate prior is not guaranteed. Conjugate priors do exist for members of the exponential family, which is a very broad and useful class of distributions. I don’t know of a proof, but if a gun were held to my head, I’d assert with reasonable confidence that the Cauchy likelihood doesn’t have a conjugate prior.
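For reference, here’s the textbook construction for the exponential-family case (standard material, not from the original comment). If the likelihood factors as

$$p(x \mid \theta) = h(x) \exp\!\bigl(\theta^\top T(x) - A(\theta)\bigr),$$

then the family

$$\pi(\theta \mid \tau, n_0) \propto \exp\!\bigl(\theta^\top \tau - n_0 A(\theta)\bigr)$$

is conjugate: observing x just updates the hyperparameters, $\tau \to \tau + T(x)$ and $n_0 \to n_0 + 1$. The Cauchy density $\frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}$ admits no such factorization with a fixed-dimensional sufficient statistic, which is the usual heuristic reason to doubt that it has a finite-dimensional conjugate family.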
I’m pretty sure that the Cauchy likelihood, like the other members of the t family, is a weighted mixture of normal distributions (with a gamma distribution over the precision, i.e., the inverse of the variance).
EDIT: There’s a paper on this, “Scale mixtures of normal distributions” by Andrews and Mallows (1974), if you want the details.
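Spelling out the mixture (a standard computation, my notation): if $\lambda \sim \mathrm{Gamma}(\nu/2, \nu/2)$ (shape-rate) and $X \mid \lambda \sim N(\mu, 1/\lambda)$, then marginally

$$p(x) \propto \int_0^\infty \lambda^{\frac{\nu+1}{2} - 1} \exp\!\Bigl(-\frac{\lambda}{2}\bigl(\nu + (x - \mu)^2\bigr)\Bigr)\, d\lambda \;\propto\; \Bigl(1 + \frac{(x - \mu)^2}{\nu}\Bigr)^{-\frac{\nu+1}{2}},$$

which is the $t_\nu$ density; $\nu = 1$ gives the Cauchy.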
Oh, for sure it is. But that only gives it a conditionally conjugate prior, not a fully (i.e., marginally) conjugate prior. That’s great for Gibbs sampling, but not for pen-and-paper computations.
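To illustrate the Gibbs-sampling point, here’s a minimal sketch (my own code, not from the thread; it assumes a flat prior on the location mu and synthetic data). The latent precisions lam_i and the location mu each have a named full conditional, i.e., they’re conditionally conjugate, even though no marginal conjugate prior for mu is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(50) + 3.0  # synthetic Cauchy data, true location 3

# Scale-mixture representation (the nu = 1 case of the t integral above):
#   x_i | mu, lam_i ~ Normal(mu, 1/lam_i),  lam_i ~ Gamma(1/2, rate 1/2)
mu, draws = 0.0, []
for _ in range(5000):
    # lam_i | mu, x_i ~ Gamma(shape 1, rate (1 + (x_i - mu)^2)/2): conditionally conjugate
    lam = rng.gamma(shape=1.0, scale=2.0 / (1.0 + (x - mu) ** 2))
    # mu | lam, x ~ Normal(precision-weighted mean, 1/sum(lam)): conditionally conjugate
    mu = rng.normal((lam * x).sum() / lam.sum(), np.sqrt(1.0 / lam.sum()))
    draws.append(mu)

print(np.mean(draws[1000:]))  # posterior mean of mu, should sit near 3
```

Every step is a draw from a standard distribution, so the sampler needs no tuning; but you still can’t integrate mu out in closed form, which is the “no pen-and-paper” point.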
In the three years since I wrote the grandparent, I’ve found a nice mixture representation for any unimodal symmetric distribution:
Suppose f(x), the pdf of a real-valued X, is unimodal and symmetric around 0. If W is positive-valued with pdf g(w) = -2w f’(w) (the factor of 2 is needed for g to integrate to 1) and U ~ Unif(-W, W), then U’s marginal distribution is the same as X’s. Proof is by integration by parts. ETA: No, wait, it’s direct. Derp.
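Filling in that direct calculation (my step-by-step, with g(w) = -2w f’(w) for w > 0):

$$p_U(u) = \int_{|u|}^{\infty} \frac{g(w)}{2w}\, dw = \int_{|u|}^{\infty} -f'(w)\, dw = f(|u|) = f(u),$$

using $f(w) \to 0$ as $w \to \infty$ and the symmetry of f. Integration by parts is only needed to check that g is a genuine density: $\int_0^\infty -2w f'(w)\, dw = 2\int_0^\infty f(w)\, dw = 1$.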
I don’t think it would be too hard to convert this width-weighted-mixture-of-uniforms representation to a precision-weighted-mixture-of-normals representation.
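As a quick numerical sanity check of the uniforms representation (my own sketch, not from the thread): for f the standard normal, g(w) = -2w f’(w) = 2w²φ(w), which is exactly the Maxwell density (the chi distribution with 3 degrees of freedom, available in scipy.stats). So drawing W ~ Maxwell and then U ~ Unif(-W, W) should give back a standard normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000

# For f = standard normal phi, g(w) = -2w f'(w) = 2 w^2 phi(w): the Maxwell pdf.
w = stats.maxwell.rvs(size=n, random_state=rng)
u = rng.uniform(-w, w)  # U | W ~ Unif(-W, W)

# Marginally, U should be standard normal.
print(stats.kstest(u, "norm"))  # expect a non-tiny p-value
```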
It turns out that it’s not too difficult to construct a counterexample if you restrict the hyperparameter space of the family of prior distributions. For example, let the parameter theta in the likelihood f(x|theta) take only two values, so the prior just puts mass p on theta=0 (i.e., P(theta=0) = p) and mass 1-p on theta=1. If you restrict the family to p < 0.5, then for some likelihoods and some values of x the posterior will put mass greater than 0.5 on theta=0, and so fall outside the family.
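Filling in the arithmetic with made-up numbers: take p = 0.4 and suppose the observed x has f(x|theta=0) = 0.9 and f(x|theta=1) = 0.1. Then

$$P(\theta = 0 \mid x) = \frac{0.4 \times 0.9}{0.4 \times 0.9 + 0.6 \times 0.1} = \frac{0.36}{0.42} \approx 0.857 > 0.5,$$

so the posterior is not in the restricted family, and no family with the p < 0.5 constraint can be conjugate for such a likelihood.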