Bucky comments on Laplace Approximation

Bucky 21 Jul 2019 20:46 UTC
1 point
Thanks for this sequence, I’ve read each post 3 or 4 times to try to properly get it.
Am I right in thinking that in order to set $d P [θ] = d θ$ we require not only a uniform prior but also that $θ$ span unit volume?
- johnswentworth 21 Jul 2019 21:26 UTC
  4 points
  Correct. In general, $d P [θ] = p [θ] d θ$ , where $p [θ]$ is the probability density of $θ$ , so if it’s uniform on a unit volume then $p [θ] = 1$ .
  The main advantage of this notation is that it’s parameterization-independent. For example: in a coin-flipping example, we could have a uniform prior over the frequency of heads $p_{H}$ , so $d P [p_{H}] = d p_{H}$ . But then, we could re-write that frequency in terms of the odds $o_{H} = \frac{p_{H}}{1 - p_{H}}$ , so we’d get $p_{H} = \frac{o_{H}}{1 + o_{H}}$ and
  $d P [o_{H}] = d P [p_{H}] = d p_{H} = d (\frac{o_{H}}{1 + o_{H}}) = \frac{d o_{H}}{(1 + o_{H})^{2}}$
  So the probability density $p [p_{H}] = 1$ is equivalent to the density $p [o_{H}] = \frac{1}{(1 + o_{H})^{2}}$ . (That first step, $d P [o_{H}] = d P [p_{H}]$ , is because these two variables contain exactly the same information in two different forms—that’s the parameterization independence. After that, it’s math: substitute and differentiate.)
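  A quick numerical check of that equivalence, as a minimal sketch: the interval $[0.2, 0.5]$ and the helper names are arbitrary choices for illustration. The probability of an event should come out the same whether we integrate $p [p_{H}] = 1$ over $p_{H}$ or $p [o_{H}] = \frac{1}{(1 + o_{H})^{2}}$ over the matching range of $o_{H}$ .

```python
def density_odds(o):
    """Density over the odds o_H implied by a uniform prior on p_H."""
    return 1.0 / (1.0 + o) ** 2

def odds(p):
    """Convert a frequency of heads into odds."""
    return p / (1.0 - p)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# Event: p_H lies in [0.2, 0.5]. Under the uniform prior its probability
# is simply the length of the interval.
p_lo, p_hi = 0.2, 0.5
mass_p = p_hi - p_lo

# The same event in odds coordinates is o_H in [0.25, 1.0]; integrate the
# transformed density over that range.
mass_o = integrate(density_odds, odds(p_lo), odds(p_hi))

print(mass_p, mass_o)  # the two probabilities should agree closely
```

  The densities differ pointwise (for instance `density_odds(1.0)` is 0.25 rather than 1), but the probability assigned to the event is the same in both coordinates.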
  (Notice that the uniform prior on $p_{H}$ is not uniform over $o_{H}$ . This is one of the main reasons why “use a uniform prior” is not a good general-purpose rule for choosing priors: it depends on what parameters we choose. Cartesian and polar coordinates give different “uniform” priors.)
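  To make the Cartesian-versus-polar point concrete, here is a small Monte Carlo sketch. The unit disk, the event "radius at most 0.5", the seed, and the sample count are all illustrative assumptions. A prior that is uniform in $(x, y)$ and one that is uniform in $(r, \text{angle})$ assign different probabilities to the same event.

```python
import math
import random

random.seed(0)       # fixed seed so the sketch is reproducible
N = 200_000          # number of samples per prior

def radius_cartesian_uniform():
    """Radius of a point drawn uniformly in (x, y) over the unit disk,
    using rejection sampling from the enclosing square."""
    while True:
        x, y = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return math.hypot(x, y)

def radius_polar_uniform():
    """Radius of a point drawn uniformly in (r, angle); the angle is
    irrelevant to the event below, so only r is sampled."""
    return random.random()

# Probability that the point lands within radius 0.5 under each prior.
frac_cartesian = sum(radius_cartesian_uniform() <= 0.5 for _ in range(N)) / N
frac_polar = sum(radius_polar_uniform() <= 0.5 for _ in range(N)) / N

print(frac_cartesian, frac_polar)  # roughly 0.25 versus roughly 0.5
```

  The Cartesian-uniform prior gives the inner disk a quarter of the mass (its share of the area), while the polar-uniform prior gives it half.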
  The moral of the story is that, when dealing with continuous probability densities, the fundamental “thing” is not the density function $p [θ]$ but the density times the differential $p [θ] d θ$ , which we call $d P [θ]$ . This is important mainly when changing coordinates: if we have some coordinate change $θ (ϕ)$ , then $p [θ (ϕ)] d θ (ϕ) = p [ϕ] d ϕ$ , but $p [θ (ϕ)] \neq p [ϕ]$ .
  If anybody wants an exercise with this: try transforming $\int_{θ} e^{P [\text{data} | θ]} d P [θ] = \int_{θ} e^{P [\text{data} | θ]} p [θ] d θ$ to a different coordinate system. Apply Laplace’s approximation in both systems, and confirm that they yield the same answer. (This should mainly involve applying the chain rule twice to the Hessian; if you get stuck, remember that $θ_{\text{max}}$ is a maximum point and consider what that implies.)
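  For anyone who wants a numerical sanity check before doing the algebra, here is a one-dimensional sketch using the coin example from above. It rests on several assumptions that are not stated in this thread: the exponent is read as the log-likelihood, the expansion is taken around the likelihood maximum with the prior density evaluated at that point, the data (7 heads, 3 tails) are made up, and the second derivative is estimated by finite differences.

```python
import math

h, t = 7, 3          # hypothetical data: 7 heads, 3 tails
n = h + t

def log_lik_p(p):
    """Log-likelihood as a function of the frequency of heads p_H."""
    return h * math.log(p) + t * math.log(1.0 - p)

def log_lik_o(o):
    """The same log-likelihood written in terms of the odds o_H."""
    return h * math.log(o) - n * math.log(1.0 + o)

def second_derivative(f, x, eps=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps ** 2

def laplace(log_lik, prior_density, x_max):
    """1-D Laplace approximation around the likelihood maximum x_max,
    with the prior density evaluated at x_max."""
    curvature = -second_derivative(log_lik, x_max)
    peak = math.exp(log_lik(x_max)) * prior_density(x_max)
    return peak * math.sqrt(2.0 * math.pi / curvature)

p_max = h / n        # maximum-likelihood frequency
o_max = h / t        # the same point expressed as odds

approx_p = laplace(log_lik_p, lambda p: 1.0, p_max)
approx_o = laplace(log_lik_o, lambda o: 1.0 / (1.0 + o) ** 2, o_max)

print(approx_p, approx_o)  # should agree up to finite-difference error
```

  The agreement comes from the Jacobian factor in the transformed prior cancelling the factor picked up by the curvature, which is what the chain-rule hint points at.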