but there’s no first-order term and the second-order term is negative for some reason?! What happened there?
There’s no first-order term because you are expanding around a maximum of the log posterior density, where the gradient vanishes. Similarly, the second-order term is negative (well, negative definite) precisely because the posterior density falls off away from the mode. Roughly, each additional piece of data has, in expectation, the effect of making the log posterior curve down more sharply (around the true value of the parameter) by one copy of the Fisher information matrix (this all assumes the model is true, etc.). You might also be interested in the concept of “observed information,” which is the negative of the Hessian of the (actual, not expected) log-likelihood at the mode.
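For concreteness, here is a rough sketch of the expansion being described. The notation is an assumption of this sketch rather than Jaynes's: $\hat\theta$ is the posterior mode, $J$ is the negative Hessian of the log posterior at the mode, $I(\theta_0)$ is the Fisher information of a single observation at the true parameter, and $n$ is the number of observations. It also assumes the mode is a strict interior maximum and the usual regularity conditions hold.

```latex
\begin{align*}
\log p(\theta \mid x)
  &\approx \log p(\hat\theta \mid x)
   + \underbrace{\nabla_\theta \log p(\theta \mid x)\big|_{\hat\theta}^{\top}}_{=\,0 \text{ at a maximum}} (\theta - \hat\theta)
   - \tfrac{1}{2}\, (\theta - \hat\theta)^{\top} J \,(\theta - \hat\theta), \\
J &= -\nabla_\theta^2 \log p(\theta \mid x)\big|_{\hat\theta}
   \quad \text{(positive definite, so the quadratic term is negative)}, \\
J &\approx n\, I(\theta_0)
   \quad \text{(for large } n \text{, if the model is true)}.
\end{align*}
```

The minus sign and the positive definite $J$ together say the same thing as "the Hessian is negative definite": the log posterior decreases in every direction away from the mode.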
ah, thank you! It makes me so happy to finally see why that first term disappears.
But now I don’t see why you subtract the second-order terms.
I mean, I do see that since you’re at a maximum, the value of the function has to decrease as you move away from it.
But, in the single-parameter case, Jaynes’s formula becomes
$$\log p(x|\theta) = \log p(x|\theta_0) - \frac{\partial^2 \log p(x|\theta)}{\partial \theta^2}(\delta\theta)^2$$
But that second derivative there is negative. And since we’re subtracting it, the function is growing as we move away from the maximum!
Yes, that formula doesn’t make sense (you forgot the 1⁄2, by the way). I believe 8.52/8.53 should not have a minus there and 8.54 should have a minus that it’s missing. Also 8.52 should have expected values or big-O probability notation. This is a frequentist calculation so I’d suggest a more standard reference like Ferguson
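A quick numerical check of the sign convention may help. This is only a sketch on a toy model chosen for illustration (a Bernoulli likelihood with `k` successes in `n` trials, neither of which comes from Jaynes or from the thread): the second derivative at the maximum is negative, so the correct expansion adds one half of it times the squared offset, which is the same as subtracting one half of the observed information.

```python
import math

def log_lik(theta, k, n):
    """Bernoulli log-likelihood for k successes in n trials (toy model)."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def second_derivative(theta, k, n):
    """Analytic second derivative of log_lik with respect to theta."""
    return -k / theta**2 - (n - k) / (1.0 - theta)**2

def quadratic_approx(delta, k, n):
    """Second-order Taylor expansion around the maximum theta_hat = k / n.

    The first-order term is zero at the maximum, and the second-order term
    carries the factor 1/2 and the (negative) second derivative.
    """
    theta_hat = k / n
    hess = second_derivative(theta_hat, k, n)
    return log_lik(theta_hat, k, n) + 0.5 * hess * delta**2

# Hypothetical data, chosen only for illustration.
k, n = 30, 100
theta_hat = k / n

# Observed information: the negative of the second derivative at the maximum.
observed_info = -second_derivative(theta_hat, k, n)
# Fisher information of one Bernoulli observation, evaluated at theta_hat.
fisher_per_obs = 1.0 / (theta_hat * (1.0 - theta_hat))

print("second derivative at maximum:", second_derivative(theta_hat, k, n))
print("observed information:", observed_info)
print("n * Fisher information per observation:", n * fisher_per_obs)

for delta in (0.01, 0.05):
    exact = log_lik(theta_hat + delta, k, n)
    approx = quadratic_approx(delta, k, n)
    print(f"delta={delta}: exact={exact:.5f}, quadratic={approx:.5f}")
```

For this particular model the observed information at the maximum equals `n` times the per-observation Fisher information exactly; in general the two only agree approximately for large `n`.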