The measure for peak broadness used near the end confuses me in many ways. It seems to imply that a large Hessian determinant means a broad peak. But wouldn’t you expect the opposite, if anything? E.g. in one dimension, this would seem to imply that a larger second derivative would mean a broader peak. That just seems exactly false.
It seems like there’s either something missing in this post, or in my head.
… it is embarrassingly plausible that I made a sign error and that whole argument is exactly wrong.
The picture in my head is “broad basin ⇒ circular-ish peak ⇒ large determinant” (since long, narrow peaks have low volume and low determinant). But maybe the diagonals were exactly the wrong things to keep fixed in order to make that argument work.
I have a possible fix to the argument:
If you want to quantify the shape of a peak, then the inverse of the Hessian seems more intuitive than the Hessian itself. E.g. for the PDF of a normal distribution, the inverse of the Hessian (of the negative log-density) corresponds to the covariance matrix. But for the inverse Hessian, large determinant does mean broad basin, unlike for the standard Hessian. And the inverse Hessian has basically the same off-diagonal elements as the Hessian does.
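A quick numerical sanity check of the Gaussian case, as a rough sketch rather than anything from the post. It assumes we look at the negative log-density of a zero-mean normal distribution (up to an additive constant), and the helper names `numerical_hessian` and `neg_log_density` are made up for illustration. The Hessian is estimated with finite differences, inverted, and compared to the covariance; the determinants are then compared for a broad and a narrow Gaussian.

```python
import numpy as np

def numerical_hessian(f, x0, eps=1e-3):
    """Finite-difference estimate of the Hessian of f at x0 (hypothetical helper)."""
    n = x0.size
    basis = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Mixed forward difference; exact (up to rounding) for quadratic f.
            H[i, j] = (
                f(x0 + eps * (basis[i] + basis[j]))
                - f(x0 + eps * basis[i])
                - f(x0 + eps * basis[j])
                + f(x0)
            ) / eps**2
    return H

def neg_log_density(cov):
    """Negative log-density of N(0, cov), dropping the constant term (assumption)."""
    precision = np.linalg.inv(cov)
    return lambda x: 0.5 * x @ precision @ x

origin = np.zeros(2)

# 1) The inverse Hessian at the mode recovers the covariance matrix.
cov_corr = np.array([[2.0, 0.6], [0.6, 1.0]])
H = numerical_hessian(neg_log_density(cov_corr), origin)
cov_recovered = np.linalg.inv(H)

# 2) Broad vs. narrow isotropic Gaussians: compare determinants of the Hessian.
cov_broad = 4.0 * np.eye(2)
cov_narrow = 0.25 * np.eye(2)
det_H_broad = np.linalg.det(numerical_hessian(neg_log_density(cov_broad), origin))
det_H_narrow = np.linalg.det(numerical_hessian(neg_log_density(cov_narrow), origin))

print("recovered covariance:\n", cov_recovered)
print("det(H) broad:", det_H_broad, " det(H) narrow:", det_H_narrow)
```

In this toy setup the broad Gaussian has the smaller Hessian determinant and, equivalently, the larger inverse-Hessian determinant, which is the direction argued for here.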
Yes, I really don’t see how this would work right now. If I try doing Taylor series, which is what I’d start with for something like this, I very much get the opposite result.
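For concreteness, here is roughly the Taylor-series calculation I have in mind, as a sketch under my own assumptions: $f$ is twice differentiable with a strict local maximum at $x_0$, $H$ is the Hessian there (negative definite), $\varepsilon$ is a small tolerance I introduce to define "the peak region", and $V_n$ is the volume of the unit ball in $n$ dimensions.

```latex
% Second-order expansion around the maximum (gradient vanishes at x_0):
f(x) \approx f(x_0) + \tfrac{1}{2}\,(x - x_0)^\top H\,(x - x_0)

% Region where f stays within \varepsilon of its peak value is an ellipsoid:
\{\, x : f(x) \ge f(x_0) - \varepsilon \,\}
  \approx \{\, x : (x - x_0)^\top (-H)\,(x - x_0) \le 2\varepsilon \,\}

% Volume of that ellipsoid:
\mathrm{Vol} \approx \frac{V_n\,(2\varepsilon)^{n/2}}{\sqrt{\det(-H)}}

% One-dimensional special case (n = 1, V_1 = 2):
\text{width} \approx 2\sqrt{\frac{2\varepsilon}{|f''(x_0)|}}
```

So to second order the volume scales like $\det(-H)^{-1/2}$: a larger Hessian determinant gives a narrower peak, not a broader one. For a basin (a minimum) the same holds with $H$ in place of $-H$.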
I’m actually (hopefully) joining AI Safety Camp to work on your topics next month, so maybe we can talk about this more then?
Yeah definitely.