Vivek Hebbar comments on Hessian and Basin volume

Vivek Hebbar 11 Jul 2022 0:10 UTC
The loss is defined over all dimensions of parameter space, so $L (x) = x_{1}^{2} + x_{2}^{2}$ is still a function of all three $x$'s. You should think of it as $L (x) = x_{1}^{2} + x_{2}^{2} + 0 x_{3}^{2}$ . Its thickness in the $x_{3}$ direction is infinite, not zero.
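As a worked check (using only the three-parameter loss above), the Hessian of this loss is:

$$
H = \nabla^{2} L = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \det H = 2 \cdot 2 \cdot 0 = 0 .
$$

The zero eigenvalue belongs to the $x_{3}$ direction, along which the loss does not change at all, so the zero determinant reflects a flat direction rather than a missing one.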
Here’s what a zero-determinant Hessian corresponds to:
The basin here is not lower dimensional; it is just infinite in some dimension. The simplest way to fix this is to replace the infinity with some large value. Luckily, there is a fairly principled way to do this:
1. Regularization / weight decay provides actual curvature, which should be added to the loss; doing this is the same as adding $λ I_{n}$ to the Hessian.
2. The scale of the initialization distribution provides a natural scale for how much volume an infinite sweep should count as (very roughly, the volume only matters if it overlaps with the initialization distribution, and the distance of sweep for which this is true is on the order of $σ$ , the standard deviation of the initialization).
So the $(λ + \frac{k}{σ^{2}}) I_{n}$ term is a fairly principled correction, and much better than just “throwing out” the other dimensions. “Throwing out” dimensions is unprincipled, dimensionally incorrect, numerically problematic, and should give worse results.
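Here is a minimal numerical sketch of the correction, not a definitive implementation. It assumes the usual quadratic-approximation proxy in which basin volume scales as $\det(H)^{-1/2}$; the function name `log_volume_proxy` and the particular values of `lam`, `sigma`, and `k` are hypothetical, chosen only for illustration.

```python
import numpy as np

def log_volume_proxy(hessian, lam=0.0, sigma=np.inf, k=1.0):
    """Return -0.5 * log det(H + (lam + k / sigma^2) I).

    This is the log of the basin-volume proxy det(.)^(-1/2), up to an
    additive constant. lam, sigma and k are illustrative assumptions.
    """
    n = hessian.shape[0]
    shift = lam + k / sigma**2          # sigma = inf gives no init-scale term
    sign, logdet = np.linalg.slogdet(hessian + shift * np.eye(n))
    if sign <= 0:
        # Singular (or indefinite) matrix: the quadratic approximation
        # assigns infinite volume along the flat direction.
        return np.inf
    return -0.5 * logdet

# Hessian of L(x) = x1^2 + x2^2 viewed as a function of three parameters.
H = np.diag([2.0, 2.0, 0.0])

uncorrected = log_volume_proxy(H)                      # flat direction, no fix
corrected = log_volume_proxy(H, lam=0.01, sigma=1.0)   # with the correction
print(uncorrected, corrected)
```

With the correction, the flat direction contributes a large but finite factor, and that factor grows as `sigma` grows, which matches the intuition that the sweep only counts where it overlaps the initialization distribution.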