It’s measuring the volume of points in parameter space with loss <ϵ when ϵ is infinitesimal.
This is slightly tricky because it doesn’t restrict itself to bounded parameter spaces,[1] but you can fix it with a technicality by considering how the volume scales with ϵ instead.
In real networks trained with finite amounts of data, you care about the case where ϵ is small but finite, so this is ultimately inferior to just measuring how many configurations of floating point numbers get loss <ϵ, if you can manage that.
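To make the volume-scaling move concrete, here's a minimal sketch: Monte Carlo-sample a bounded box around a toy two-parameter loss and watch what fraction of samples land below ϵ as ϵ shrinks. The particular loss (w1·w2)², the box, and the sample count are illustrative choices of mine, not anything from a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(theta):
    # theta has shape (n_samples, 2); the zero-loss set is the two axes.
    w1, w2 = theta[:, 0], theta[:, 1]
    return (w1 * w2) ** 2

def volume_fraction(loss_fn, eps, n_samples=1_000_000, box=1.0):
    # Fraction of the [-box, box]^2 cube with loss below eps.
    theta = rng.uniform(-box, box, size=(n_samples, 2))
    return np.mean(loss_fn(theta) < eps)

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:.0e}  volume fraction ~ {volume_fraction(toy_loss, eps):.3e}")

# Fitting log(volume) against log(eps) gives the scaling exponent
# (up to log factors), which is the quantity the "how does the volume
# scale with eps" technicality above actually cares about.
```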
I still think SLT has some neat insights that helped me deconfuse myself about networks.
For example, like lots of people, I used to think you could maybe estimate the volume of basins with loss <ϵ using just the eigenvalues of the Hessian. You can’t. At least not in general.
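A quick way to see why: the Hessian only captures the quadratic directions, so two losses with identical Hessian eigenvalues at their minimum can still have very different ϵ-sublevel volumes. Here's a minimal sketch with toy losses I made up for illustration, not anything from a real network:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_a(theta):
    w1, w2 = theta[:, 0], theta[:, 1]
    return w1 ** 2 + w2 ** 4   # Hessian at the minimum: diag(2, 0)

def loss_b(theta):
    w1, w2 = theta[:, 0], theta[:, 1]
    return w1 ** 2 + w2 ** 8   # Hessian at the minimum: also diag(2, 0)

def volume_fraction(loss_fn, eps, n_samples=2_000_000, box=1.0):
    # Fraction of the [-box, box]^2 cube with loss below eps.
    theta = rng.uniform(-box, box, size=(n_samples, 2))
    return np.mean(loss_fn(theta) < eps)

eps = 1e-4
print("loss_a fraction below eps:", volume_fraction(loss_a, eps))
print("loss_b fraction below eps:", volume_fraction(loss_b, eps))
# Identical Hessians, but the sublevel volumes scale like eps^(1/2 + 1/4)
# vs eps^(1/2 + 1/8): the flatter w2 direction dominates the volume, and
# the Hessian eigenvalues are blind to the difference.
```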
[1] Like the floating point numbers in a real network, which can only get so large. A prior of finite width over the parameters also effectively bounds the space.