Getting the Hessian eigenvalues does not require calculating the full Hessian. You use Jacobian vector product methods in e.g. JAX. The Hessian itself never has to be explicitly represented in memory.
And even assuming the estimator for the Hessian pseudoinverse is cheap and precise, you’d still need to get its rank anyway, which would by default be just as expensive as getting the rank of the Hessian.
That makes sense, I guess it just comes down to an empirical question of which is easier.
Question about what you said earlier: How can you use the top/bottom eigenvalues to estimate the rank of the Hessian? I’m not as familiar with this so any pointers would be appreciated!
The rank of a matrix = the number of non-zero eigenvalues of the matrix!
So you can either use the top eigenvalues to count the non-zeros, or you can use the fact that an n×n matrix always has n eigenvalues to determine the number of non-zero eigenvalues by counting the bottom zero-eigenvalues.
Also for more detail on the “getting hessian eigenvalues without calculating the full hessian” thing, I’d really recommend Johns explanation in this linear algebra lecture he recorded.
Thanks for this! I misinterpreted Lucius as saying “use the single highest and single lowest eigenvalues to estimate the rank of a matrix” which I didn’t think was possible.
Counting the number of non-zero eigenvalues makes a lot more sense!
Why would we want or need to do this, instead of just calculating the top/bottom Hessian eigenvalues?
Isn’t calculating the Hessian for large statistical models kind of hard? And aren’t second derivatives prone to numerical errors?
Agree that this is only valuable if sampling on the loss landscape is easier or more robust than calculating the Hessian.
Getting the Hessian eigenvalues does not require calculating the full Hessian. You use Jacobian vector product methods in e.g. JAX. The Hessian itself never has to be explicitly represented in memory.
And even assuming the estimator for the Hessian pseudoinverse is cheap and precise, you’d still need to get its rank anyway, which would by default be just as expensive as getting the rank of the Hessian.
That makes sense, I guess it just comes down to an empirical question of which is easier.
Question about what you said earlier: How can you use the top/bottom eigenvalues to estimate the rank of the Hessian? I’m not as familiar with this so any pointers would be appreciated!
The rank of a matrix = the number of non-zero eigenvalues of the matrix! So you can either use the top eigenvalues to count the non-zeros, or you can use the fact that an n×n matrix always has n eigenvalues to determine the number of non-zero eigenvalues by counting the bottom zero-eigenvalues.
Also for more detail on the “getting hessian eigenvalues without calculating the full hessian” thing, I’d really recommend Johns explanation in this linear algebra lecture he recorded.
Thanks for this! I misinterpreted Lucius as saying “use the single highest and single lowest eigenvalues to estimate the rank of a matrix” which I didn’t think was possible.
Counting the number of non-zero eigenvalues makes a lot more sense!