harsimony comments on The Hessian rank bounds the learning coefficient

harsimony 16 Aug 2024 19:03 UTC
1 point
0
Isn’t calculating the Hessian for large statistical models kind of hard? And aren’t second derivatives prone to numerical errors?

Agree that this is only valuable if sampling on the loss landscape is easier or more robust than calculating the Hessian.
- Lucius Bushnaq 16 Aug 2024 20:59 UTC
  11 points
  4
  Parent
  Getting the Hessian eigenvalues does not require calculating the full Hessian. You use Jacobian vector product methods in e.g. JAX. The Hessian itself never has to be explicitly represented in memory.
  
  And even assuming the estimator for the Hessian pseudoinverse is cheap and precise, you’d still need to get its rank anyway, which would by default be just as expensive as getting the rank of the Hessian.
  - harsimony 16 Aug 2024 21:44 UTC
    1 point
    0
    Parent
    That makes sense, I guess it just comes down to an empirical question of which is easier.
    
    Question about what you said earlier: How can you use the top/bottom eigenvalues to estimate the rank of the Hessian? I’m not as familiar with this so any pointers would be appreciated!
    - George Ingebretsen 3 Sep 2024 1:36 UTC
      4 points
      0
      Parent
      The rank of a matrix = the number of non-zero eigenvalues of the matrix! So you can either use the top eigenvalues to count the non-zeros, or you can use the fact that an $n \times n$ matrix always has $n$ eigenvalues to determine the number of non-zero eigenvalues by counting the bottom zero-eigenvalues.
      
      Also for more detail on the “getting hessian eigenvalues without calculating the full hessian” thing, I’d really recommend Johns explanation in this linear algebra lecture he recorded.
      - harsimony 3 Sep 2024 19:54 UTC
        3 points
        0
        Parent
        Thanks for this! I misinterpreted Lucius as saying “use the single highest and single lowest eigenvalues to estimate the rank of a matrix” which I didn’t think was possible.
        
        Counting the number of non-zero eigenvalues makes a lot more sense!