You may find “On the Covariance-Hessian Relation in Evolution Strategies” interesting:
https://arxiv.org/pdf/1806.03674
It makes a lot of assumptions, but as I understand it if you:
a. Sample points near the minimum [1].
b. Select only the lowest-loss point from that sample and save it.
c. Repeat that process many times.
d. Create a covariance matrix of the selected points.
Then the covariance matrix will converge to the inverse of the Hessian, assuming the loss landscape is quadratic. Since the inverse of a matrix has the same rank as the matrix itself, you could probably just use this covariance matrix to bound the local learning coefficient.
Though since a covariance matrix built from n sample points has rank at most n − 1, you would need to sample and evaluate roughly d/2 points (with d the number of parameters). The process seems pretty parallelizable though (rough sketch below).
[1] Specifically, using an isotropic, unit-variance normal distribution centered at the minimum.
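To make that concrete, here is a rough sketch of the procedure on a toy quadratic loss (the specific Hessian, batch size, and number of repeats below are just illustrative choices of mine, not anything from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10                                     # parameter dimension
H = np.diag(np.arange(1.0, d + 1.0))       # toy Hessian at the minimum (made up for illustration)
minimum = np.zeros(d)

def loss(w):
    # Quadratic loss: 0.5 * (w - minimum)^T H (w - minimum)
    delta = w - minimum
    return 0.5 * delta @ H @ delta

selected = []
for _ in range(5000):                      # c. repeat the sample-and-select step many times
    # a. isotropic, unit-variance samples centered at the minimum
    batch = minimum + rng.standard_normal((32, d))
    losses = np.array([loss(w) for w in batch])
    # b. keep only the lowest-loss point from this batch
    selected.append(batch[np.argmin(losses)])

# d. covariance matrix of the selected points
C = np.cov(np.array(selected), rowvar=False)

# If the claim holds, C should be roughly proportional to H^{-1},
# so its spectrum (and in particular its rank) should mirror the inverse Hessian's.
print(np.round(np.linalg.eigvalsh(C), 3))
print(np.round(np.linalg.eigvalsh(np.linalg.inv(H)), 3))
```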
Why would we want or need to do this, instead of just calculating the top/bottom Hessian eigenvalues?
Isn’t calculating the Hessian for large statistical models kind of hard? And aren’t second derivatives prone to numerical errors?
Agree that this is only valuable if sampling on the loss landscape is easier or more robust than calculating the Hessian.
Getting the Hessian eigenvalues does not require calculating the full Hessian. You can use Hessian-vector products, built out of the Jacobian-vector product machinery in e.g. JAX; the Hessian itself never has to be explicitly represented in memory.
And even assuming the estimator for the Hessian pseudoinverse is cheap and precise, you’d still need to get its rank anyway, which would by default be just as expensive as getting the rank of the Hessian.
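For concreteness, here is roughly what I mean, sketched in JAX with scipy’s Lanczos solver (the toy loss is made up, and this particular solver is just one way to do it):

```python
import jax
import jax.numpy as jnp
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def loss(w):
    # Stand-in loss; in practice this would be the model's training loss.
    a = jnp.arange(1.0, w.shape[0] + 1.0)
    return jnp.sum(a * jnp.sin(w) ** 2) + 0.1 * jnp.sum(w ** 4)

w0 = jnp.ones(500)          # the point (e.g. trained parameters) whose Hessian spectrum we want
d = w0.shape[0]

def hvp(v):
    # Hessian-vector product via forward-over-reverse: jvp of grad(loss).
    # The full d x d Hessian is never materialized.
    v = jnp.asarray(v, dtype=w0.dtype)
    return np.asarray(jax.jvp(jax.grad(loss), (w0,), (v,))[1])

H_op = LinearOperator((d, d), matvec=hvp)

# Lanczos iterations recover the extreme eigenvalues from matrix-vector products alone.
top = eigsh(H_op, k=5, which="LA", return_eigenvectors=False)
bottom = eigsh(H_op, k=5, which="SA", return_eigenvectors=False)
print("largest eigenvalues: ", np.sort(top)[::-1])
print("smallest eigenvalues:", np.sort(bottom))
```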
That makes sense, I guess it just comes down to an empirical question of which is easier.
Question about what you said earlier: How can you use the top/bottom eigenvalues to estimate the rank of the Hessian? I’m not as familiar with this so any pointers would be appreciated!
The rank of a matrix = the number of non-zero eigenvalues of the matrix (at least for symmetric matrices like the Hessian)! So you can either compute the top eigenvalues and count how many are non-zero, or use the fact that an n×n matrix always has n eigenvalues (counted with multiplicity) and count the zero eigenvalues at the bottom of the spectrum to work out how many non-zero ones are left.
Also, for more detail on the “getting Hessian eigenvalues without calculating the full Hessian” thing, I’d really recommend John’s explanation in this linear algebra lecture he recorded.
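And just to illustrate the counting itself (the matrix and the tolerance here are made up; choosing the cutoff for “numerically zero” is the fiddly part in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric 6x6 matrix constructed to have rank 3.
A = rng.standard_normal((6, 3))
H = A @ A.T

eigvals = np.linalg.eigvalsh(H)            # all 6 eigenvalues, in ascending order
tol = 1e-10 * np.max(np.abs(eigvals))      # anything below this counts as numerically zero

rank_from_top = int(np.sum(np.abs(eigvals) > tol))                       # count the non-zeros
rank_from_bottom = len(eigvals) - int(np.sum(np.abs(eigvals) <= tol))    # n minus the zeros

print(eigvals)
print(rank_from_top, rank_from_bottom)     # both should be 3
```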
Thanks for this! I misinterpreted Lucius as saying “use the single highest and single lowest eigenvalues to estimate the rank of a matrix” which I didn’t think was possible.
Counting the number of non-zero eigenvalues makes a lot more sense!