Given that this method returns a numeric matrix, it must be either a Hessian evaluated at a single point or the average Hessian over many points. Is the result the Hessian averaged over all training data? And is this average actually useful, rather than just cancelling out high and low Hessian values?
The method described does not explicitly compute the full Hessian matrix. Instead, it estimates the top eigenvalues and eigenvectors of the Hessian. The implementation accumulates one large batch from a dataloader by concatenating `n_batches` mini-batches of the usual batch size. This is an approximation intended to bring the computed loss and gradient closer to their values on the complete dataset. For a large, high-variance dataset, averaging gradients over multiple batches may be preferable, because the loss computed from a single accumulated batch may not be adequately representative of the true loss over the entire dataset.
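To make this concrete, here is a minimal sketch of how such a method can extract the top Hessian eigenpair without materializing the Hessian: accumulate a large batch, then run power iteration using Hessian-vector products (double backprop). This is an illustration under assumed names, not the library's actual API; `model`, `loss_fn`, `dataloader`, `accumulate_batch`, and `top_hessian_eigenpair` are all hypothetical, and only `n_batches` mirrors the parameter mentioned above.

```python
import torch

def accumulate_batch(dataloader, n_batches):
    """Concatenate n_batches mini-batches into one large batch (sketch)."""
    xs, ys = [], []
    for i, (x, y) in enumerate(dataloader):
        if i >= n_batches:
            break
        xs.append(x)
        ys.append(y)
    return torch.cat(xs), torch.cat(ys)

def top_hessian_eigenpair(model, loss_fn, x, y, n_iters=20, tol=1e-4):
    """Estimate the top Hessian eigenvalue/eigenvector by power iteration.

    Uses Hessian-vector products via double backprop, so the full
    Hessian matrix is never formed.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    # First-order gradients, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random normalized starting vector, shaped like the parameters.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
    v = [vi / norm for vi in v]

    eigenvalue = None
    for _ in range(n_iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        # Rayleigh quotient estimate of the eigenvalue (v is unit norm).
        new_eig = sum((hvi * vi).sum() for hvi, vi in zip(hv, v)).item()
        norm = torch.sqrt(sum((hvi ** 2).sum() for hvi in hv))
        v = [hvi / norm for hvi in hv]
        if eigenvalue is not None and abs(new_eig - eigenvalue) < tol:
            eigenvalue = new_eig
            break
        eigenvalue = new_eig
    return eigenvalue, v
```

Usage would look something like the following, where the accumulated batch plays the role of the "full dataset" approximation discussed above:

```python
x, y = accumulate_batch(dataloader, n_batches=8)
eig, vec = top_hessian_eigenpair(model, torch.nn.functional.cross_entropy, x, y)
```

Because the Hessian is only ever touched through matrix-vector products, the cost per iteration is roughly two backward passes, regardless of the number of parameters.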