Looks like I really need to study some SLT! I will say, though, that I haven’t seen many cases in transformer language models where 90% of the Hessian’s eigenvalues are zero; that seems extremely high.