Dmitry Vaintrob comments on Investigating the learning coefficient of modular addition: hackathon project

Dmitry Vaintrob 17 Oct 2023 21:30 UTC
4 points
0
Oh, I can see how this could be confusing. We’re sampling at every step in the orthogonal complement to the gradient at that step (“initialization” here refers to the beginning of sampling, i.e., we don’t update the normal vector during sampling). The reason to do this is that we’re hoping to prevent the sampler from quickly leaving the unstable point and jumping into a lower-loss basin: by restricting to the orthogonal complement, we guarantee that the unstable point is a critical point of the restricted loss.
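To make the restriction concrete, here is a minimal sketch in NumPy, not the project's actual code. It assumes a plain Langevin-style update; the names `project_out`, `restricted_langevin`, `grad_fn`, and `step_size` and the toy loss are all hypothetical. The normal vector is computed once at the start of sampling and never updated, and every update is projected onto its orthogonal complement.

```python
import numpy as np


def project_out(v, n_hat):
    """Remove the component of v along the unit vector n_hat."""
    return v - np.dot(v, n_hat) * n_hat


def restricted_langevin(grad_fn, w0, n_steps=100, step_size=1e-3, seed=0):
    """Langevin-style sampling confined to the hyperplane through w0
    orthogonal to the gradient at w0 (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    g0 = grad_fn(w0)
    norm = np.linalg.norm(g0)
    # Normal vector is fixed at the beginning of sampling and never updated.
    n_hat = g0 / norm if norm > 0 else np.zeros_like(g0)
    w = np.array(w0, dtype=float)
    samples = []
    for _ in range(n_steps):
        drift = -0.5 * step_size * grad_fn(w)
        noise = np.sqrt(step_size) * rng.standard_normal(w.shape)
        # Project the whole update so the sampler never moves along n_hat.
        w = w + project_out(drift + noise, n_hat)
        samples.append(w.copy())
    return np.array(samples), n_hat


# Toy loss L(w) = w[0] + 0.5 * w[1]**2, so grad L = (1, w[1]).
# w0 = (0, 0) is not a critical point of L, but it is one of the restricted loss.
toy_grad = lambda w: np.array([1.0, w[1]])
w0 = np.zeros(2)
samples, n_hat = restricted_langevin(toy_grad, w0)
```

In this toy case the gradient at `w0` points along the first coordinate, so the sampler only moves in the second coordinate, and the projected gradient at `w0` vanishes.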
- Daniel Murfet 17 Oct 2023 21:45 UTC
  2 points
  2
  Oh that makes a lot of sense, yes.