johnswentworth comments on Updating the Lottery Ticket Hypothesis

johnswentworth 20 Apr 2021 15:19 UTC
LW: 6 AF: 5
AF
Sort of. They end up equivalent to a single Newton step, not a single gradient step (or at least that’s what this model says). In general, a set of linear equations is not solved by one gradient step, but is solved by one Newton step. It generally takes many gradient steps to solve a set of linear equations.
(Caveat to this: if you directly attempt a Newton step on this sort of system, you’ll probably get an error, because the system is underdetermined. Actually making Newton steps work for NN training would probably be a huge pain in the ass, since the underdetermination would cause numerical issues.)
- adamShimi 20 Apr 2021 17:24 UTC
  LW: 5 AF: 4
  AF Parent
  By Newton step, do you mean one step of Newton’s method?
  - johnswentworth 20 Apr 2021 17:26 UTC
    LW: 2 AF: 2
    AF Parent
    Yes.