Sort of. They end up equivalent to a single Newton step, not a single gradient step (or at least that’s what this model says). In general, a set of linear equations is not solved by one gradient step, but is solved by one Newton step. It generally takes many gradient steps to solve a set of linear equations.
(Caveat to this: if you directly attempt a Newton step on this sort of system, you’ll probably get an error, because the system is underdetermined. Actually making Newton steps work for NN training would probably be a huge pain in the ass, since the underdetermination would cause numerical issues.)
Sort of. They end up equivalent to a single Newton step, not a single gradient step (or at least that’s what this model says). In general, a set of linear equations is not solved by one gradient step, but is solved by one Newton step. It generally takes many gradient steps to solve a set of linear equations.
(Caveat to this: if you directly attempt a Newton step on this sort of system, you’ll probably get an error, because the system is underdetermined. Actually making Newton steps work for NN training would probably be a huge pain in the ass, since the underdetermination would cause numerical issues.)
By Newton step, do you mean one step of Newton’s method?
Yes.