Ofer comments on Gradient descent is not just more efficient genetic algorithms

Ofer 10 Sep 2021 23:15 UTC
1 point
I don’t see how this is relevant here. If it is the case that changing only $w_{1}$ does not affect the loss, and changing only $w_{2}$ does not affect the loss, then SGD would not change them (their gradient components will be zero), even if changing them both can affect the loss.
- leogao 10 Sep 2021 23:55 UTC
  2 points
  Parent
  It’s relevant because it demonstrates that for differentiable functions, if changing only $w_{1}$ does not affect the loss and changing only $w_{2}$ does not affect the loss (to first order, i.e. both partial derivatives are zero), then changing them both cannot affect the loss to first order either.
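
A minimal sketch of the point under discussion, using a hypothetical toy loss $L(w_{1}, w_{2}) = w_{1} w_{2}$ that does not appear in the thread. At the origin both partial derivatives are zero, so for a differentiable function the directional derivative along any direction, including one that moves both weights at once, is zero as well, and a gradient step leaves the weights where they are. A finite joint change can still alter the loss, but only at second order. The helper `directional_derivative` and the step sizes are illustrative assumptions.

```python
# Hypothetical toy loss (an assumption for illustration): L(w1, w2) = w1 * w2.
def loss(w1, w2):
    return w1 * w2


def directional_derivative(f, w, v, eps=1e-6):
    """Central finite-difference estimate of the derivative of f at w along v."""
    plus = f(w[0] + eps * v[0], w[1] + eps * v[1])
    minus = f(w[0] - eps * v[0], w[1] - eps * v[1])
    return (plus - minus) / (2 * eps)


w = (0.0, 0.0)

# Changing only w1, or only w2, does not affect the loss to first order.
d_w1 = directional_derivative(loss, w, (1.0, 0.0))
d_w2 = directional_derivative(loss, w, (0.0, 1.0))

# Changing both together does not affect it to first order either.
d_both = directional_derivative(loss, w, (1.0, 1.0))

# A finite joint step does change the loss, but only at second order in the step.
finite_change = loss(0.1, 0.1) - loss(0.0, 0.0)

print("dL along w1:", d_w1)
print("dL along w2:", d_w2)
print("dL along (w1, w2) jointly:", d_both)
print("finite joint change:", finite_change)
```

The three derivative estimates come out as zero while `finite_change` is positive, which is consistent with both comments: the gradient gives SGD no signal at this point, and the joint effect only shows up beyond first order.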