SGD updates the weights in the direction of the negative gradient, so if changing a given weight alone does not affect the loss, then the gradient component associated with that weight will be 0 and SGD will not change that weight.
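To make that concrete, here is a minimal sketch (plain NumPy; the toy loss and learning rate are invented for this illustration, not taken from the thread). A vanilla SGD step $w \leftarrow w - \eta \, \nabla L(w)$ leaves any weight whose gradient component is zero exactly where it was:

```python
import numpy as np

def loss(w):
    # Toy loss that depends only on w[0]; w[1] has no effect on it.
    return (w[0] - 3.0) ** 2

def grad(w):
    # Analytic gradient of the toy loss: d/dw0 = 2*(w0 - 3), d/dw1 = 0.
    return np.array([2.0 * (w[0] - 3.0), 0.0])

w = np.array([0.0, 5.0])
lr = 0.1
for _ in range(100):
    w = w - lr * grad(w)  # plain SGD step

print(w)  # w[0] converges toward 3.0; w[1] stays at 5.0 because its gradient component is 0
```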
If the partial derivatives w.r.t. two different parameters are both zero, i.e. $\frac{\partial f}{\partial \theta_1} = \frac{\partial f}{\partial \theta_2} = 0$, then it must be that changing both simultaneously does not change the loss either (to be precise, the directional derivative along the joint direction vanishes: $\lim_{h\to 0}\frac{f(\theta + h(e_1 + e_2)) - f(\theta)}{h} = 0$, where $e_1$ and $e_2$ are the coordinate directions of $\theta_1$ and $\theta_2$).
I don’t see how this is relevant here. If it is the case that changing only w1 does not affect the loss, and changing only w2 does not affect the loss, then SGD would not change them (their gradient components will be zero), even if changing them both can affect the loss.
It’s relevant because it demonstrates that for a differentiable function, if changing only w1 does not affect the loss and changing only w2 does not affect the loss, then changing both together cannot affect the loss either, to first order.
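A small numerical check of that directional-derivative argument (the function f, the point, and the step size h below are arbitrary choices for illustration): for a differentiable f, the difference quotient along the joint direction $e_1 + e_2$ approaches the sum of the two partial derivatives, so if both partials are 0 the joint quotient tends to 0 as well.

```python
import numpy as np

def f(theta):
    # Arbitrary smooth function of two parameters, chosen only for this check.
    return np.sin(theta[0]) * np.exp(-theta[1] ** 2)

theta = np.array([0.7, -0.3])
h = 1e-6

d1 = (f(theta + h * np.array([1.0, 0.0])) - f(theta)) / h   # ~ df/dtheta1
d2 = (f(theta + h * np.array([0.0, 1.0])) - f(theta)) / h   # ~ df/dtheta2
d12 = (f(theta + h * np.array([1.0, 1.0])) - f(theta)) / h  # joint direction e1 + e2

print(d1 + d2, d12)  # the two values agree up to O(h), so zero partials imply a zero joint quotient
```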