SGD updates the weights in the direction of the negative gradient, so if changing a given weight alone does not affect the loss, then the gradient component associated with that weight will be 0 and SGD will not change that weight.
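To make that concrete, here is a minimal sketch (plain NumPy; the toy loss and learning rate are invented for this illustration, not taken from the thread). A vanilla SGD step $w \leftarrow w - \eta \, \nabla L(w)$ leaves any weight whose gradient component is zero exactly where it was:

```python
import numpy as np

def loss(w):
    # Toy loss that depends only on w[0]; w[1] has no effect on it.
    return (w[0] - 3.0) ** 2

def grad(w):
    # Analytic gradient of the toy loss: d/dw0 = 2*(w0 - 3), d/dw1 = 0.
    return np.array([2.0 * (w[0] - 3.0), 0.0])

w = np.array([0.0, 5.0])
lr = 0.1
for _ in range(100):
    w = w - lr * grad(w)  # plain SGD step

print(w)  # w[0] converges toward 3.0; w[1] stays at 5.0 because its gradient component is 0
```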
If the partial derivatives w.r.t. two different parameters are both zero, i.e. $\frac{\partial f}{\partial \theta_1} = \frac{\partial f}{\partial \theta_2} = 0$, then it must be that changing both simultaneously does not change the loss either (to be precise, the directional derivative along the joint direction vanishes: $\lim_{h\to 0}\frac{f(\theta + h(e_1 + e_2)) - f(\theta)}{h} = 0$, where $e_1$ and $e_2$ are the coordinate directions of $\theta_1$ and $\theta_2$).
I don’t see how this is relevant here. If it is the case that changing only w1 does not affect the loss, and changing only w2 does not affect the loss, then SGD would not change them (their gradient components will be zero), even if changing them both can affect the loss.
It’s relevant because it demonstrates that for a differentiable function, if changing only w1 does not affect the loss and changing only w2 does not affect the loss, then changing both together cannot affect the loss either, to first order.
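A small numerical check of that directional-derivative argument (the function f, the point, and the step size h below are arbitrary choices for illustration): for a differentiable f, the difference quotient along the joint direction $e_1 + e_2$ approaches the sum of the two partial derivatives, so if both partials are 0 the joint quotient tends to 0 as well.

```python
import numpy as np

def f(theta):
    # Arbitrary smooth function of two parameters, chosen only for this check.
    return np.sin(theta[0]) * np.exp(-theta[1] ** 2)

theta = np.array([0.7, -0.3])
h = 1e-6

d1 = (f(theta + h * np.array([1.0, 0.0])) - f(theta)) / h   # ~ df/dtheta1
d2 = (f(theta + h * np.array([0.0, 1.0])) - f(theta)) / h   # ~ df/dtheta2
d12 = (f(theta + h * np.array([1.0, 1.0])) - f(theta)) / h  # joint direction e1 + e2

print(d1 + d2, d12)  # the two values agree up to O(h), so zero partials imply a zero joint quotient
```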