But gradient descent doesn’t modify a neural network one weight at a time
Sure, but the gradient component associated with a given weight is still zero if updating that weight alone would not affect the loss.
What do you think the gradient of min(x, y) is?
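For concreteness, here is a minimal sketch (in JAX, chosen only for illustration; the point values 2.0 and 5.0 are arbitrary) of what automatic differentiation reports for min(x, y) away from the tie x = y:

```python
import jax
import jax.numpy as jnp

def f(x, y):
    # min(x, y); differentiable everywhere except on the tie x == y
    return jnp.minimum(x, y)

# Gradient with respect to both arguments
grad_f = jax.grad(f, argnums=(0, 1))

# At a point where x < y, only the x-component is nonzero
print(grad_f(2.0, 5.0))          # (1.0, 0.0)

# The y-component is 0 even though a large enough move in y alone
# (e.g. below x) would change the value of f
print(f(2.0, 5.0), f(2.0, 1.0))  # 2.0  1.0
```

Away from the tie, the component for the argument currently achieving the minimum is 1 and the component for the other argument is 0, even though a sufficiently large change to that other argument alone would still change the value; at x = y the function is not differentiable, and autodiff libraries return some subgradient by convention.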