That’s a good point of clarification, which perhaps weakens the point I was making there. From the paper:
adding the same amount of noise to the activations of the standard (non-BatchNorm) network prevents it from training entirely
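For concreteness, here is a minimal sketch of what "adding the same amount of noise to the activations" of a non-BatchNorm network could look like. The NoisyActivation module and the noise scale are my assumptions for illustration, not the paper's actual code; the paper presumably matches the noise magnitude to what BatchNorm's batch statistics inject.

```python
import torch
import torch.nn as nn

class NoisyActivation(nn.Module):
    """Adds zero-mean Gaussian noise to activations during training,
    mimicking the noise BatchNorm's batch statistics would inject.
    The std value here is a placeholder, not the paper's measured scale."""
    def __init__(self, std=0.5):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training:
            return x + torch.randn_like(x) * self.std
        return x

# A standard (non-BatchNorm) network with matched noise on its activations
net = nn.Sequential(
    nn.Linear(784, 256),
    NoisyActivation(std=0.5),
    nn.ReLU(),
    nn.Linear(256, 10),
)
```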