That’s a good point of clarification, which perhaps weakens the point I was making there. From the paper:
> adding the same amount of noise to the activations of the standard (non-BatchNorm) network prevents it from training entirely
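
For concreteness, here is a rough sketch of the kind of activation noise I understand them to mean. This is my own illustration, not the paper's code: the `NoisyReLU` name, the `std=0.5` noise scale, and the layer sizes are all assumptions on my part.

```python
import torch
import torch.nn as nn

class NoisyReLU(nn.Module):
    """ReLU followed by additive Gaussian noise on the activations
    (applied only during training). Hypothetical illustration of
    injecting BatchNorm-like noise into a plain network, without
    BatchNorm's normalization step."""
    def __init__(self, std=0.5):  # noise scale is an assumed value
        super().__init__()
        self.std = std

    def forward(self, x):
        x = torch.relu(x)
        if self.training:
            # Add zero-mean Gaussian noise with the chosen std
            x = x + self.std * torch.randn_like(x)
        return x

# A plain (non-BatchNorm) MLP with noise injected after each hidden layer.
model = nn.Sequential(
    nn.Linear(784, 256), NoisyReLU(std=0.5),
    nn.Linear(256, 256), NoisyReLU(std=0.5),
    nn.Linear(256, 10),
)
```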