SGD is a form of efficient approximate Bayesian updating.
Yeah I saw you were arguing that in one of your posts. I’ll take a closer look. I honestly have not heard of this before.
Regarding my statement: looking back, I agree it is horribly sloppy and sounds absurd. When I was writing it, I was just thinking about how all L1 and L2 regularization do is bias towards smaller weights, while the models still take up the same amount of space on disk and require the same amount of compute to run in terms of FLOPs. But yes, you’re right that they make the models easier to approximate.
So actually L1/L2 regularization does allow you to compress the model by reducing entropy, as evidenced by the fact that any effective pruning/quantization system necessarily involves some strong regularizer applied during training or after.
The model itself can’t possibly know or care whether you later actually compress said weights or not, so what matters is never the actual compression itself but the inherent compressibility (which comes from the regularization).
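To make the "same size on disk, but more compressible" point concrete, here is a rough sketch rather than anything from the posts being discussed. It fits a toy linear model with and without an L1 penalty (using proximal gradient descent, which is my choice for the illustration), quantizes both weight vectors to the same grid, and compares the empirical entropy of the quantized values. The problem sizes, the penalty strength `l1=0.2`, the grid `step`, and the helper names `fit_linear` and `entropy_bits` are all assumptions made up for this example.

```python
import numpy as np

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits per symbol) of a discrete array."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def fit_linear(X, y, l1=0.0, steps=500):
    """Least squares via proximal gradient descent (ISTA).

    Minimizes 0.5 * mean((X @ w - y)**2) + l1 * sum(|w|).
    With l1=0 this is plain gradient descent.
    """
    n, d = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    lr = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

# Toy data (assumed setup): only 5 of 100 features actually matter.
rng = np.random.default_rng(0)
n, d = 200, 100
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 3.0 * rng.normal(size=5)
y = X @ w_true + rng.normal(size=n)

w_plain = fit_linear(X, y, l1=0.0)
w_l1 = fit_linear(X, y, l1=0.2)

# Quantize both weight vectors to the same uniform grid.
step = 0.02
q_plain = np.round(w_plain / step).astype(int)
q_l1 = np.round(w_l1 / step).astype(int)

h_plain = entropy_bits(q_plain)
h_l1 = entropy_bits(q_l1)

# Both models store exactly d weights; only the entropy differs.
print("weights stored:", w_plain.size, "vs", w_l1.size)
print("entropy (bits/weight), no regularizer:", h_plain)
print("entropy (bits/weight), L1:", h_l1)
```

Both weight vectors have the same number of entries, so a naive dense dump costs the same either way. The difference only shows up in the entropy, which is what bounds how far an entropy coder (or a pruning pass) could shrink them afterwards.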