I don’t think I understand your model of why neural networks are so effective. It sounds like you’re saying that, on the one hand, neural networks have lots of parameters, so you should expect them to be terrible, but on the other hand they are actually very good because SGD is such a shitty optimizer that it acts as an implicit regularizer.
Yeah, that’s basically my model. How it regularizes, I don’t know. Perhaps the volume of “simple” functions is the main driver, rather than the gradient descent dynamics. I think the randomness is important; full-gradient descent (no stochasticity) would not work nearly as well.
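A minimal sketch of the stochasticity being referred to here, on a hypothetical toy regression problem (the data and batch sizes are illustrative, not from the conversation): the minibatch gradient that SGD actually follows is the full-batch gradient plus noise, and the noise shrinks as the batch size grows. Full-batch gradient descent is the zero-noise limit.

```python
# Sketch (numpy only, toy data): measure how far minibatch gradients
# scatter around the full-batch gradient at a fixed parameter point.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem: y = X w* + noise.
n, d = 1024, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # current parameters (an arbitrary point)

def grad(Xb, yb, w):
    """Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = grad(X, y, w)  # what full-batch GD would use at this point

for batch_size in (8, 64, 512):
    # Sample many minibatch gradients and measure their deviation
    # from the full-batch gradient.
    deviations = []
    for _ in range(200):
        idx = rng.choice(n, size=batch_size, replace=False)
        deviations.append(np.linalg.norm(grad(X[idx], y[idx], w) - full_grad))
    print(f"batch={batch_size:4d}  mean ||minibatch grad - full grad|| "
          f"= {np.mean(deviations):.4f}")
# The deviation shrinks roughly like 1/sqrt(batch_size): minibatching
# injects batch-size-dependent noise that full-batch GD does not have.
```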
Oh, this reminded me of the temperature component of SLT, which I believe modulates how sharply one samples from the Bayesian posterior, or perhaps how heavily to update on new evidence; I forget which. In any case, it does this to try to capture the stochasticity of SGD. How successfully it does so is still an open problem, I believe.
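For reference, the object I think is being gestured at here is singular learning theory’s tempered Bayesian posterior, where an inverse temperature β rescales the likelihood. The notation below (prior φ, dataset D_n) is the standard textbook one and is my addition, not the speaker’s:

```latex
% Tempered posterior at inverse temperature \beta = 1/T (standard SLT setup;
% notation assumed here, not taken from the conversation):
\[
  p_\beta(w \mid D_n)
    = \frac{1}{Z_n(\beta)}\,\varphi(w)\prod_{i=1}^{n} p(x_i \mid w)^{\beta},
  \qquad
  Z_n(\beta) = \int \varphi(w)\prod_{i=1}^{n} p(x_i \mid w)^{\beta}\,dw .
\]
% \beta = 1 recovers the ordinary Bayesian posterior; larger \beta concentrates
% the posterior more sharply on high-likelihood parameters, while smaller \beta
% flattens it. This \beta is the knob meant to stand in for the noise level of
% SGD discussed above.
```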