In such a case, I claim this is just sneaking in bayes rule without calling it by name, and this is not a very smart thing to do, because the bayesian frame gives you a bunch more leverage on analyzing the system
I disagree. An inductive bias is not necessarily a prior distribution. What’s the prior?
The prior assigns uniform probability to all weights, and I believe a good understanding of the mapping from weights to functions is unknown, though lots of the time there are many directions you can move in in the weight space which don’t change your function, so one would expect its a relatively compressive mapping (in contrast to, say, a polynomial parameterization, where the mapping is one-to-one).
Also, side-comment: Thanks for the discussion! Its fun.
EDIT: Actually, there should be a term for the stochasticity which you integrate into the SLT equations like you would temperature in a physical system. I don’t remember exactly how this works though. Or if its even known the exact connection with SGD.
I disagree. An inductive bias is not necessarily a prior distribution. What’s the prior?
From another comment of mine:
Also, side-comment: Thanks for the discussion! Its fun.
EDIT: Actually, there should be a term for the stochasticity which you integrate into the SLT equations like you would temperature in a physical system. I don’t remember exactly how this works though. Or if its even known the exact connection with SGD.