Maybe I can clarify a few points here:
A statistical model is regular if it is identifiable and its Fisher information matrix is everywhere nondegenerate. Statistical models in which predictions are produced by feeding samples from the input distribution through a neural network are not regular.
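For concreteness, here is the standard definition (my formulation, not from the original discussion). The Fisher information matrix at a parameter $\theta$ is

$$I(\theta)_{ij} = \mathbb{E}_{x \sim p(x\mid\theta)}\!\left[\frac{\partial \log p(x\mid\theta)}{\partial \theta_i}\,\frac{\partial \log p(x\mid\theta)}{\partial \theta_j}\right],$$

and the model $\{p(\cdot\mid\theta)\}$ is regular if $\theta \mapsto p(\cdot\mid\theta)$ is injective (identifiability) and $I(\theta)$ is positive definite at every $\theta$. Neural networks fail both conditions: weight-space symmetries (permuting hidden units, or sign-flipping weights around an odd activation like $\tanh$) map distinct parameters to the same function, and the Fisher matrix degenerates along those directions. A minimal numpy sketch of the sign-flip symmetry (a toy example of my own, not from the post):

```python
import numpy as np

# Toy model: f(x) = a * tanh(w * x). Because tanh is odd,
# (w, a) and (-w, -a) define exactly the same function, so the
# parameter is not identifiable, and the Fisher information
# matrix is degenerate along the orbit of this symmetry.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)

def f(x, w, a):
    return a * np.tanh(w * x)

w, a = 1.3, -0.7
assert np.allclose(f(x, w, a), f(x, -w, -a))  # two parameters, one function
```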
Regular models are the ones for which there is a link between low description length and low free energy (i.e. they are the class of models for which, at the same level of accuracy, the Bayesian posterior tends to prefer parameters that are assigned lower description length).
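As I understand the standard asymptotics (my gloss, following Watanabe's singular learning theory), the link runs through the expansion of the Bayesian free energy $F_n$ in the sample size $n$. For a regular model with $d$ parameters,

$$F_n = n L_n + \frac{d}{2}\log n + O_p(1),$$

where $n L_n$ is the accuracy term and $\frac{d}{2}\log n$ is exactly the BIC-style, parameter-counting description-length penalty. In the singular case this becomes

$$F_n = n L_n + \lambda \log n - (m-1)\log\log n + O_p(1),$$

where the learning coefficient $\lambda$ (the RLCT, with multiplicity $m$) can be much smaller than $d/2$, so parameter-counting description length no longer tracks what the posterior prefers.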
It’s not really accurate to describe regular models as “typical”, especially not on LW, where machine learning usually means neural networks.
It’s true that the example presented in this post may not be typical (it’s neither a neural network nor a standard kind of statistical model), so it’s unclear to what extent this observation generalises. However, it does illustrate the general point that it is a mistake to presume that intuitions based on regular models hold for general statistical models.
A pervasive failure mode in modern ML is to take intuitions developed for regular models and assume they hold “with some caveats” for neural networks. We now have many examples where this leads one badly astray, and in my opinion the intuition about neural network inductive biases and description length that I see widely shared here on LW falls into this bucket.
I don’t claim to know the content of those inductive biases, but my guess is that they are much more interesting and complex than “something like description length”.