Well, neural networks do obey Occam’s razor, at least according to the formalisation of that statement contained in the post (namely, that neural networks, when formulated in the context of Bayesian learning, obey the free energy formula, a generalisation of the BIC that is often thought of as a formalisation of Occam’s razor).
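For concreteness, the comparison I have in mind is roughly the following (the notation here is mine and may differ slightly from the post’s). The Bayesian free energy admits the asymptotic expansion

$$F_n = -\log \int \exp(-n L_n(w))\,\varphi(w)\,dw \;\approx\; n L_n(w_0) + \lambda \log n,$$

whereas the BIC is

$$\mathrm{BIC} = n L_n(\hat{w}) + \frac{d}{2}\log n,$$

where $L_n$ is the empirical negative log likelihood, $\varphi$ is the prior, $w_0$ is an optimal parameter, $\hat{w}$ is the maximum likelihood estimate, $d$ is the number of parameters, and $\lambda$ is the learning coefficient (RLCT). For regular models $\lambda = d/2$ and the two expressions agree; for singular models such as neural networks $\lambda \le d/2$. The $\log n$ penalty term is what carries the Occam-style trade-off between fit and complexity.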
Would that not imply that my polynomial example also obeys Occam’s razor?
However, I accept your broader point, which I take to be: readers of these posts may naturally draw the conclusion that SLT currently says something profound about (ii) from my other post, and the broad use of terms like “generalisation” in the more expository parts (as opposed to the technical parts) arguably doesn’t do enough to prevent them from drawing these inferences.
Yes, I think this probably is the case. I also think the vast majority of readers won’t go deep enough into the mathematical details to get a fine-grained understanding of what the maths is actually saying.
I’m often critical of the folklore-driven nature of the ML literature and what I view as its low scientific standards, and especially in the context of technical AI safety I think we need to aim higher, in both our technical and more public-facing work.
Yes, I very much agree with this too.
Does that sound reasonable?
Yes, absolutely!
At least right now, the value proposition I see for SLT lies not in explaining the “generalisation puzzle” but in understanding phase transitions and emergent structure; that might eventually circle back to say something about generalisation.
I also think that SLT probably will be useful for understanding phase shifts and training dynamics (as I also noted in my post above), so we have no disagreements there either.