I agree with your summary. I’m mainly just clarifying what my view is of the strength and overall role of the Algorithmic Information Theory arguments, since you said you found them unconvincing.
I do however disagree that those arguments can be applied to “literally any machine learning algorithm”, although they certainly do apply to a much larger class of ML algorithms than just neural networks. However, I also don’t think this is necessarily a bad thing. The picture that the AIT arguments give makes it reasonably unsurprising that you would get the double-descent phenomenon as you increase the size of a model (at small sizes VC-dimensionality mechanisms dominate, but at larger sizes the overparameterisation starts to induce a simplicity bias, which eventually starts to dominate). Since you get double descent in the model size for both neural networks and eg random forests, you should expect there to be some mechanism in common between them (even if the details of course differ from case to case).
I agree with your summary. I’m mainly just clarifying what my view is of the strength and overall role of the Algorithmic Information Theory arguments, since you said you found them unconvincing.
I do however disagree that those arguments can be applied to “literally any machine learning algorithm”, although they certainly do apply to a much larger class of ML algorithms than just neural networks. However, I also don’t think this is necessarily a bad thing. The picture that the AIT arguments give makes it reasonably unsurprising that you would get the double-descent phenomenon as you increase the size of a model (at small sizes VC-dimensionality mechanisms dominate, but at larger sizes the overparameterisation starts to induce a simplicity bias, which eventually starts to dominate). Since you get double descent in the model size for both neural networks and eg random forests, you should expect there to be some mechanism in common between them (even if the details of course differ from case to case).