Thanks! I definitely believe this, and I think we have a lot of evidence for it in both toy models and LLMs (I’m planning a couple of posts on this idea of “training stories”), as well as theoretical reasons in some contexts. I’m not sure how easy it is to extend the specific approach used in the proof for parity to a general context. It inherently uses the orthogonality of Fourier functions on boolean inputs, and understanding other ML algorithms in terms of nice orthogonal functions seems hard to do rigorously, unless you either make some kind of simplifying “presumption of independence” assumption about learnable algorithms or work in a toy context. In the toy case, there is a nice paper that does exactly this (it explains how NNs will tend to find “incrementally learnable” algorithms) using a similar idea to the parity proof I outlined: the leap complexity paper (which Kaarel and I have looked into; I think you’ve also looked into related things).
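(To spell out the orthogonality fact I mean, in standard Boolean Fourier-analysis notation: for $S \subseteq [n]$ and $x \in \{-1,1\}^n$, the parity on $S$ is the character $\chi_S(x) = \prod_{i \in S} x_i$, and under the uniform distribution on inputs
$$\mathbb{E}_{x \sim \{-1,1\}^n}\big[\chi_S(x)\,\chi_T(x)\big] \;=\; \mathbb{E}\big[\chi_{S \,\triangle\, T}(x)\big] \;=\; \begin{cases} 1 & S = T \\ 0 & S \neq T, \end{cases}$$
so the $2^n$ parities form an orthonormal basis for functions on the boolean cube. The parity proof leans on this exact basis, which is why it’s unclear what the analogous decomposition should be for more general learned algorithms.)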