Re: the SLT dogma.
For those interested, a continuous version of the padding argument is used in Theorem 4.1 of Clift-Murfet-Wallbridge to show that the learning coefficient is a lower bound on the Kolmogorov complexity (in a sense) in the setting of noisy Turing machines: just take the synthesis problem in that theorem to be given by a TM's input-output map. The result is treated in more detail in Waring's thesis (Proposition 4.19). Noisy TMs are of course not neural networks, but they are a place where the link between the learning coefficient in SLT and algorithmic information theory has already been made precise.
For what it’s worth, as explained in "simple versus short", I don’t actually think the local learning coefficient (LLC) is algorithmic complexity (in the sense of program length) in neural networks, only that it is a lower bound. So I don’t really see the LLC as a useful “approximation” of the algorithmic complexity.
For those wanting to read more about the padding argument in the classical setting, Hutter-Catt-Quarel’s “An Introduction to Universal Artificial Intelligence” has a nice, detailed treatment.
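To spell out the classical counting version for anyone skimming (a sketch, under the simplifying convention of a machine $U$ that ignores any bits after the end of a program, and writing $K_U(f)$ for the length of a shortest $U$-program computing $f$): every length-$n$ string that begins with a shortest program for $f$ also computes $f$, so for $n \ge K_U(f)$

$$\#\{\,p \in \{0,1\}^n : U(p) = f\,\} \;\ge\; 2^{\,n - K_U(f)},$$

i.e. the fraction of length-$n$ programs computing $f$ is at least $2^{-K_U(f)}$, independent of $n$. Each bit of “padding” doubles the number of implementations, which is the sense in which degeneracy in program space tracks shortest description length (prefix-free conventions need a slightly more careful statement).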
Thank you for the references, Dan.
I agree that neural networks probably don’t actually satisfy the padding argument on the nose, and that the exact degeneracy is quite interesting (as I say at the end of the OP).
I do think that, for large enough overparameterization, the padding argument suggests the LLC might come close to the K-complexity in many cases. More interesting to me, though, is that the padding argument doesn’t really require the programming language to be Turing-complete. In those cases the degeneracy will be proportional to complexity/simplicity measures that are specific to the programming language (/architecture class). Inshallah I will get to writing something about that soon.
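To gesture at the kind of architecture-specific picture I have in mind (a heuristic sketch under idealized assumptions, not anyone's theorem): suppose a parameter $w^* \in \mathbb{R}^D$ implements the target function and, near $w^*$, the population loss $L$ depends nondegenerately on only $d_{\mathrm{eff}}$ of the $D$ coordinates, the remaining $D - d_{\mathrm{eff}}$ directions being exactly flat, the continuous counterpart of padding bits. Then, within a small neighbourhood of $w^*$,

$$\operatorname{Vol}\{\,w : L(w) \le L(w^*) + \varepsilon\,\} \;\asymp\; \varepsilon^{\,d_{\mathrm{eff}}/2}, \qquad\text{so}\qquad \lambda(w^*) = \tfrac{d_{\mathrm{eff}}}{2}$$

in this idealized case (and $\lambda(w^*) \le d_{\mathrm{eff}}/2$ more generally, since extra degeneracy in the remaining directions only lowers it). The LLC is then controlled by an “effective parameter count”, a simplicity measure tied to the architecture class, with no Turing-completeness anywhere in sight.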