Re: the SLT dogma.
For those interested, a continuous version of the padding argument is used in Theorem 4.1 of Clift-Murfet-Wallbridge to show that the learning coefficient is a lower bound on the Kolmogorov complexity (in a sense) in the setting of noisy Turing machines: just take the synthesis problem in that theorem to be given by a TM's input-output map. The result is treated in more detail in Waring's thesis (Proposition 4.19). Noisy TMs are of course not neural networks, but they are a place where the link between the learning coefficient in SLT and algorithmic information theory has already been made precise.
For what it’s worth, as explained in "simple versus short", I don’t actually think the local learning coefficient (LLC) is algorithmic complexity (in the sense of program length) in neural networks, only that it is a lower bound. So I don’t really see the LLC as a useful “approximation” of the algorithmic complexity.
For those wanting to read more about the padding argument in the classical setting, Hutter-Catt-Quarel’s “An Introduction to Universal Artificial Intelligence” has a nice, detailed treatment.
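To spell out the classical counting version for anyone skimming (a sketch, under the simplifying convention of a machine $U$ that ignores any bits after the end of a program, and writing $K_U(f)$ for the length of a shortest $U$-program computing $f$): every length-$n$ string that begins with a shortest program for $f$ also computes $f$, so for $n \ge K_U(f)$

$$\#\{\,p \in \{0,1\}^n : U(p) = f\,\} \;\ge\; 2^{\,n - K_U(f)},$$

i.e. the fraction of length-$n$ programs computing $f$ is at least $2^{-K_U(f)}$, independent of $n$. Each bit of “padding” doubles the number of implementations, which is the sense in which degeneracy in program space tracks shortest description length (prefix-free conventions need a slightly more careful statement).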
Thank you for the references, Dan.
I agree that neural networks probably don’t actually satisfy the padding argument on the nose, and that the exact degeneracy is quite interesting (as I say at the end of the OP).
I do think that, for large enough overparameterization, the padding argument suggests the LLC might come close to the K-complexity in many cases. More interesting to me, though, is that the padding argument doesn’t really require the programming language to be Turing-complete. In those cases the degeneracy will be proportional to complexity/simplicity measures that are specific to the programming language (/architecture class). Inshallah I will get to writing something about that soon.
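To gesture at the kind of architecture-specific picture I have in mind (a heuristic sketch under idealized assumptions, not anyone's theorem): suppose a parameter $w^* \in \mathbb{R}^D$ implements the target function and, near $w^*$, the population loss $L$ depends nondegenerately on only $d_{\mathrm{eff}}$ of the $D$ coordinates, the remaining $D - d_{\mathrm{eff}}$ directions being exactly flat, the continuous counterpart of padding bits. Then, within a small neighbourhood of $w^*$,

$$\operatorname{Vol}\{\,w : L(w) \le L(w^*) + \varepsilon\,\} \;\asymp\; \varepsilon^{\,d_{\mathrm{eff}}/2}, \qquad\text{so}\qquad \lambda(w^*) = \tfrac{d_{\mathrm{eff}}}{2}$$

in this idealized case (and $\lambda(w^*) \le d_{\mathrm{eff}}/2$ more generally, since extra degeneracy in the remaining directions only lowers it). The LLC is then controlled by an “effective parameter count”, a simplicity measure tied to the architecture class, with no Turing-completeness anywhere in sight.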