Minor note: I think “mappings with low Kolmogorov complexity occupy larger volumes” is an extremely misleading way to explain what the evidence points to here. (Yes, I know Chris used that language in his blog post, but he really should not have.) The intuition of “simple mappings occupy larger volumes” is an accurate summary, but the relevant complexity measure is very much not Kolmogorov complexity. It is a measure which does not account for every computable way of compressing things, only some specific ways of compressing things.
In a sense, the results are more interesting in this light, since they tell us something about which specific ways of compressing things are relevant to our particular world. Since we cannot compute all the various ways of compressing things (i.e. Kolmogorov complexity and basically everything adjacent to it is uncomputable), this is a useful thing to know.
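To make the "larger volumes" intuition concrete, here is a minimal toy sketch (my own illustration, not the experimental setup from the Mingard et al. papers): sample random parameters for a tiny network on all 7-bit Boolean inputs, record which function each parameter draw implements, and compare each function's sampling frequency (a stand-in for its parameter-space volume) against a crude compressibility proxy, here just the zlib-compressed length of its truth table.

```python
# Toy sketch: do frequently-sampled functions have compressible truth tables?
# (Illustrative only; the proxy and architecture are arbitrary choices of mine.)
import zlib
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_bits = 7                                   # input dimension
# All 2^7 Boolean inputs as rows of a float matrix.
X = np.array([[int(b) for b in format(i, f"0{n_bits}b")]
              for i in range(2 ** n_bits)], dtype=float)

def random_function(width=16):
    """Sample one parameter vector and return the induced truth table as a bit string."""
    W1 = rng.normal(size=(n_bits, width))
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=width)
    h = np.maximum(X @ W1 + b1, 0.0)         # ReLU hidden layer
    out = h @ W2                             # scalar output for each input
    return "".join("1" if o > 0 else "0" for o in out)

def compressed_len(truth_table: str) -> int:
    """Crude complexity proxy: length of the zlib-compressed truth table."""
    return len(zlib.compress(truth_table.encode()))

counts = Counter(random_function() for _ in range(50_000))
for table, freq in counts.most_common(5):
    print(f"freq={freq:6d}  compressed_len={compressed_len(table)}")
# Expected pattern: the most frequently hit functions (largest volume) are
# constant or near-constant, so their truth tables compress well under this proxy.
```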
Thanks for this clarification, John.

But did Mingard et al. show that some specific practical complexity measure explains the size of the volumes occupied in parameter space better than alternative practical complexity measures? If so, then I think we would have uncovered an even more detailed understanding of which mappings occupy large volumes in parameter space, and since neural nets generally work so well in the world, we could say that we know something about what kind of compression is relevant to our particular world. But if they just used some practical complexity measure as a rough proxy for Kolmogorov complexity, then it leaves open the door for the mappings that occupy large volumes in parameter space to be simple according to many more ways of compressing things than the ones used in their particular complexity measure.
Yeah, I don’t think there was anything particularly special about the complexity measure they used, and I wouldn’t be surprised if some other measures did as-well-or-better at predicting which functions fill large chunks of parameter space.
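For concreteness, here is one way such a comparison could go (again just a sketch of mine, reusing `counts` and `compressed_len` from the toy sampler in my comment above; the proxies here are arbitrary choices, not the measures from the papers):

```python
# Compare how well two cheap complexity proxies predict sampling frequency.
# Assumes `counts` and `compressed_len` from the sketch above are in scope.
import numpy as np
from scipy.stats import spearmanr

def num_switches(table: str) -> int:
    """Another cheap proxy: number of 0<->1 transitions along the truth table."""
    return sum(a != b for a, b in zip(table, table[1:]))

tables = list(counts)
log_freq = np.log([counts[t] for t in tables])
for name, proxy in [("zlib length", compressed_len), ("bit switches", num_switches)]:
    rho, _ = spearmanr([proxy(t) for t in tables], log_freq)
    print(f"{name:12s}: Spearman rho vs log-frequency = {rho:.2f}")
# If the simplicity-bias picture is right, both proxies should correlate
# negatively with frequency; how strongly each one does is an empirical question.
```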
I 100% agree that Kolmogorov complexity is not the best measure of complexity here, and I would refer anyone to your and Joar's comments at https://www.lesswrong.com/posts/YSFJosoHYFyXjoYWa/why-neural-networks-generalise-and-why-they-are-kind-of for an excellent discussion of this. I am aware that Kolmogorov complexity is defined wrt a UTM, and I should have clarified in the blog that a lot of steps were needed to make the link between Kolmogorov complexity and these types of input-output maps, and that we only talk about Kolmogorov complexity because of the Levin bound (somewhat repurposed for input-output maps), which interestingly appears to capture the relationship between the probabilities of functions and their complexities quite accurately for several different complexity measures.
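For readers who haven't seen it, the repurposed bound is roughly of the form

$$P(f) \;\le\; 2^{-a\,\tilde{K}(f) + b},$$

where $P(f)$ is the probability that a randomly parameterised map implements the function $f$, $\tilde{K}$ is a computable complexity proxy standing in for Kolmogorov complexity, and $a$ and $b$ are constants that depend on the map but not on $f$. I'm paraphrasing from memory here, so please see the papers for the precise statement and the conditions under which it holds.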