Minor note: I think “mappings with low Kolmogorov complexity occupy larger volumes” is an extremely misleading way to explain what the evidence points to here. (Yes, I know Chris used that language in his blog post, but he really should not have.) The intuition of “simple mappings occupy larger volumes” is an accurate summary, but the relevant complexity measure is very much not Kolmogorov complexity. It is a measure which does not account for every computable way of compressing things, only some specific ways of compressing things.
In a sense, the results are more interesting in this light, since they tell us something about which specific ways of compressing things are relevant to our particular world. Since we cannot compute all the various ways of compressing things (i.e. Kolmogorov complexity and basically everything adjacent to it is uncomputable), this is a useful thing to know.
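To make the "larger volumes" intuition concrete, here is a minimal toy sketch (my own illustration, not the experimental setup from the Mingard et al. papers): sample random parameters for a tiny network on all 7-bit Boolean inputs, record which function each parameter draw implements, and compare each function's sampling frequency (a stand-in for its parameter-space volume) against a crude compressibility proxy, here just the zlib-compressed length of its truth table.

```python
# Toy sketch: do frequently-sampled functions have compressible truth tables?
# (Illustrative only; the proxy and architecture are arbitrary choices of mine.)
import zlib
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_bits = 7                                   # input dimension
# All 2^7 Boolean inputs as rows of a float matrix.
X = np.array([[int(b) for b in format(i, f"0{n_bits}b")]
              for i in range(2 ** n_bits)], dtype=float)

def random_function(width=16):
    """Sample one parameter vector and return the induced truth table as a bit string."""
    W1 = rng.normal(size=(n_bits, width))
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=width)
    h = np.maximum(X @ W1 + b1, 0.0)         # ReLU hidden layer
    out = h @ W2                             # scalar output for each input
    return "".join("1" if o > 0 else "0" for o in out)

def compressed_len(truth_table: str) -> int:
    """Crude complexity proxy: length of the zlib-compressed truth table."""
    return len(zlib.compress(truth_table.encode()))

counts = Counter(random_function() for _ in range(50_000))
for table, freq in counts.most_common(5):
    print(f"freq={freq:6d}  compressed_len={compressed_len(table)}")
# Expected pattern: the most frequently hit functions (largest volume) are
# constant or near-constant, so their truth tables compress well under this proxy.
```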
Thanks for this clarification, John.

But did Mingard et al. show that some specific practical complexity measure explains the size of the volumes occupied in parameter space better than alternative practical complexity measures? If so, then I think we would have uncovered an even more detailed understanding of which mappings occupy large volumes in parameter space, and since neural nets generally work so well in the world, we could say that we know something about what kind of compression is relevant to our particular world. But if they just used some practical complexity measure as a rough proxy for Kolmogorov complexity, then it leaves open the door for the mappings that occupy large volumes in parameter space to be simple according to many more ways of compressing things than the ones used in their particular complexity measure.
Yeah, I don’t think there was anything particularly special about the complexity measure they used, and I wouldn’t be surprised if some other measures did as-well-or-better at predicting which functions fill large chunks of parameter space.
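For concreteness, here is one way such a comparison could go (again just a sketch of mine, reusing `counts` and `compressed_len` from the toy sampler in my comment above; the proxies here are arbitrary choices, not the measures from the papers):

```python
# Compare how well two cheap complexity proxies predict sampling frequency.
# Assumes `counts` and `compressed_len` from the sketch above are in scope.
import numpy as np
from scipy.stats import spearmanr

def num_switches(table: str) -> int:
    """Another cheap proxy: number of 0<->1 transitions along the truth table."""
    return sum(a != b for a, b in zip(table, table[1:]))

tables = list(counts)
log_freq = np.log([counts[t] for t in tables])
for name, proxy in [("zlib length", compressed_len), ("bit switches", num_switches)]:
    rho, _ = spearmanr([proxy(t) for t in tables], log_freq)
    print(f"{name:12s}: Spearman rho vs log-frequency = {rho:.2f}")
# If the simplicity-bias picture is right, both proxies should correlate
# negatively with frequency; how strongly each one does is an empirical question.
```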
I 100% agree that Kolmogorov complexity is not the best measure of complexity here, and I would refer anyone to your and Joar's comments at https://www.lesswrong.com/posts/YSFJosoHYFyXjoYWa/why-neural-networks-generalise-and-why-they-are-kind-of for an excellent discussion of this. I am aware that Kolmogorov complexity is defined wrt a UTM, and I should have clarified in the blog that a lot of steps were needed to make the link between Kolmogorov complexity and these types of input-output maps, and that we only talk about Kolmogorov complexity because of the Levin bound (somewhat repurposed for input-output maps), which interestingly appears to capture the relationship between the probabilities of functions and their complexities quite accurately for several different complexity measures.
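For readers who haven't seen it, the repurposed bound is roughly of the form

$$P(f) \;\le\; 2^{-a\,\tilde{K}(f) + b},$$

where $P(f)$ is the probability that a randomly parameterised map implements the function $f$, $\tilde{K}$ is a computable complexity proxy standing in for Kolmogorov complexity, and $a$ and $b$ are constants that depend on the map but not on $f$. I'm paraphrasing from memory here, so please see the papers for the precise statement and the conditions under which it holds.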