Kolmogorov complexity is definitely a misleading path here, and it’s unfortunate that Joar chose it as the “leading” example of complexity in the post. Note this passage:
However, they do not give a detailed answer to the question of precisely which complexity measure they minimise—they merely show that this result holds for many different complexity measures. For example, I would expect that fully connected neural networks are biased towards functions with low Boolean circuit complexity, or something very close to that. Verifying this claim, and deriving similar results about other kinds of network architectures, would make it easier to reason about what kinds of functions we should expect a neural network to be likely or unlikely to learn. This would also make it easier to reason about out-of-distribution generalisation, etc.
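That claim seems testable in miniature (a minimal sketch below, assuming a 1-hidden-layer tanh MLP and compressed truth-table length as the complexity proxy; none of this comes from the post itself): sample fully connected networks with random weights, read off the Boolean function each one computes, and check whether those truth tables compress better than those of uniformly random functions.

```python
# Minimal sketch, not the authors' experiment: sample random-weight MLPs on n
# Boolean inputs and use compressed truth-table length as a crude, computable
# proxy for descriptive complexity. If fully connected networks are biased
# towards simple functions, their truth tables should compress far better than
# those of uniformly random Boolean functions.
import itertools
import zlib
import numpy as np

rng = np.random.default_rng(0)
n = 10                                             # truth table has 2**n bits
X = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))  # all 2**n inputs

def random_mlp_truth_table(width=64):
    """Boolean function computed by a 1-hidden-layer tanh MLP with i.i.d. Gaussian weights."""
    W1 = rng.normal(size=(n, width)) / np.sqrt(n)
    b1 = rng.normal(size=width)
    w2 = rng.normal(size=width) / np.sqrt(width)
    return ((np.tanh(X @ W1 + b1) @ w2) > 0).astype(np.uint8)

def proxy_complexity(bits):
    """Compressed length in bytes: an upper-bound-style stand-in for K(f)."""
    return len(zlib.compress(np.packbits(bits).tobytes(), 9))

mlp = [proxy_complexity(random_mlp_truth_table()) for _ in range(200)]
unif = [proxy_complexity(rng.integers(0, 2, size=2**n, dtype=np.uint8))
        for _ in range(200)]
print(f"random MLPs:      mean compressed size = {np.mean(mlp):.1f} bytes")
print(f"random functions: mean compressed size = {np.mean(unif):.1f} bytes")
```

Compressed length is of course not Boolean circuit complexity, but the two tend to order functions similarly, which is all a sanity check like this needs.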
This quote from the above comment is better:
If we want to explain generalisation in neural networks, then we must explain if and how their inductive bias aligns with our (human) priors. Moreover, our human priors are (in most contexts) largely captured by computational complexity. Therefore, we must somewhere, in some way, connect neural networks to computational complexity.
I’ve expressed this idea with some links here.

Then if we combine two claims:

Joar’s “DNNs are (kind of) Bayesian” (for reasons I don’t claim to understand, since I haven’t read their papers; I just take his word here), and

Fields et al.’s “brains are ‘almost’ Bayesian, because Bayesian learning is information-efficient (and hence energy-efficient), and there is strong evolutionary pressure for animal brains to be energy-efficient”,

is this an explanation of DNNs’ remarkable generalisation ability? Or should more quantification be added to both of these claims to turn this into a good explanation?
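To sketch what “more quantification” might look like (this form is my own; neither source states it this way, and the constants a, b and the proxy K̃ are placeholders):

```latex
% Sketch only: how the two claims might combine, with placeholder constants.
% \tilde{K}(f) is any computable complexity proxy
% (e.g. compressed description length of f).
\begin{align*}
  \text{Claim 1, quantified:}\quad & P(f) \approx 2^{-a\,\tilde{K}(f) - b},
      \qquad a, b > 0, \\
  \text{Bayesian update:}\quad & P(f \mid D) =
      \frac{P(D \mid f)\,P(f)}{\sum_{f'} P(D \mid f')\,P(f')}.
\end{align*}
```

Under a prior of this shape, the posterior concentrates on the simplest functions consistent with the data; making the combined claim precise would amount to estimating a and b for real networks and checking that K̃ tracks the complexity of the target functions we actually care about.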
Thank you for this—I agree with what you are saying here. In the post, I went with a somewhat loose equivocation between “good priors” and “a prior towards low Kolmogorov complexity”, but this does skim past a lot of nuance. I also very much do not want to say that the DNN prior is exactly a prior towards low Kolmogorov complexity (this would be uncomputable), but only that it is mostly correlated with Kolmogorov complexity for typical problems.
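To make the “correlated, not exact” point concrete (a sketch with arbitrarily chosen compressors and test strings, not anything from our papers): K(x) itself is uncomputable, but every compressor gives a computable upper bound on it, and different such bounds tend to rank objects in nearly the same order.

```python
# Sketch of the "correlated proxies" point (my own choice of compressors and
# test data): K(x) is uncomputable, but any compressor's output length is a
# computable upper bound on it, and different such bounds largely agree on
# which objects are simple and which are complex.
import zlib
import lzma
import numpy as np

rng = np.random.default_rng(0)

# Strings with varying amounts of structure: a random core of length 2**k,
# tiled out to 512 bytes. Small k => highly regular; large k => near-random.
strings = []
for k in range(1, 9):
    core = rng.bytes(2 ** k)
    strings.append((core * (512 // len(core)))[:512])

zlib_len = np.array([len(zlib.compress(s, 9)) for s in strings])
lzma_len = np.array([len(lzma.compress(s)) for s in strings])

def ranks(a):
    return np.argsort(np.argsort(a))

# Spearman (rank) correlation between the two computable upper bounds on K.
r = np.corrcoef(ranks(zlib_len), ranks(lzma_len))[0, 1]
print("rank correlation between zlib and lzma proxies:", round(float(r), 3))
```

This is also what makes a statement like “mostly correlated with Kolmogorov complexity” testable at all: one never measures K directly, only a family of proxies that agree with one another.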