This is basically a justification of something like your point 1, but AFAICT it’s closer to a proof in the SLT setting than in your setting.
I think it could probably be turned into a proof in either setting, at least if we are allowed to help ourselves to assumptions like “the ground truth function is generated by a small neural net” and “learning is done in a Bayesian way”, etc.
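Here is a toy sketch of that kind of setting. Every concrete choice is a hypothetical assumption of mine, not something from the thread. The "networks" are single threshold units on 4 input bits, with weights in {-1, 0, 1} and an integer bias in [-3, 3]. "Size" is the number of nonzero parameters, the prior is proportional to 2^(-size), and the likelihood assumes a 5% label-noise rate. The sketch checks the standard Bayesian-mixture bound: the learner's cumulative log loss exceeds that of the true function by at most -ln P(f*). So if the ground truth is a small net, and therefore gets large prior weight, Bayesian learning pays only a bounded total loss. It is an illustration under these assumptions, not the proof being discussed.

```python
import itertools
import math

N_BITS = 4
EPS = 0.05  # assumed label-noise rate used in the likelihood (hypothetical)


def all_units():
    """Enumerate tiny threshold units: weights in {-1, 0, 1}, bias in -3..3."""
    for w in itertools.product((-1, 0, 1), repeat=N_BITS):
        for b in range(-3, 4):
            yield (w, b)


def size(unit):
    """Hypothetical size measure: nonzero weights, plus one if the bias is nonzero."""
    w, b = unit
    return sum(1 for wi in w if wi != 0) + (1 if b != 0 else 0)


def predict(unit, x):
    w, b = unit
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def likelihood(unit, x, y):
    return 1 - EPS if predict(unit, x) == y else EPS


def simplicity_prior(units):
    """Prior proportional to 2^(-size): smaller units get more weight."""
    weights = [2.0 ** -size(u) for u in units]
    z = sum(weights)
    return [wt / z for wt in weights]


def run(truth, xs):
    """Bayesian learning on labels generated by `truth`; returns the losses and P(truth)."""
    units = list(all_units())
    post = simplicity_prior(units)
    prior_truth = post[units.index(truth)]
    mix_loss = 0.0
    truth_loss = 0.0
    for x in xs:
        y = predict(truth, x)
        # Predictive probability of the observed label under the Bayesian mixture.
        p_y = sum(p * likelihood(u, x, y) for p, u in zip(post, units))
        mix_loss -= math.log(p_y)
        truth_loss -= math.log(likelihood(truth, x, y))
        # Bayes update: reweight each unit by how well it predicted y.
        post = [p * likelihood(u, x, y) / p_y for p, u in zip(post, units)]
    return mix_loss, truth_loss, prior_truth


TRUTH = ((1, 1, 0, 0), -1)  # computes x0 AND x1; a "small net" ground truth
XS = list(itertools.product((0, 1), repeat=N_BITS)) * 3
mix_loss, truth_loss, prior_truth = run(TRUTH, XS)
print(f"regret = {mix_loss - truth_loss:.3f} <= -ln P(f*) = {-math.log(prior_truth):.3f}")
```

The bound holds for any label sequence. The mixture's total probability of the data is at least P(f*) times the probability that f* assigns to it, so the regret does not grow with the number of examples.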
Does this not essentially amount to just assuming that the inductive bias of neural networks in fact matches the prior that we (as humans) have about the world?
No? It amounts to assuming that smaller neural networks are a better match for the actual data-generating process of the world.
The assumption that small neural networks are a good match for the actual data-generating process of the world is equivalent to the assumption that neural networks have an inductive bias that gives large weight to that process, provided we also add the claim that neural networks have an inductive bias that gives large weight to functions which can be described by small neural networks (and this latter claim is not too difficult to justify, I think).
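One direction of this equivalence can be written out in a line if the "latter claim" is given a quantitative form. The form below is a hypothetical choice of mine. Let $P$ be the networks' prior over functions, $s(f)$ the size of the smallest network computing $f$, $f^\ast$ the true data-generating function, and $c$ some constant.

```latex
\begin{align*}
&\text{(bias toward small nets, assumed)} && P(f) \ge 2^{-c\,s(f)} \quad \text{for all } f,\\
&\text{(small ground truth, assumed)}     && s(f^\ast) \le k,\\
&\Longrightarrow                          && P(f^\ast) \ge 2^{-c\,s(f^\ast)} \ge 2^{-ck}.
\end{align*}
```

So under these assumptions, the prior places at least $2^{-ck}$ weight on the true process, which is large when $k$ is small.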