> I’m very happy with running counting arguments over the actual neural network parameter space; the problem there is just that I don’t think we understand it well enough to do so effectively.
This is basically my position as well
The cited argument is a counting argument over the space of functions which achieve zero/low training loss.
> You could instead try to put a measure directly over the functions in your setup, but the problem there is that function space really isn’t the right space to run a counting argument like this; you need to be in algorithm space, otherwise you’ll do things like what happens in this post where you end up predicting overfitting rather than generalization (which implies that you’re using a prior that’s not suitable for running counting arguments on).
Indeed, this is a crucial point that I think the post is trying to make. The cited counting arguments are counting functions instead of parameterizations. That’s the mistake (or, at least “a” mistake). I’m glad we agree it’s a mistake, but then I’m confused why you think that part of the post is unsound.
(Rereads)
Rereading the portion in question now, it seems that they changed it a lot since the draft. Personally, I think their argumentation is now weaker than it was before. The original argumentation clearly explained the mistake of counting functions instead of parameterizations, while the present post does not. It instead abstracts it as “an indifference principle”, where the reader has to do the work to realize that indifference over functions is inappropriate.
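To spell out why indifference over functions predicts overfitting, here is a toy illustration (my own construction, not from the post): enumerate every Boolean function on three inputs as a truth table, keep the ones that achieve zero loss on a four-point training set, and ask how a uniform ("indifference") measure over those functions behaves off-train.

```python
from itertools import product

# Toy setup (details mine): 3 binary inputs -> 8 input points -> 2^8 = 256
# possible Boolean functions, each represented as an 8-bit truth table.
n_points = 8
train_points = [0, 1, 2, 3]        # indices the training set pins down
test_points = [4, 5, 6, 7]
target = [0, 1, 1, 0, 1, 0, 0, 1]  # the "true" function we hope to recover

all_functions = list(product([0, 1], repeat=n_points))

# Zero training loss = agrees with the target on every training point.
zero_loss = [f for f in all_functions
             if all(f[i] == target[i] for i in train_points)]

# Under a uniform measure over zero-loss functions, every completion of the
# four free test bits is equally likely, so off-train behaviour is a coin flip.
test_acc = sum(sum(f[i] == target[i] for i in test_points)
               for f in zero_loss) / (len(zero_loss) * len(test_points))

print(len(zero_loss))  # 16 functions fit the training data exactly
print(test_acc)        # 0.5 -> chance-level generalization predicted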
I’m sorry to hear that you think the argumentation is weaker now.
> the reader has to do the work to realize that indifference over functions is inappropriate
I don’t think that indifference over functions in particular is inappropriate. I think indifference reasoning in general is inappropriate.
> I’m very happy with running counting arguments over the actual neural network parameter space
I wouldn’t call the correct version of this a counting argument. The correct version uses the actual distribution used to initialize the parameters as a measure, and not e.g. the Lebesgue measure. This isn’t appealing to the indifference principle at all, and so in my book it’s not a counting argument. But this could just be a terminological difference.