The perspective and the computations that are presented here (which in my opinion are representative of the mathematical parts of the linked posts and of various other unnamed posts) do not use any significant facts about neural networks or their architecture.
You’re correct that the written portion of the Information Loss --> Basin flatness post doesn’t use any non-trivial facts about NNs. The purpose of the written portion was to lay out some mathematical groundwork, which is then used for the non-trivial claim. (I did not know at the time that there was a standard name for this, the “Submersion theorem”. I had also made some formal mistakes, which I am glad you pointed out in your comments; the essence was mostly valid, though.) The non-trivial claim occurs in the video section of the post, where a sort of degeneracy occurring in ReLU MLPs is examined. I no longer believe that the precise form of my claim is relevant to practical networks. An approximate form (where low rank is replaced with something more like low determinant) seems salvageable, though still of dubious value, since I think I have better framings now.
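For concreteness, here is a minimal sketch (in JAX, with toy layer sizes; none of these names come from the post) of the object these claims are about: the Jacobian of the map from the parameter vector to the network's stacked outputs on a fixed dataset. The submersion-theorem step needs this Jacobian to have full rank at a zero-loss point; the degeneracy in the video is about it dropping rank, and the "approximate" version corresponds to many near-zero singular values rather than exact rank deficiency.

```python
# Illustrative sketch only (not code from the post); layer sizes are made up.
# It computes the Jacobian of the parameter -> behavior map for a tiny ReLU MLP
# and inspects its singular values.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
k, d_in, d_hidden, D = 8, 4, 16, 2              # k data points, D-dim outputs
X = jax.random.normal(key, (k, d_in))           # fixed dataset

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "W1": 0.5 * jax.random.normal(k1, (d_in, d_hidden)),
        "b1": jnp.zeros(d_hidden),
        "W2": 0.5 * jax.random.normal(k2, (d_hidden, D)),
        "b2": jnp.zeros(D),
    }

def behavior(params):
    """Stacked outputs of the ReLU MLP on the whole dataset (length k*D)."""
    h = jax.nn.relu(X @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).reshape(-1)

params = init_params(key)
flat, unravel = ravel_pytree(params)            # N-dimensional parameter vector
N = flat.shape[0]

# Jacobian of the behavior map, shape (k*D, N).
J = jax.jacobian(lambda p: behavior(unravel(p)))(flat)

# Exact rank deficiency is the degeneracy discussed in the video; the
# "approximate" version corresponds to many small-but-nonzero singular values.
svals = jnp.linalg.svd(J, compute_uv=False)
print(f"N = {N}, k*D = {k * D}, numerical rank = {int(jnp.sum(svals > 1e-6 * svals[0]))}")
```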
Secondly, the use of the submersion theorem here only makes sense when N>kD.
Agreed. I was addressing the overparameterized case, not the underparameterized one. In hindsight, I should have mentioned this at the very beginning of the post—my bad.
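Spelling out the dimension count behind this (a sketch, assuming N, k, and D here mean parameter count, number of data points, and output dimension respectively):

```latex
% Sketch of the dimension count (assumed notation: N = number of parameters,
% kD = total number of output coordinates across the k data points).
%
% The behavior map sends a parameter vector to the stacked outputs:
f \colon \mathbb{R}^{N} \longrightarrow \mathbb{R}^{kD}
% If df_\theta has full rank kD at a zero-loss point \theta, the submersion
% theorem makes the zero-loss set locally a manifold of dimension
\dim f^{-1}\bigl(f(\theta)\bigr) = N - kD
% which is only possible when N \ge kD, i.e. in the overparameterized case.
```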
(Sorry for the very late response)

All in all, I don’t think my original post held up well. I guess I was excited to pump out the concept quickly, before the dust settled. Maybe this was a mistake? Usually I make the ~opposite error of never getting around to posting things.
I think there should be space on the forum both for in-progress research dumps and for more worked-out final research reports. Maybe it would make sense to have separate categories for them, or something along those lines.