Off the top of your head, do you know anything about, or have any hypotheses about, how double descent interacts with the Gaussian process interpretation of deep nets? It seems like the sort of theory that could potentially quantify the inductive bias of SGD.
The neural tangent kernel guys have a paper where they give a heuristic argument explaining the double descent curve (in the number of parameters) using the NTK.