This paper finds that the probability that SGD finds a given function is correlated with that function’s posterior probability under a Gaussian process conditioned on the same data. But if you use that same Gaussian process to make predictions, it does not work as well as the NN. So you can’t explain why the NN works well by appealing to its similarity to this particular Bayesian posterior.
Yup this changes my mind about the relevance of this paper.