Have you seen this paper? They find that “SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large.”