This is certainly true for all convex loss landscapes in all dimensions, almost by definition of “convex”: restrict a convex function to the segment from any starting point to a minimizer and you get a one-dimensional convex function whose minimum sits at the far endpoint, so it has to be non-increasing along the way. I don’t think anyone understands very much about the properties of non-convex high-dimensional loss landscapes, but I can say that in the deep learning case, where we do see monotonically decreasing loss along the linear path between the initialization and the end-of-training weights, the weights we obtain when we arbitrarily decide to stop gradient descent aren’t anywhere close to a local minimum of the landscape. Basically all networks of any useful size get stuck at saddle points at best, or we stop training before they even get the chance to be stuck. So it might be the case that a linear path to the actual minimum would not have monotonic loss at all; it’s just that high-dimensional spaces are so mind-bogglingly large that we never explore far enough from the init point to get behind a mountain in the landscape, so to speak.
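If you want to see that interpolation curve for yourself, here is a minimal sketch of how I’d probe the loss along the straight line from the initial weights to the final ones, assuming PyTorch and assuming you saved a state_dict at initialization; `model`, `loss_fn`, `inputs`, and `targets` are placeholder names, not anything standard:

```python
import copy
import torch

@torch.no_grad()
def loss_along_linear_path(model, init_state, final_state, loss_fn,
                           inputs, targets, steps=21):
    # Evaluate the loss at evenly spaced points on the segment
    # theta(t) = (1 - t) * theta_init + t * theta_final.
    # Assumes float parameters/buffers; things like BatchNorm counters get
    # interpolated crudely, which is fine for a quick look.
    probe = copy.deepcopy(model)  # keep the trained model untouched
    probe.eval()
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        interpolated = {
            name: (1 - t) * init_state[name] + t * final_state[name]
            for name in init_state
        }
        probe.load_state_dict(interpolated)
        losses.append(loss_fn(probe(inputs), targets).item())
    return losses
```

Plot the returned losses against t and check whether the curve ever goes up; the observation above is that, on the init-to-end-of-training segment, it usually doesn’t.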