Yep, I am assuming MSE loss generally, but as you point out, any smooth and convex loss function will be locally approximately quadratic. “Saddle points all the way down” isn’t possible if a global min exists, since a saddle point implies the existence of an adjacent lower point. As for asymptotes, this is indeed possible, especially in classification tasks. I have basically ignored this and stuck to regression here.
I might return to the issue of classification / solutions at infinity in a later post, but for now I will say this: It doesn’t seem that much different, especially when it comes to manifold dimension; an m-dimensional manifold in parameter space generally extends to infinity, and it corresponds to an m-1 dimensional manifold in angle space (you can think of it as a hypersphere of asymptote directions).
I would say the main things neglected in this post are:
Manifold count (Most important neglected thing)
Basin width in non-infinite directions
Distance from the origin
These apply to both regression and classification.
Yep, I am assuming MSE loss generally, but as you point out, any smooth and convex loss function will be locally approximately quadratic. “Saddle points all the way down” isn’t possible if a global min exists, since a saddle point implies the existence of an adjacent lower point. As for asymptotes, this is indeed possible, especially in classification tasks. I have basically ignored this and stuck to regression here.
I might return to the issue of classification / solutions at infinity in a later post, but for now I will say this: It doesn’t seem that much different, especially when it comes to manifold dimension; an m-dimensional manifold in parameter space generally extends to infinity, and it corresponds to an m-1 dimensional manifold in angle space (you can think of it as a hypersphere of asymptote directions).
I would say the main things neglected in this post are:
Manifold count (Most important neglected thing)
Basin width in non-infinite directions
Distance from the origin
These apply to both regression and classification.