Vivek Hebbar comments on Information Loss --> Basin flatness

Vivek Hebbar 22 May 2022 0:07 UTC
LW: 1 AF: 1
Yep, I am assuming MSE loss generally, but as you point out, any smooth and convex loss function will be approximately quadratic near a minimum. “Saddle points all the way down” isn’t possible if a global min exists, since a saddle point implies the existence of an adjacent lower point. As for asymptotes (solutions at infinity), these are indeed possible, especially in classification tasks. I have basically ignored them and stuck to regression here.
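To make the saddle-point claim concrete, here is a minimal sketch in Python (NumPy assumed). The toy `loss` function and its hand-written Hessian are hypothetical, chosen only for illustration and not taken from the post. The Hessian at the saddle has a negative eigenvalue, and a small step along the matching eigenvector reaches a strictly lower point.

```python
import numpy as np

def loss(w):
    # Hypothetical toy loss: saddle at the origin, global minima at w[1] = +/- 1/sqrt(2).
    return w[0] ** 2 - w[1] ** 2 + w[1] ** 4

def hessian_at_origin():
    # Second derivatives of the toy loss evaluated at w = (0, 0), worked out by hand.
    return np.array([[2.0, 0.0],
                     [0.0, -2.0]])

# eigh returns eigenvalues in ascending order, so index 0 is the most negative one.
eigvals, eigvecs = np.linalg.eigh(hessian_at_origin())
descent_dir = eigvecs[:, 0]

saddle = np.zeros(2)
nearby = saddle + 1e-2 * descent_dir  # small step along the negative-curvature direction

# Positive drop means the nearby point is lower than the saddle.
loss_drop = loss(saddle) - loss(nearby)
print(eigvals, loss_drop)
```

The sign of the eigenvector does not matter here, since the toy loss is symmetric in `w[1]`.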
I might return to the issue of classification / solutions at infinity in a later post, but for now I will say this: it doesn’t seem that much different, especially when it comes to manifold dimension. An m-dimensional manifold in parameter space generally extends to infinity, and it corresponds to an (m-1)-dimensional manifold in angle space (you can think of angle space as a hypersphere of asymptote directions).
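As a rough illustration of a "solution at infinity", the sketch below uses logistic loss on a tiny, linearly separable dataset. The data, the `logistic_loss` helper, and the chosen `direction` are all made up for this example. Scaling the weights along a separating direction keeps lowering the loss without ever reaching zero, so the optimum is better described by a direction (a point in angle space) than by a finite parameter vector.

```python
import numpy as np

# Hypothetical linearly separable data with labels in {+1, -1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_loss(w):
    # Mean of log(1 + exp(-margin)), computed stably with logaddexp.
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

# A unit vector that separates the two classes (an "asymptote direction").
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)

# Moving further out along the same direction only decreases the loss.
scales = [1.0, 10.0, 100.0]
losses = [logistic_loss(s * direction) for s in scales]
print(losses)
```

With two parameters, the set of unit directions is a circle, which is the one-dimensional "angle space" in this toy case.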
I would say the main things neglected in this post are:
1. Manifold count (Most important neglected thing)
2. Basin width in non-infinite directions
3. Distance from the origin
These apply to both regression and classification.