Interesting stuff! I’m still getting my head around it, but I think implicit in a lot of this is that loss is some quadratic function of ‘behaviour’ - is that right? If so, it could be worth spelling that out. Though maybe in a small neighbourhood of a local minimum this is approximately true anyway?
This also brings to mind the question of what happens when we’re in a region with no local minimum (e.g. saddle points all the way down, or asymptoting to a lower loss, etc.)
Yep, I am assuming MSE loss throughout, but as you point out, any smooth and convex loss function is approximately quadratic in a small neighbourhood of a local minimum. “Saddle points all the way down” isn’t possible if a global minimum exists, since a saddle point always has nearby lower points, so descent can continue past it. Asymptoting to a lower loss is indeed possible, especially in classification tasks; I have basically ignored that case and stuck to regression here.
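As a quick sanity check of the “locally quadratic” point, here is a minimal numerical sketch (a toy tanh regression of my own, not anything from the post): near the minimum, the MSE loss closely matches its second-order Taylor term.

```python
# Minimal sketch (toy example, not from the post): MSE loss of a small
# nonlinear model is approximately quadratic near a local minimum.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
w_true = 0.7
y = np.tanh(w_true * x)              # noiseless targets, so w_true is a global minimum

def loss(w):
    return np.mean((np.tanh(w * x) - y) ** 2)

w_star = w_true                      # exact minimum for noiseless data
h = 1e-4                             # finite-difference step for the Hessian
hess = (loss(w_star + h) - 2 * loss(w_star) + loss(w_star - h)) / h**2

for eps in [0.1, 0.03, 0.01]:
    actual = loss(w_star + eps)
    quadratic = 0.5 * hess * eps**2  # second-order Taylor term (gradient is ~0 at the minimum)
    print(f"eps={eps:5.2f}  loss={actual:.6f}  quadratic approx={quadratic:.6f}")
```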
I might return to the issue of classification / solutions at infinity in a later post, but for now I will say this: it doesn’t seem that much different, especially when it comes to manifold dimension. An m-dimensional solution manifold in parameter space generally extends to infinity, and its behaviour at infinity corresponds to an (m-1)-dimensional manifold in angle space (you can think of it as a hypersphere of asymptote directions).
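For a concrete toy picture of the angle-space claim, here is a minimal sketch (my own 2-parameter example f(x) = a*b*x, not from the post): the 1-dimensional manifold of minima {a*b = 2} extends to infinity in parameter space, and its infinite ends collapse to a 0-dimensional set of directions on the unit circle, i.e. m = 1 in parameter space and m-1 = 0 in angle space.

```python
# Minimal sketch (toy example): a 1-d manifold of global minima extending to
# infinity, whose asymptote directions form a 0-dimensional set on the unit circle.
import numpy as np

x = np.linspace(-1, 1, 50)
y = 2.0 * x                          # targets realizable by f(x) = a*b*x whenever a*b = 2

def loss(a, b):
    return np.mean((a * b * x - y) ** 2)

# Walk out along the zero-loss curve {(a, b): a*b = 2} and look at the
# normalized parameter direction (a point in "angle space").
for a in [2.0, 10.0, 100.0, 1000.0]:
    b = 2.0 / a                      # stays (numerically) on the minimum manifold
    theta = np.array([a, b])
    direction = theta / np.linalg.norm(theta)
    print(f"a={a:7.1f}  loss={loss(a, b):.2e}  direction={direction}")
# The directions converge to (1, 0) along this branch (and to (0, 1) along the
# other): the infinite ends of the 1-d manifold become isolated points on the circle.
```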
I would say the main things neglected in this post are:
Manifold count (Most important neglected thing)
Basin width in non-infinite directions
Distance from the origin
These apply to both regression and classification.