Can you give an example of L which has the kind of singularity you’re talking about? I don’t think I’m quite following you here.
In SLT, L is assumed analytic, so I don’t understand how the Hessian can fail to be well-defined anywhere. It’s possible that the Hessian vanishes at some point, suggesting that the singularity there is even worse than quadratic, e.g. $L(x,y) = x^2 y^2$ at the origin or something like that. But even in this regime essentially the same logic is going to apply: the worse the singularity, the further away you can move from it without changing the value of $L$ very much, and accordingly the singularity contributes more to the volume of the set $L(x) < L_{\min} + \varepsilon$ as $\varepsilon \to 0$.
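To make the volume claim concrete, here is a rough numerical sketch comparing a nondegenerate quadratic minimum, x^2 + y^2, with the degenerate example x^2 y^2. The domain [-1, 1]^2, the sample count, and all function names are my own assumptions for illustration; nothing here comes from the SLT literature beyond the two loss functions themselves. It estimates the area of the set where L < ε (both losses have minimum 0) by Monte Carlo and checks it against closed forms that are easy to derive by hand on this square.

```python
import numpy as np


def sublevel_volume_mc(loss, eps, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the area of {(x, y) in [-1, 1]^2 : loss(x, y) < eps}."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    inside = loss(pts[:, 0], pts[:, 1]) < eps
    return 4.0 * inside.mean()  # 4 is the area of the square [-1, 1]^2


def quadratic(x, y):
    # Nondegenerate minimum at the origin: the Hessian there is 2 * identity.
    return x**2 + y**2


def degenerate(x, y):
    # Minimum set is the union of both axes; the Hessian vanishes at the origin.
    return x**2 * y**2


def exact_quadratic(eps):
    # Disc of radius sqrt(eps); valid for 0 < eps <= 1.
    return np.pi * eps


def exact_degenerate(eps):
    # Area of {|x * y| < d} inside the square, with d = sqrt(eps), 0 < eps <= 1:
    # per quadrant, d + integral_d^1 (d / x) dx = d * (1 - log d).
    d = np.sqrt(eps)
    return 4.0 * d * (1.0 - np.log(d))


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        vq = sublevel_volume_mc(quadratic, eps)
        vd = sublevel_volume_mc(degenerate, eps)
        print(f"eps={eps:g}  quadratic: mc={vq:.4f} exact={exact_quadratic(eps):.4f}  "
              f"degenerate: mc={vd:.4f} exact={exact_degenerate(eps):.4f}")
```

On this square the quadratic case shrinks like ε, while the degenerate case shrinks only like √ε · log(1/ε), so the ratio between the two grows without bound as ε → 0. If I have the SLT conventions right, that exponent and log power are what you would expect from an RLCT of 1/2 with multiplicity 2, versus an RLCT of 1 for the quadratic.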
In SLT L is assumed analytic, so I don’t understand how the Hessian can fail to be well-defined
Yeah sorry, that was probably needlessly confusing. I was just referencing the image in Jesse’s tweet for ease of illustration (you’re right that it’s not analytic; I’m not sure what’s going on there). The Hessian could also just be 0 at a self-intersection point, like in the example you gave. That’s the sort of case I had in mind. I was confused by your earlier comment because it sounded like you were just describing a valley of dimension r, but as you say, there could be isolated points like that also.
I still maintain that this behavior—of volume clustering near singularities when considering a narrow band about the loss minimum—is the main distinguishing feature of SLT and so could use a mention in the OP.