For the more substantive results in section 4, I do believe the direction is always flat --> sharp.
I agree with this (with ‘sharp’ replaced by ‘generalise’, as I think you intend). It seems to me potentially interesting to ask whether this is necessarily the case.
When I said “the direction is always flat --> sharp”, I meant that their theorems showed you could produce a sharp minimum given a flat one, but not the other way around. Sorry if I was unclear.
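As a toy illustration of that flat-to-sharp direction, here is a minimal sketch. It is not the construction from the paper's theorems; I'm assuming they rely on some function-preserving reparametrization of this general kind. The two-parameter "network" `a * b`, the loss, and all function names are hypothetical, chosen only for illustration. Rescaling `(a, b)` to `(alpha * a, b / alpha)` leaves the product, and hence the function and the loss, unchanged, while the top Hessian eigenvalue at the minimum grows.

```python
def loss(a, b):
    # Toy model: the "network" output is a * b, fit to the target 1.
    return (a * b - 1.0) ** 2


def hessian(a, b):
    # Analytic Hessian of loss(a, b) with respect to (a, b).
    off_diag = 4.0 * a * b - 2.0
    return [[2.0 * b * b, off_diag], [off_diag, 2.0 * a * a]]


def top_eigenvalue(h):
    # Largest eigenvalue of a symmetric 2x2 matrix, used here as a sharpness proxy.
    trace = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    disc = max(trace * trace / 4.0 - det, 0.0) ** 0.5
    return trace / 2.0 + disc


def rescale(a, b, alpha):
    # Function-preserving reparametrization: the product a * b is unchanged.
    return a * alpha, b / alpha


a, b = 1.0, 1.0                    # a comparatively flat minimum (a * b == 1)
a2, b2 = rescale(a, b, 10.0)       # same function, different parameters

flat_sharpness = top_eigenvalue(hessian(a, b))
sharp_sharpness = top_eigenvalue(hessian(a2, b2))
print(loss(a, b), loss(a2, b2))    # both points are minima of the same loss
print(flat_sharpness, sharp_sharpness)
```

In this toy the rescaling can of course be run backwards (`alpha < 1` starting from the sharp point), so it only shows that sharpness is not invariant under reparametrization; it says nothing on its own about which direction the theorems cover.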
Definitely agreed that “under which conditions does flatness imply generalization” is a very interesting question. I think this paper has a reasonably satisfying analysis, although I also have some reservations about the “SGD as Bayesian sampler” picture.