I don’t have any empirical evidence, but we can think about what a flat minimum with high noise would mean. It would probably mean the system is able to predict some data points very well, and other data points very poorly, and both of these are robust: we can make large changes to the parameters while still predicting the predictable data points about-as-well, and the unpredictable data points about-as-poorly. In human terms, it would be like having a paradigm in which certain phenomena are very predictable, and other phenomena look like totally-random noise without any hint that they even could be predictable.
Not sure what it would look like in the perfect-training-prediction regime, though.
I don’t have any empirical evidence, but we can think about what a flat minimum with high noise would mean. It would probably mean the system is able to predict some data points very well, and other data points very poorly, and both of these are robust: we can make large changes to the parameters while still predicting the predictable data points about-as-well, and the unpredictable data points about-as-poorly. In human terms, it would be like having a paradigm in which certain phenomena are very predictable, and other phenomena look like totally-random noise without any hint that they even could be predictable.
Not sure what it would look like in the perfect-training-prediction regime, though.