When I said they didn’t produce flat minima with poor generalization, I meant “they didn’t produce flat minima with poor generalization in the normal parameter space of a neural network”. This is what is relevant to the “flatness as an explanation for generalization” hypothesis, since that hypothesis is about how flatness correlates with generalization in neural networks in practice. The existence of other parameter-function mappings in which flatness and generalization come apart does not refute it. Of course, if you wished to produce a mathematical proof that flat minima generalize, this would be an important example to keep in mind. But this post was about high-level scientific hypotheses about neural network generalization, not mathematical proofs. In that context I think it’s correct to say that the paper does not provide a meaningful counterexample.
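To make concrete why the choice of parameter-function mapping matters, here is a minimal numpy sketch (not the paper’s construction; the toy network, random data, and the finite-difference Hessian-trace flatness proxy are all illustrative assumptions on my part). It uses the standard observation that ReLU networks are positively homogeneous: rescaling adjacent layers by α and 1/α leaves the computed function untouched while changing curvature-based flatness measures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup (not from the paper): a two-layer ReLU net on random data.
W1 = rng.normal(size=(16, 4))
W2 = rng.normal(size=(1, 16))
X = rng.normal(size=(4, 100))
y = rng.normal(size=(1, 100))

def forward(W1, W2, X):
    return W2 @ np.maximum(W1 @ X, 0.0)

def loss(W1, W2):
    return np.mean((forward(W1, W2, X) - y) ** 2)

def hessian_trace(W1, W2, eps=1e-3):
    """Finite-difference estimate of the trace of the loss Hessian,
    one common (parameterization-dependent) flatness proxy."""
    W1, W2 = W1.copy(), W2.copy()
    base = loss(W1, W2)
    trace = 0.0
    for p in (W1, W2):
        for idx, val in np.ndenumerate(p):
            p[idx] = val + eps
            up = loss(W1, W2)
            p[idx] = val - eps
            down = loss(W1, W2)
            p[idx] = val
            trace += (up - 2.0 * base + down) / eps**2
    return trace

alpha = 10.0
# Positive homogeneity of ReLU: (alpha * W1, W2 / alpha) computes the same function...
assert np.allclose(forward(W1, W2, X), forward(alpha * W1, W2 / alpha, X))

# ...but the curvature around the two parameter settings is very different.
print(hessian_trace(W1, W2))                  # original parameterization
print(hessian_trace(alpha * W1, W2 / alpha))  # same function, much larger trace ("sharper")
```

The two parameter settings implement the same function on every input, yet the flatness proxy differs by orders of magnitude, so any claim about “flat minima” is implicitly a claim about a particular parameterization—which is exactly why the hypothesis is stated in terms of the normal parameter space of a neural network.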