I’m not sure if the number of near-zero eigenvalues is the right thing to look at.
If the training process is walking around the parameter space until it “stumbles on” a basin, what’s relevant for which basin is found isn’t just the size of the basin floor; it’s also how big the basin walls are. Analogy: a very narrow cylindrical hole in a flat floor may be harder to fall into than a very wide, sloped hole, even though the bottom of the latter may be just a single point.
I’ve typically operated under the assumption that something like “basin volume” may be closer to the thing that matters, the difference from the dimensionality picture being that the typical size of the non-zero eigenvalues can also be quite relevant.
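Roughly, the picture I have in mind: in a quadratic approximation around a minimum whose non-zero Hessian eigenvalues are $\lambda_1, \dots, \lambda_k$ (the remaining directions treated as flat), the volume of the region where the loss stays within $\varepsilon$ of the minimum scales, along the non-flat directions, like

$$V(\varepsilon) \;\propto\; \prod_{i=1}^{k} \sqrt{\frac{2\varepsilon}{\lambda_i}} \;=\; \frac{(2\varepsilon)^{k/2}}{\prod_{i=1}^{k}\sqrt{\lambda_i}},$$

so the number of near-zero eigenvalues only determines how many factors show up, while the sizes of the non-zero $\lambda_i$ determine how large each factor is.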
Have you tried looking at complete eigenvalue spectra of your most common minima in these graphs and comparing them to the spectra of minima of higher dimension? Do the latter by any chance have significantly bigger eigenvalues outside the ones that are close to zero?
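If it helps, here’s a rough sketch of how one might extract such a full spectrum for a toy-sized network (my own illustration, assuming PyTorch; the function name, model, and data are placeholders, and this only works while the dense Hessian fits in memory):

```python
# Minimal sketch (my own, not from the post): full Hessian spectrum of the
# training loss for a model small enough that the dense Hessian fits in memory.
# Assumes PyTorch >= 2.0; `model`, `inputs`, `targets` are placeholders.
import torch

def full_hessian_spectrum(model, loss_fn, inputs, targets):
    # Flatten all parameters into one vector so the Hessian is a single matrix.
    names_shapes = [(n, p.shape) for n, p in model.named_parameters()]
    sizes = [p.numel() for _, p in model.named_parameters()]
    flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])

    def loss_of_flat(theta):
        # Rebuild the parameter tensors from the flat vector and evaluate the
        # model functionally, so autograd can differentiate twice w.r.t. theta.
        chunks = torch.split(theta, sizes)
        params = {name: chunk.reshape(shape)
                  for (name, shape), chunk in zip(names_shapes, chunks)}
        out = torch.func.functional_call(model, params, (inputs,))
        return loss_fn(out, targets)

    hessian = torch.autograd.functional.hessian(loss_of_flat, flat)
    # The Hessian of a scalar loss is symmetric, so eigvalsh applies.
    return torch.linalg.eigvalsh(hessian)

# Hypothetical usage on a toy regression net:
# model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
# spectrum = full_hessian_spectrum(model, torch.nn.functional.mse_loss, X, y)
```

For anything much larger one would presumably have to fall back on Lanczos-style or stochastic spectral estimators instead of a dense eigendecomposition, but if the models in these experiments are small enough, the dense version might already do.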