My reading of the post is that two algorithms with different generalization and structural properties can lie in the same basin, and that it uses what we know about the mechanisms behind grokking on synthetic data to make this point. But the above papers show that in more realistic settings empirically, two models lie in the same basin (up to permutation symmetries) if and only if they have similar generalization and structural properties.
the above papers show that in more realistic settings empirically, two models lie in the same basin (up to permutation symmetries) if and only if they have similar generalization and structural properties.
I think those papers only check whether models lie in linearly connected bits of the same basin when they have similar generalization properties? E.g. Figure 4 of Mechanistic Mode Connectivity is titled “Non-Linear Mode Connectivity of Mechanistically Dissimilar Models”, and the subtitle states that “quadratic paths can be easily identified to mode connect mechanistically dissimilar models[, and] linear paths cannot be identified, even after permutation”. Linear Connectivity Reveals Generalization Strategies seems to be focussed on linear mode connectivity, rather than more general mode connectivity.
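To make the linear vs. non-linear distinction concrete, here is a toy sketch (not taken from either paper). It assumes a made-up 2D "loss" whose minima form a ring, picks two minima `A` and `B` on opposite sides, and compares the loss barrier along the straight line between them with the barrier along a quadratic Bezier path through a hand-picked control point `C`. All names and the loss itself are hypothetical. In practice the control point would be learned rather than chosen by hand, and permutation symmetries are not modelled here at all.

```python
# Toy illustration only: a 2D loss whose minima form the unit circle.
def loss(p):
    x, y = p
    return (x * x + y * y - 1.0) ** 2

def linear_path(a, b, t):
    # Straight-line interpolation between a and b.
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

def quadratic_path(a, c, b, t):
    # Quadratic Bezier curve from a to b with control point c.
    return tuple(
        (1 - t) ** 2 * ai + 2 * t * (1 - t) * ci + t ** 2 * bi
        for ai, ci, bi in zip(a, c, b)
    )

def barrier(path, a, b, steps=100):
    # Highest loss seen along the path, relative to the worse endpoint.
    worst = max(loss(path(i / steps)) for i in range(steps + 1))
    return worst - max(loss(a), loss(b))

A = (1.0, 0.0)   # first minimum (hypothetical)
B = (-1.0, 0.0)  # second minimum (hypothetical)
C = (0.0, 2.0)   # hand-picked control point; an assumption, not a trained one

linear_barrier = barrier(lambda t: linear_path(A, B, t), A, B)
quadratic_barrier = barrier(lambda t: quadratic_path(A, C, B, t), A, B)

print("linear barrier:", linear_barrier)
print("quadratic barrier:", quadratic_barrier)
```

In this toy case the straight line passes through the origin, where the loss is 1, while the curved path stays close to the ring of minima and sees a much smaller barrier. That is the sense in which two solutions can be mode connected without being linearly mode connected.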