Mechanistically dissimilar algorithms can be “mode connected”: that is, they are (approximate) local minima connected by a path of (approximate) local minima (the paper proves this for their definition of “mechanistically similar”)
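For concreteness, here is one standard formalization of “mode connected” (my phrasing; the paper’s exact definition may differ in details):

```latex
% \theta_0 and \theta_1 are \epsilon-mode connected w.r.t. a loss L if there is a
% continuous path \gamma : [0,1] \to \mathbb{R}^d with
\gamma(0) = \theta_0, \qquad \gamma(1) = \theta_1, \qquad
L(\gamma(t)) \le \max\{L(\theta_0),\, L(\theta_1)\} + \epsilon \quad \forall\, t \in [0,1]
```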
Mea culpa: AFAICT, the ‘proof’ in Mechanistic Mode Connectivity fails. It basically goes:
Prior work has shown that under overparametrization, all global loss minimizers are mode connected.
Therefore, mechanistically distinct global loss minimizers are also mode connected.
The problem is that prior work made the assumption that for a net of the right size, there’s only one loss minimizer up to permutation—aka there are no mechanistically distinct loss minimizers.
[EDIT: the proof also cites Nguyen (2019) in support of its arguments. I haven’t checked the proof in Nguyen (2019), but if it holds up, it does substantiate the claim in Mechanistic Mode Connectivity—altho if I’m reading it correctly you need so much overparameterization that the neural net has a layer with as many hidden neurons as there are training data points.]
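Separately from whether the proof goes through, here is roughly what the empirical version of the question looks like: a minimal, hypothetical sketch of the usual loss-barrier check along a straight line between two trained networks. The model, loader, and loss names are placeholders, not anything from the paper, and a linear path is only a sufficient test (mode connectivity in general allows curved paths, and ideally you’d align hidden-unit permutations first).

```python
# Hypothetical sketch: measure the loss barrier along the linear interpolation
# between two trained nets of the same architecture. A large bump in the middle,
# relative to the endpoints, is evidence they are NOT linearly mode connected.
import copy
import torch

def loss_on_loader(model, loader, loss_fn):
    """Average loss of `model` over `loader`."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            total += loss_fn(model(x), y).item() * len(y)
            n += len(y)
    return total / n

def linear_path_losses(model_a, model_b, loader, loss_fn, steps=11):
    """Loss at evenly spaced points theta(t) = (1 - t) * theta_a + t * theta_b."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for t in torch.linspace(0.0, 1.0, steps):
        t = float(t)
        interp = copy.deepcopy(model_a)
        interp.load_state_dict({k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a})
        losses.append(loss_on_loader(interp, loader, loss_fn))
    # Barrier height: worst loss on the path, above the worse of the two endpoints.
    barrier = max(losses) - max(losses[0], losses[-1])
    return losses, barrier

# Usage (names hypothetical):
# losses, barrier = linear_path_losses(net_a, net_b, val_loader,
#                                      torch.nn.functional.cross_entropy)
```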
Update: I currently think that Nguyen (2019) proves the claim, but it actually requires a layer to have two hidden neurons per training example.
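To put that requirement in concrete terms (my arithmetic, under the reading above; the numbers are illustrative and not from either paper):

```latex
% CIFAR-10-scale training set:
N = 5 \times 10^{4} \text{ training examples}
\;\Longrightarrow\; \text{some hidden layer needs width} \ge 2N = 10^{5} \text{ neurons}
```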