Do you really expect to find this phenomenon in larger NNs? What if the networks are pruned to remove extraneous degrees of freedom, so that we’re really talking about whether the “important” information has weak additivity?
Actually, someone has found mode connectivity in ViT (a fairly large NN, I think). I'm not sure whether LLFC would still hold on ViT, but it's worth a try. As for pruned networks, I believe they would still satisfy LLFC. (I take it “weak additivity” refers to LLFC, right?)