What if we constrain v to be in some subspace that is actually used by the MLP? (We can get it from PCA over activations on many inputs.)
This way v won’t have any dormant component, so the MLP output after patching also cannot use that dormant pathway.
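A rough sketch of what that constraint could look like, using NumPy on toy data. The names `active_subspace` and `project_onto`, the choice of `k`, and the mean-centering step are all assumptions made for illustration; the real activations would come from running the model on many inputs.

```python
import numpy as np

def active_subspace(acts, k):
    """Orthonormal basis (d, k) for the top-k principal directions of acts (n, d)."""
    # Assumption: standard PCA, so we mean-center before the SVD.
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def project_onto(v, basis):
    """Orthogonal projection of v onto the span of the basis columns."""
    return basis @ (basis.T @ v)

# Toy stand-in for MLP input activations: they only vary in a
# 3-dimensional subspace of an 8-dimensional space.
rng = np.random.default_rng(0)
d, n, k = 8, 500, 3
true_basis = np.linalg.qr(rng.normal(size=(d, k)))[0]
acts = rng.normal(size=(n, k)) @ true_basis.T

basis = active_subspace(acts, k)

v = rng.normal(size=d)                 # unconstrained patching direction
v_constrained = project_onto(v, basis)  # component in the used subspace
dormant_part = v - v_constrained        # component no real activation has
```

In this toy setup every activation is orthogonal to `dormant_part`, so patching along `v_constrained` only moves the activation in directions that actually occur on real inputs. With real activations the spectrum will not cut off cleanly, so picking `k` is a judgment call.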