The direction extracted using the same method will vary per layer, but this doesn’t mean that the correct feature direction varies that much, but rather that it cannot be extracted using a linear function of the activations at too early/late layers.
The direction extracted using the same method will vary per layer, but this doesn’t mean that the correct feature direction varies that much, but rather that it cannot be extracted using a linear function of the activations at too early/late layers.