In all of this, there seems to be an implicit assumption that the ordering of the embedding dimensions is consistent across layers, in the sense that if “dog” is most strongly associated with dimension 12 in layer 2, the same holds in layers 3, 4, and so on.
I don’t see any reason why this should be the case, from either a training or a model-structure perspective. How, then, does the logit lens (which is clearly not invariant under a permutation of its input dimensions) still produce valid results at some intermediate layers?
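For concreteness, here is roughly the operation I mean: take the residual stream after each layer, apply the final LayerNorm, and project through the shared unembedding matrix. A minimal sketch, assuming a GPT-2-style model via Hugging Face `transformers` (the model choice and prompt are just illustrative):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding output, then one entry per layer.
for layer, h in enumerate(out.hidden_states):
    # The final entry already has ln_f applied inside the model, so only
    # re-apply the final LayerNorm to the earlier (raw residual) entries.
    normed = h if layer == len(out.hidden_states) - 1 else model.transformer.ln_f(h)
    # Project through the unembedding matrix the model "officially" uses
    # only at the last layer.
    logits = model.lm_head(normed)
    top_id = logits[0, -1].argmax().item()
    print(f"layer {layer}: {tokenizer.decode([top_id])!r}")
```

Note that the same fixed `lm_head` matrix is applied at every layer, which is exactly why a per-layer permutation of the residual dimensions would break it.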