Yes, this is what I meant. Reposting here some insights @Arthur Conmy gave me on Twitter:
In general, I expect each encoder direction to behave roughly like the corresponding decoder direction plus noise. This is because the encoder has to estimate how strongly each feature fires while compensating for interference from other features due to superposition, and that adjustment makes the encoder directions messier.
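One way to check this on a trained SAE is to compare each feature's encoder direction with its decoder direction via cosine similarity. Below is a minimal sketch, not taken from any particular SAE library: the function name `encoder_decoder_cosine`, the weight shapes, and the synthetic weights (decoder plus Gaussian noise standing in for a real encoder) are all assumptions made for illustration.

```python
import numpy as np

def encoder_decoder_cosine(W_enc, W_dec):
    """Per-feature cosine similarity between encoder and decoder directions.

    Assumed (hypothetical) shape convention:
      W_enc: (d_model, n_features), one encoder direction per column
      W_dec: (n_features, d_model), one decoder direction per row
    """
    enc = W_enc.T  # (n_features, d_model), one direction per row
    enc = enc / np.linalg.norm(enc, axis=1, keepdims=True)
    dec = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors gives the cosine similarity.
    return np.sum(enc * dec, axis=1)

# Synthetic stand-in for trained weights: encoder = decoder + noise.
rng = np.random.default_rng(0)
d_model, n_features = 64, 256
W_dec = rng.normal(size=(n_features, d_model))
W_enc = W_dec.T + 0.5 * rng.normal(size=(d_model, n_features))

cos = encoder_decoder_cosine(W_enc, W_dec)
print("mean cosine:", cos.mean(), "min cosine:", cos.min())
```

On real SAE weights, cosines consistently below 1 but well above 0 would be consistent with the "decoder direction plus noise" picture; the synthetic weights here only demonstrate the measurement, not the claim itself.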
Did you also try to interpret input SAE features?
By "input features", do you mean the SAE encoder weights? We did not look into them.