Imo they’re just completely different techniques, which aren’t really comparable. Activation patching is about understanding the difference between two activations by patching one to replace the other and seeing what happens. SAEs are a technique for decomposing an activation into interpretable pieces
Imo they’re just completely different techniques, which aren’t really comparable. Activation patching is about understanding the difference between two activations by patching one to replace the other and seeing what happens. SAEs are a technique for decomposing an activation into interpretable pieces