Logan Riggs comments on Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Logan Riggs 19 Jun 2024 15:40 UTC
Thanks so much! All the links and info will save me time:)
Regarding cos-sim, after thinking a bit, I think it's more sinister. For the cross-cos-sim comparison, you get different results depending on whether you take the max over the 0th or the 1st dimension (equivalent to cos(local, e2e) vs. cos(e2e, local)). As an example, say each dictionary has 2 features, where 3 of the 4 point in the same direction and 1 points the opposite way. Making up numbers:
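feature directions (1D) = [[1], [1]] & [[1], [-1]]
cos-sim = [[1, 1], [-1, -1]]
Taking the max over dim 1 gives [1, -1] (mean 0), but taking it over dim 0 gives [1, 1] (mean 1).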
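A minimal NumPy sketch of the toy example, assuming feature directions are stored as rows and normalized before the dot product. NumPy's `axis` plays the role of PyTorch's `dim`; the helper name `cross_cos_sim` is hypothetical, not from anyone's codebase.

```python
import numpy as np

def cross_cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`.

    Returns a matrix of shape (len(a), len(b)).
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Toy 1D feature directions from the example.
feats_1 = np.array([[1.0], [1.0]])
feats_2 = np.array([[1.0], [-1.0]])

# Rows index feats_2 and columns index feats_1, which reproduces
# the matrix [[1, 1], [-1, -1]].
sim = cross_cos_sim(feats_2, feats_1)

# Max over axis 1: best feats_1 match for each feats_2 feature -> [1, -1]
per_row = sim.max(axis=1)
# Max over axis 0: best feats_2 match for each feats_1 feature -> [1, 1]
per_col = sim.max(axis=0)

print(per_row.mean(), per_col.mean())  # 0.0 1.0
```

Same features, same similarity matrix, but the summary statistic swings from 0 to 1 depending only on which axis the max is taken over.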
For more intuition, suppose 4 local features surround 1 e2e feature (and the other features point elsewhere). Then all 4 local features will have a high max-cos-sim, but on the e2e side only that 1 feature does. So it's not just double counting, but quadruple counting. You can see this for yourself by swapping your dim=1 to dim=0 in your code.
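The "4 local features around 1 e2e feature" case can be sketched with made-up 3D directions (the dictionaries and the 0.9 threshold below are hypothetical, chosen only for illustration). Counting features whose best match exceeds the threshold shows the single e2e feature being counted four times from the local side.

```python
import numpy as np

def cross_cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Hypothetical e2e dictionary: one feature along x, one along z.
e2e = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical local dictionary: four features clustered around the x-axis feature.
local = np.array([
    [1.0, 0.1, 0.0],
    [1.0, -0.1, 0.0],
    [1.0, 0.0, 0.1],
    [1.0, 0.0, -0.1],
])

sim = cross_cos_sim(local, e2e)  # shape (4, 2): rows = local, cols = e2e

best_for_local = sim.max(axis=1)  # best e2e match for each local feature
best_for_e2e = sim.max(axis=0)    # best local match for each e2e feature

threshold = 0.9
print((best_for_local > threshold).sum())  # 4: every local feature matches the x-axis e2e feature
print((best_for_e2e > threshold).sum())    # 1: only the x-axis e2e feature has a match
```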
But my original comment showed your results are still directionally correct when doing [global max w/ replacement] (if I coded it correctly).
> Btw it's not intuitive to me that the encoder directions might be similar even though the decoder directions are not. Curious if you could share your intuitions here.
The decoder directions have degrees of freedom, but the encoder directions... might have similar degrees of freedom, and I'm wrong, lol. BUT! They might be functionally equivalent, so they activate on similar datapoints across seeds. That is more laborious to check though, waaaah.
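One possible way to run the "activate on similar datapoints across seeds" check, sketched under assumptions: both SAEs are run on the same datapoints, a feature counts as active when its activation is > 0, and overlap is measured as the Jaccard similarity of firing sets (a swapped-in choice; correlating raw activations would be another option). The function name `firing_jaccard` and the synthetic activations are hypothetical.

```python
import numpy as np

def firing_jaccard(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Jaccard overlap of the datapoints on which each pair of features fires.

    acts_a: (n_datapoints, n_features_a) activations from one seed.
    acts_b: (n_datapoints, n_features_b) activations from another seed,
            computed on the same datapoints.
    Returns an (n_features_a, n_features_b) matrix with entries in [0, 1].
    """
    fire_a = (acts_a > 0).astype(float)
    fire_b = (acts_b > 0).astype(float)
    intersection = fire_a.T @ fire_b           # co-firing counts per feature pair
    count_a = fire_a.sum(axis=0)[:, None]      # firing count per seed-A feature
    count_b = fire_b.sum(axis=0)[None, :]      # firing count per seed-B feature
    union = count_a + count_b - intersection
    # Pairs where neither feature ever fires get an overlap of 0.
    return np.divide(intersection, union,
                     out=np.zeros_like(intersection), where=union > 0)

rng = np.random.default_rng(0)
n = 1000
# Synthetic sparse activations for 3 features from "seed A".
acts_a = rng.random((n, 3)) * (rng.random((n, 3)) < 0.1)
# "Seed B": the same features in a different order and at a different scale.
# Only firing patterns are compared here, not weight directions.
acts_b = 2.0 * acts_a[:, [2, 0, 1]]

jac = firing_jaccard(acts_a, acts_b)
best_match = jac.argmax(axis=1)  # seed-B feature that best matches each seed-A feature
best_score = jac.max(axis=1)     # its overlap score
print(best_match, best_score)
```

In this synthetic case each seed-A feature finds its permuted counterpart with overlap 1.0; on real SAEs the interesting quantity is how close the best scores get to 1 across seeds.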
I can check both (encoder directions first), because previous literature is really only about the SVD of the gradient (i.e., the output side), but an SAE might be more constrained when separating inputs into sparse features. Thanks for prompting for my intuition!