Huh, that’s indeed somewhat surprising if the SAE features are capturing the things that matter to CLIP (in that they reduce loss) and only those things, as opposed to “salient directions of variation in the data”. I’m curious exactly what “failing to work” means. Here I think the negative result (and its exact details) is arguably more interesting than a positive result would be.