The peaks at 0.05 and 0.3 are strange. What regulariser did you use? Also, could you check whether all features whose nearest neighbour has cosine similarity 0.3 have the same nearest neighbour (and likewise for 0.05)?
I expect the 0.05 peak might be the minimum cosine similarity if you want to distribute 8192 vectors over a 512-dimensional space uniformly? I used a bit of a weird regularizer where I penalized:
mean cosine similarity + mean max cosine similarity + max max cosine similarity
I will check later whether the 0.3 peak all have the same neighbour.
Nice, that’s promising! It would also be interesting to see how those peaks are affected when you retrain the SAE both on the same target model and on different target models.
The peaks at 0.05 and 0.3 are strange. What regulariser did you use? Also, could you check whether all features whose nearest neighbour has cosine similarity 0.3 have the same nearest neighbour (and likewise for 0.05)?
I expect the 0.05 peak might be the minimum cosine similarity if you want to distribute 8192 vectors over a 512-dimensional space uniformly? I used a bit of a weird regularizer where I penalized:
mean cosine similarity + mean max cosine similarity + max max cosine similarity
I will check later whether the 0.3 peak all have the same neighbour.
Nice, that’s promising! It would also be interesting to see how those peaks are affected when you retrain the SAE both on the same target model and on different target models.