Thanks for the feedback! Yeah, I was also surprised that SAEs seem to work on ViTs pretty much straight out of the box (I didn’t even need to play around with the hyperparameters too much)! As I mentioned in the post, I think it would be really interesting to train on a much larger (more typical) dataset—similar to the dataset the CLIP model was trained on.
I also agree that I probably should have emphasised the “guess the image” game as a result rather than an aside—I’ll bear that in mind for future posts!