Did you use the initialization scheme in our paper where the decoder is initialized to the transpose of the encoder (and then columns unit normalized)? There should not be any dead latents with topk at small scale with this init.
Also, if I understand correctly, leaky topk is similar to the multi-topk method in our paper. I’d be interested in a comparison of the two methods.
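For concreteness, here is a minimal PyTorch sketch of that tied-transpose initialization as I read it: the decoder weight starts as the transpose of the encoder weight, and each decoder column is then rescaled to unit L2 norm. The layer sizes and variable names are placeholders, not anything from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_latents = 512, 4096  # hypothetical sizes

encoder = nn.Linear(d_model, n_latents, bias=True)   # weight: (n_latents, d_model)
decoder = nn.Linear(n_latents, d_model, bias=True)   # weight: (d_model, n_latents)

with torch.no_grad():
    # Tie at init: decoder weight is the transpose of the encoder weight.
    decoder.weight.copy_(encoder.weight.T)
    # Normalize each decoder column (one column per latent) to unit L2 norm.
    decoder.weight.copy_(F.normalize(decoder.weight, dim=0))
    decoder.bias.zero_()
```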
I did not use your initialization scheme, since I was unaware of your paper at the time I was running those experiments. I will definitely try that soon!
Yeah, I can see how leaky topk and multi-topk are doing similar things. I wonder if leaky topk also gives a progressive code past the value of k used in training. That definitely seems worth looking into. Thanks for the suggestions!
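For anyone wanting to run that comparison, here is a rough sketch of a multi-topk-style objective, assuming it amounts to summing TopK reconstruction losses at more than one value of k; the specific multiple and weight below (4k, 1/8) are illustrative stand-ins rather than the paper's exact recipe. A leaky-topk variant could then be compared by swapping in a different activation while keeping the rest of the training loop fixed.

```python
import torch

def topk_activation(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest pre-activations per example, zero out the rest."""
    vals, idx = torch.topk(pre_acts, k, dim=-1)
    acts = torch.zeros_like(pre_acts)
    acts.scatter_(-1, idx, vals)
    return acts

def multi_topk_loss(x: torch.Tensor, encoder, decoder, k: int) -> torch.Tensor:
    """Sum of TopK reconstruction losses at several sparsity levels (hypothetical weights)."""
    pre_acts = encoder(x)
    loss = 0.0
    for k_i, weight in [(k, 1.0), (4 * k, 1.0 / 8)]:
        recon = decoder(topk_activation(pre_acts, k_i))
        loss = loss + weight * (recon - x).pow(2).sum(dim=-1).mean()
    return loss
```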