Louka Ewington-Pitsos comments on Efficient Dictionary Learning with Switch Sparse Autoencoders

Louka Ewington-Pitsos 19 Aug 2024 4:06 UTC
Just to close the loop on this one: the official Hugging Face transformers library simply uses a for-loop over the experts to implement MoE. I also implemented a version myself using a for-loop, and for large latent and batch sizes it's much more efficient than either vanilla matrix multiplication or the weird batched matmul I wrote up earlier in this thread.
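A minimal sketch of the for-loop approach for a top-1 Switch SAE, written in NumPy rather than PyTorch so it runs standalone. This is not the code from the transformers library or from the paper. The function names, the weight shapes, subtracting a shared decoder bias before encoding, and scaling each reconstruction by its router probability are all assumptions made for illustration. For comparison it includes a dense reference that runs every expert on every input and keeps only the routed result.

```python
import numpy as np

def _route(x, router_w):
    """Softmax router with top-1 selection (assumed Switch-style routing)."""
    logits = x @ router_w                                  # (batch, n_experts)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)              # softmax over experts
    return probs, probs.argmax(axis=1)                     # top-1 expert per row

def switch_sae_forward(x, router_w, w_enc, b_enc, w_dec, b_dec):
    """For-loop version: each expert only processes the rows routed to it.

    Hypothetical shapes:
      x (batch, d_model), router_w (d_model, n_experts),
      w_enc (n_experts, d_model, d_expert), b_enc (n_experts, d_expert),
      w_dec (n_experts, d_expert, d_model), b_dec (d_model,)
    """
    probs, expert_idx = _route(x, router_w)
    out = np.zeros_like(x)
    for e in range(router_w.shape[1]):
        rows = np.nonzero(expert_idx == e)[0]              # inputs routed to expert e
        if rows.size == 0:
            continue                                       # idle expert: no work at all
        z = np.maximum((x[rows] - b_dec) @ w_enc[e] + b_enc[e], 0.0)  # ReLU latents
        out[rows] = probs[rows, e][:, None] * (z @ w_dec[e])  # scale by router prob
    return out + b_dec

def switch_sae_dense(x, router_w, w_enc, b_enc, w_dec, b_dec):
    """Dense reference: every expert encodes and decodes every input."""
    probs, expert_idx = _route(x, router_w)
    z = np.maximum(np.einsum("bd,edk->ebk", x - b_dec, w_enc) + b_enc[:, None, :], 0.0)
    recon = np.einsum("ebk,ekd->ebd", z, w_dec)            # (n_experts, batch, d_model)
    rows = np.arange(x.shape[0])
    picked = recon[expert_idx, rows]                       # keep only the routed expert
    return probs[rows, expert_idx][:, None] * picked + b_dec

rng = np.random.default_rng(0)
batch, d_model, n_experts, d_expert = 64, 16, 4, 32
x = rng.normal(size=(batch, d_model))
params = (
    rng.normal(size=(d_model, n_experts)),
    rng.normal(size=(n_experts, d_model, d_expert)) * 0.1,
    rng.normal(size=(n_experts, d_expert)) * 0.1,
    rng.normal(size=(n_experts, d_expert, d_model)) * 0.1,
    rng.normal(size=(d_model,)) * 0.1,
)
out_loop = switch_sae_forward(x, *params)
out_dense = switch_sae_dense(x, *params)
print(np.allclose(out_loop, out_dense))  # True
```

Both functions return the same reconstruction, but the dense reference does about n_experts times as much encoding and decoding work, since it discards every expert's output except the routed one. The loop pays a small Python overhead per expert in exchange for touching only the rows each expert actually owns.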