Hi Charlie, yep, it's in the paper, but I should say that we did not find a working CUDA-compatible implementation and used the scikit version you mention. This meant the data volumes were somewhat limited: still on the order of a million examples, but 10-50x less than went into the autoencoders.
It's not clear whether the extra data would provide much signal, since ICA can't learn an overcomplete basis and so has no way of learning rare features. Still, it might be able to outperform the ICA baseline presented here, so if you wanted to give someone a project of making a CUDA-compatible version available, I'd be interested to see it!
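For reference, a minimal sketch of the kind of ICA baseline described above, using scikit-learn's FastICA. The data here is synthetic stand-in activations (the names and shapes are assumptions, not the paper's actual setup); the key point is that FastICA can recover at most as many components as the input dimension, so it cannot form an overcomplete dictionary the way a sparse autoencoder can:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical stand-in for a matrix of model activations,
# shape (n_examples, d_model). The real runs used on the
# order of a million examples; we use a tiny sample here.
rng = np.random.default_rng(0)
activations = rng.laplace(size=(2000, 64))

# FastICA yields at most d_model components, so unlike a
# sparse autoencoder it cannot learn an overcomplete basis.
ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
sources = ica.fit_transform(activations)  # (n_examples, n_components)

print(sources.shape)
```

Scaling this past CPU-sized data is exactly the gap mentioned above: scikit-learn's implementation has no CUDA path, which is what caps the example count relative to the autoencoder runs.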