It might be hard to scale this to large multilayer models too. The toy model was a single-layer model, and the sparse autoencoder was already quite big; iirc the latent space was 8 times the size of the residual stream. Imagine trying to interpret GPT-4 with autoencoders that big, and needing to do it over most layers. It looks intractable.
Maybe they can introduce more efficient ways to pull features out of superposition, but it doesn't look trivial.
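To make the scaling worry concrete, here's a rough sketch (in PyTorch, with made-up dimensions, not GPT-4's actual sizes) of how the parameter count of a per-layer sparse autoencoder grows with the expansion factor and the number of layers:

```python
# Minimal sparse-autoencoder sketch, only to illustrate parameter scaling.
# All dimensions below are hypothetical, not GPT-4's real sizes.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, expansion: int = 8):
        super().__init__()
        d_latent = expansion * d_model          # latent space 8x the residual stream
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x):
        # ReLU gives a non-negative code; training would add an L1 penalty
        # on `latent` to push it toward sparsity.
        latent = torch.relu(self.encoder(x))
        return self.decoder(latent), latent

# Hypothetical: a 10,000-dim residual stream, expansion factor 8,
# and one autoencoder per layer for ~100 layers.
d_model, expansion, n_layers = 10_000, 8, 100
sae = SparseAutoencoder(d_model, expansion)
params_per_sae = sum(p.numel() for p in sae.parameters())
print(f"params per SAE: {params_per_sae:,}")                     # ~1.6B per layer
print(f"params across all layers: {params_per_sae * n_layers:,}")  # ~160B total
```

With those (invented) numbers, the autoencoders alone add up to roughly the size of a frontier model, which is the point: the cost scales with both the expansion factor and the layer count.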