Nontrivial LLM algorithms require scaffolding, so they aren’t really concentrated within the network’s internal computation flow. Even something as simple as generating text requires repeatedly feeding sampled tokens back into the network, which gives the system an extra connection from outputs to inputs that mechanistic interpretability rarely models.
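The feedback loop described above can be sketched as a minimal autoregressive sampling loop. The `toy_model` function here is a hypothetical stand-in for an LLM forward pass (its vocabulary and weighting are made up for illustration); the point is the structure of the loop, where each sampled token is appended to the context and fed back in:

```python
import random

def toy_model(tokens):
    # Hypothetical stand-in for an LLM forward pass: returns a
    # probability distribution over a tiny made-up vocabulary,
    # conditioned (trivially) on the context length.
    vocab = ["a", "b", "c", "<eos>"]
    weights = [1.0, 1.0, 1.0, 0.5 + 0.5 * len(tokens)]
    total = sum(weights)
    return vocab, [w / total for w in weights]

def generate(prompt, max_new_tokens=10, seed=0):
    # Autoregressive loop: each sampled token is appended to the
    # context and fed back into the model. This output-to-input
    # connection lives outside the network's internal computation
    # graph -- it is part of the surrounding scaffolding.
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        vocab, probs = toy_model(tokens)
        next_token = rng.choices(vocab, weights=probs, k=1)[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["a"]))
```

An analysis that only inspects a single forward pass never sees this loop; multi-token behavior emerges from the interaction between the network and the sampling scaffold around it.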