Nontrivial LLM algorithms require scaffolding, so they aren’t really concentrated within the network’s internal computation flow. Even something as simple as generating text requires repeatedly feeding sampled tokens back into the network, which gives the system an extra connection from outputs to inputs that mechanistic interpretability rarely models.
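The feedback loop described above can be sketched as a minimal autoregressive sampling loop. The `toy_model` function here is a hypothetical stand-in for an LLM forward pass (its vocabulary and weighting are made up for illustration); the point is the structure of the loop, where each sampled token is appended to the context and fed back in:

```python
import random

def toy_model(tokens):
    # Hypothetical stand-in for an LLM forward pass: returns a
    # probability distribution over a tiny made-up vocabulary,
    # conditioned (trivially) on the context length.
    vocab = ["a", "b", "c", "<eos>"]
    weights = [1.0, 1.0, 1.0, 0.5 + 0.5 * len(tokens)]
    total = sum(weights)
    return vocab, [w / total for w in weights]

def generate(prompt, max_new_tokens=10, seed=0):
    # Autoregressive loop: each sampled token is appended to the
    # context and fed back into the model. This output-to-input
    # connection lives outside the network's internal computation
    # graph -- it is part of the surrounding scaffolding.
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        vocab, probs = toy_model(tokens)
        next_token = rng.choices(vocab, weights=probs, k=1)[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["a"]))
```

An analysis that only inspects a single forward pass never sees this loop; multi-token behavior emerges from the interaction between the network and the sampling scaffold around it.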