tailcalled comments on Why I’m bearish on mechanistic interpretability: the shards are not in the network

tailcalled 14 Sep 2024 13:58 UTC
2 points
0
I don’t doubt you can find may facts about SAE latents, I just don’t think they will be relevant for anything that matters.
I’m by-default bearish on neuroscience too, though it’s more nuanced there.
Edit: The “the real thinking happens in the scaffolding” is a reasonable argument (and current mech interp doesn’t address this) but that’s a different argument (and just means we understand individual forward passes with mech interp).
Feeding the output into the input isn’t much thinking. It just allows the thinking to occur in a very diffuse way.