I don’t doubt you can find many facts about SAE latents; I just don’t think they will be relevant for anything that matters.
I’m by-default bearish on neuroscience too, though it’s more nuanced there.
Edit: “The real thinking happens in the scaffolding” is a reasonable argument (and current mech interp doesn’t address it), but that’s a different argument (and it just means that what we understand with mech interp is individual forward passes).
Feeding the output into the input isn’t much thinking. It just allows the thinking to occur in a very diffuse way.