carboniferous_umbraculum comments on Taking the parameters which seem to matter and rotating them until they don’t

carboniferous_umbraculum 30 Aug 2022 20:33 UTC
6 points
3
I’m not at liberty to share it directly, but I am aware that Anthropic have a draft on small toy models with hand-coded synthetic data that shows superposition very cleanly. They go as far as saying that searching for an interpretable basis may essentially be mistaken.