Maybe we should make fake datasets for this? Neurons often aren’t that interpretable, and we’re still confused about SAE features a lot of the time. It would be nice to distinguish “can do autointerp, given an interpretable generating function of complexity x” from just “can do autointerp”.
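As a rough illustration of what such a fake dataset might look like: a minimal sketch where the “neuron” is a function we wrote ourselves, so the ground-truth explanation is known exactly and its complexity is a knob we control. Everything here (`make_fake_neuron`, `RULES`, `build_dataset`) is hypothetical naming for the sake of the sketch, not from any existing autointerp codebase.

```python
import random

# Ground-truth generating rules of increasing "complexity":
# complexity 1 = a single token-level predicate, complexity 2 = a conjunction, etc.
RULES = {
    1: lambda tok, ctx: tok.istitle(),                        # fires on capitalized tokens
    2: lambda tok, ctx: tok.istitle() and ctx.endswith("."),  # ...but only in sentence-final contexts
}

def make_fake_neuron(complexity: int):
    """Return an 'activation function' whose true explanation we know exactly."""
    rule = RULES[complexity]
    def activation(token: str, context: str) -> float:
        return 1.0 if rule(token, context) else 0.0
    return activation

def build_dataset(corpus: list[str], complexity: int, n: int = 1000):
    """Sample (context, token, activation) triples to feed to an autointerp pipeline."""
    neuron = make_fake_neuron(complexity)
    examples = []
    for _ in range(n):
        ctx = random.choice(corpus)
        tok = random.choice(ctx.split())
        examples.append({"context": ctx, "token": tok, "activation": neuron(tok, ctx)})
    return examples
```

The point of the knob is that autointerp scores can then be reported conditional on generating-function complexity, rather than averaged over whatever mix of interpretable and uninterpretable units a real model happens to have.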