but it’s not clear to me that you couldn’t train a smaller, somewhat-lossy meta-SAE even on an idealized SAE, so long as the data distribution had rare events or rare properties you could throw away cheaply.
IMO an “idealized” SAE just has no structure relating features, so nothing for a meta SAE to find. I’m not sure this is possible or desirable, to be clear! But I think that’s what idealized units of analysis should look like.
You could also play a similar game showing that latents in a larger SAE are “merely” compositions of latents in a smaller SAE.
I agree, we do this briefly later in the post, I believe. I see our contribution more as showing that this kind of thing is possible than as showing that meta SAEs are objectively the best tool for it.
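For concreteness, here is a rough sketch of what the "larger-SAE latents as compositions of smaller-SAE latents" game could look like. This is not the method from the post. It assumes you already have the smaller SAE's decoder directions as unit-norm vectors and simply runs greedy matching pursuit on one decoder direction from the larger SAE. The names `matching_pursuit`, `small_decoder`, and `large_latent` are made up for illustration, and the 3-dimensional vectors are toy data.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(target, dictionary, k):
    """Greedily approximate `target` with at most k scaled atoms.

    `dictionary` is a list of unit-norm vectors (hypothetical small-SAE
    decoder directions). Returns ({atom_index: coefficient}, residual_norm).
    """
    residual = list(target)
    coeffs = {}
    for _ in range(k):
        # Score every atom by its projection onto the current residual.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a
                    for r, a in zip(residual, dictionary[best])]
    return coeffs, math.sqrt(dot(residual, residual))


# Toy data (assumption): three orthonormal "small SAE" decoder directions
# and one "large SAE" decoder direction that mixes the first two.
small_decoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
large_latent = [0.6, 0.8, 0.0]

coeffs, err = matching_pursuit(large_latent, small_decoder, k=2)
print(coeffs, err)
```

A small residual with only a few nonzero coefficients would be the "merely a composition" reading. On real decoders the atoms are not orthogonal, so greedy selection is only an approximation and the result depends on `k`.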