In the limit of infinite SAE width and infinite (iid) training data, you can get perfect reconstruction and perfect sparsity (both L0 and L1). We can think of this as maximal feature splitting. Obviously, this is undesirable, because you’ve discarded all of the structure present in your data.
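To make the limiting case concrete, here is a minimal sketch of a "lookup table" dictionary with one latent per data point. Everything in it is an assumption made for illustration: the toy Gaussian data, the names `decoder` and `encode`, and above all the idealized argmax encoder, which is not a trained one-layer ReLU encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 8))  # toy stand-in for activations: 100 points, 8 dims

# Hypothetical limiting dictionary: one unit-norm decoder direction per data point.
norms = np.linalg.norm(data, axis=1, keepdims=True)
decoder = data / norms  # shape (100, 8)

def encode(x):
    """Idealized encoder (an assumption): fire only the best-matching latent."""
    x_norm = np.linalg.norm(x)
    scores = decoder @ x / x_norm      # cosine similarity with every latent
    idx = int(np.argmax(scores))
    acts = np.zeros(len(decoder))
    acts[idx] = x_norm                 # activation equals the input norm
    return acts

acts = np.stack([encode(x) for x in data])
recon = acts @ decoder                 # decode: weighted sum of decoder directions

max_error = float(np.abs(recon - data).max())  # reconstruction error, up to float rounding
l0_per_input = (acts != 0).sum(axis=1)         # number of active latents per input
```

In this sketch each input activates exactly one latent and is reconstructed up to floating-point error. The L1 norm of the activations equals the input norm, which by the triangle inequality is the smallest value possible with unit-norm decoder directions. The dictionary has simply memorized the data, which is the sense in which the structure has been discarded.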
Therefore, reconstruction and sparsity aren’t exactly the things we most fundamentally care about. Optimizing for them just happens to do something reasonable at practical scales. However, that doesn’t mean we have to throw the objective out; we might hope that it gives us enough of a foothold in practice.
In particular, the maximal feature splitting case requires exponentially many latents. We might believe that in practice, on the spectrum from splitting too little (polysemanticity) to splitting too much, erring on the side of splitting too much is preferable, because we can still do circuit finding and so on if we artificially cut some existing features into smaller pieces.
Regarding achieving perfect reconstruction and perfect sparsity in the limit, I was also thinking along those lines, i.e. in the limit you could have a single neuron in the sparse layer for every possible input direction. However, please correct me if I’m wrong, but assuming the SAE has only one hidden layer, I don’t think you could prevent neurons from activating for nearby input directions (unless all input directions had equal magnitude). You’d end up with many neurons activating for any given input, and thus imperfect sparsity.
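The magnitude concern can be checked on a toy case. The sketch below is an illustration under assumptions, not anyone's actual SAE: a hypothetical one-layer ReLU encoder in 2D with 64 latents evenly spaced around the unit circle, and a shared bias chosen so that a unit-norm input activates only its own latent. The names `make_encoder` and `active_count` are made up for the example.

```python
import numpy as np

def make_encoder(n_latents):
    """One latent per direction on the unit circle, with a shared negative bias."""
    angles = 2 * np.pi * np.arange(n_latents) / n_latents
    W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (n_latents, 2)
    spacing = 2 * np.pi / n_latents
    # Threshold halfway between cos(spacing) and 1: a unit-norm input clears it
    # only for its own latent, not for the neighbouring ones.
    threshold = (1.0 + np.cos(spacing)) / 2.0
    b = -threshold * np.ones(n_latents)
    return W, b

def active_count(W, b, x):
    """L0 of the ReLU encoder output: number of positive pre-activations."""
    pre = W @ x + b
    return int(np.sum(pre > 0))

W, b = make_encoder(64)
direction = W[5]                                # an input aligned with latent 5

l0_unit = active_count(W, b, direction)         # unit magnitude: exactly 1 latent
l0_big = active_count(W, b, 3.0 * direction)    # larger magnitude: neighbours fire too
l0_small = active_count(W, b, 0.5 * direction)  # smaller magnitude: no latent fires
```

With this fixed bias, scaling the same direction up makes neighbouring latents fire, and scaling it down drops it below every threshold. That is consistent with the equal-magnitude caveat above, at least for this simple construction.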
Otherwise mostly agreed. Though, as discussed, feature splitting makes it necessary to figure out how to break apart feature combinations (as you said), and it would also seem to incur the risk that less common “true features” are not represented even within combinations, so those would get missed entirely.