This was super interesting. I hadn’t really thought about the tension between SLT and superposition before, but this post sits right in the middle of it.
Like, there’s nothing logically inconsistent about the best local basis for the weights being undercomplete while the best basis for the activations is overcomplete. But if both are true, it seems like the relationship to the data distribution has to be quite special (and potentially fragile).