Sorry for the late reply! I agree with your assessment of the TMS paper. In our case, the L1 regularization is strong enough that the encodings align completely with the canonical basis: in the experiments behind the “Polysemantic neurons vs hidden neurons” graph, every weight is either 0 or close to ±1. I also believe that every solution minimizing the loss (L1 regularization included) aligns with the canonical basis.
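As a minimal sketch of what "aligns with the canonical basis" means operationally, here is a hypothetical check (the function name, tolerance, and example matrices are mine, not from our actual code): every entry of the encoding matrix should be within some tolerance of 0, 1, or −1.

```python
import numpy as np

def aligns_with_canonical_basis(W, tol=0.1):
    """Return True if every entry of W is within tol of 0, 1, or -1."""
    # Distance from each entry to its nearest target value in {0, 1, -1}.
    targets = np.array([0.0, 1.0, -1.0])
    dist = np.min(np.abs(W[..., None] - targets), axis=-1)
    return bool(np.all(dist < tol))

# A matrix whose rows are (signed, slightly noisy) canonical basis vectors
# passes the check:
W_aligned = np.array([[0.0, 0.98, 0.0],
                     [-1.02, 0.0, 0.0]])

# A rotated encoding (entries near 0.7) does not:
W_rotated = np.array([[0.7, 0.7, 0.0],
                     [-0.7, 0.7, 0.0]])

print(aligns_with_canonical_basis(W_aligned))  # True
print(aligns_with_canonical_basis(W_rotated))  # False
```

In the experiments above, all learned encoding matrices passed this kind of check, which is what the "all weights are either 0 or close to ±1" claim refers to.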