Oooo I’ll definitely take a look. This looks very relevant.
Note that in that research, the negative bias has a couple of meanings/implications:
It should correspond to the noise level in your input channel.
Higher negative biases directly contribute to the sparsity/monosemanticity of the network (quick numeric sketch below).
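For intuition, here's a toy sketch of that thresholding effect: a negative bias before the ReLU zeroes out sub-threshold interference while letting strong feature activations through. The numbers are made up for illustration, not taken from either setup.

```python
import torch

# Toy pre-activations: one "real" feature plus small interference terms
# from other features sharing the same directions (made-up numbers).
signal = torch.tensor([1.0, 0.0, 0.0])
interference = torch.tensor([0.0, 0.15, -0.08])
pre_activation = signal + interference

# With zero bias, small positive interference leaks through the ReLU.
print(torch.relu(pre_activation))          # tensor([1.0000, 0.1500, 0.0000])

# A negative bias on the order of the noise/interference level acts as a
# threshold: the junk is zeroed, the real feature mostly survives.
bias = -0.2
print(torch.relu(pre_activation + bias))   # tensor([0.8000, 0.0000, 0.0000])
```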
We don’t have any noise, but we do think that the bias is serving a de-interference role (because features get packed together in a space that’s not big enough to avoid interference).
Along those lines, you might be able to further improve monosemanticity by using the lasso loss function.
Can you say more about why? We know that L1 regularization on the activations (or weights) increases monosemanticity, but why do you think this would happen when done as part of the task loss?
My weak prediction is that adding low levels of noise would change the polysemantic activations, but not the monosemantic ones.
Adding L1 to the loss allows the network to converge on solutions that are more monosemantic than otherwise, at the cost of some estimation error. Basically, the network is less likely to lean on polysemantic neurons to make up small errors. I think your best bet is to apply the L1 loss to the hidden-layer and output-layer activations.
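Roughly something like this (a minimal PyTorch sketch; the tiny model, layer sizes, and l1_coeff are placeholders I made up, not your actual setup):

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Stand-in two-layer network; the real architecture will differ."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.hidden = nn.Linear(d_in, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.hidden(x))  # hidden-layer activations
        return self.out(h), h

model = TinyMLP(d_in=64, d_hidden=32, d_out=64)
mse = nn.MSELoss()
l1_coeff = 1e-3  # lasso strength; would need tuning

x = torch.randn(128, 64)
target = x  # e.g. an autoencoding-style task

y, h = model(x)
# Task loss plus an L1 (lasso) penalty on the hidden- and output-layer
# activations: the network now pays for every nonzero activation, so it's
# less inclined to lean on small polysemantic contributions to shave error.
loss = mse(y, target) + l1_coeff * (h.abs().mean() + y.abs().mean())
loss.backward()
```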
Nope! Do you have predictions for what noise might do here?