Oooo I’ll definitely take a look. This looks very relevant.
Note that in that research, the negative bias has a couple of meanings/implications:
It should correspond to the noise level in your input channel.
Higher negative biases directly contribute to the sparsity/monosemanticity of the network (quick numeric sketch below).
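For intuition, here's a toy sketch of that thresholding effect: a negative bias before the ReLU zeroes out sub-threshold interference while letting strong feature activations through. The numbers are made up for illustration, not taken from either setup.

```python
import torch

# Toy pre-activations: one "real" feature plus small interference terms
# from other features sharing the same directions (made-up numbers).
signal = torch.tensor([1.0, 0.0, 0.0])
interference = torch.tensor([0.0, 0.15, -0.08])
pre_activation = signal + interference

# With zero bias, small positive interference leaks through the ReLU.
print(torch.relu(pre_activation))          # tensor([1.0000, 0.1500, 0.0000])

# A negative bias on the order of the noise/interference level acts as a
# threshold: the junk is zeroed, the real feature mostly survives.
bias = -0.2
print(torch.relu(pre_activation + bias))   # tensor([0.8000, 0.0000, 0.0000])
```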
We don’t have any noise, but we do think that the bias is serving a de-interference role (because features get packed together in a space that’s not big enough to avoid interference).
Along those lines, you might be able to further improve monosemanticity by using the lasso loss function.
Can you say more about why? We know that L1 regularization on the activations (or weights) increases monosemanticity, but why do you think this would happen when done as part of the task loss?
My weak prediction is that adding low levels of noise would change the polysemantic activations, but not the monosemantic ones.
Adding L1 to the loss allows the network to converge on solutions that are more monosemantic than otherwise, at the cost of some estimation error. Basically, the network is less likely to lean on polysemantic neurons to make up small errors. I think your best bet is to apply the L1 loss to the hidden-layer and output-layer activations.
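Roughly something like this (a minimal PyTorch sketch; the tiny model, layer sizes, and l1_coeff are placeholders I made up, not your actual setup):

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Stand-in two-layer network; the real architecture will differ."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.hidden = nn.Linear(d_in, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.hidden(x))  # hidden-layer activations
        return self.out(h), h

model = TinyMLP(d_in=64, d_hidden=32, d_out=64)
mse = nn.MSELoss()
l1_coeff = 1e-3  # lasso strength; would need tuning

x = torch.randn(128, 64)
target = x  # e.g. an autoencoding-style task

y, h = model(x)
# Task loss plus an L1 (lasso) penalty on the hidden- and output-layer
# activations: the network now pays for every nonzero activation, so it's
# less inclined to lean on small polysemantic contributions to shave error.
loss = mse(y, target) + l1_coeff * (h.abs().mean() + y.abs().mean())
loss.backward()
```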
Nope! Do you have predictions for what noise might do here?