I agree that you would need a LOT more weights. A ridiculous-seeming amount, perhaps, maybe 10,000x or more. But I actually think that's a potential strength. Reducing superposition and having a very sparse, wide network with only a small portion of it active at any one time could, I think, be made both compute-efficient and interpretable. If each of those sparse weights does fewer things, it becomes much easier to label those specific things, and to see what logic went into any given decision.
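To make the "only a small portion active at a time" idea concrete, here's a minimal sketch (PyTorch, with made-up layer sizes and a simple top-k activation rule; this is just one way that constraint could be enforced, not a claim about the right architecture):

```python
import torch
import torch.nn as nn

class TopKSparseLayer(nn.Module):
    """Toy sketch of one very wide layer where only k units may be active
    on any given input. All sizes and names here are illustrative."""
    def __init__(self, d_in: int = 512, width: int = 65536, k: int = 32):
        super().__init__()
        self.enc = nn.Linear(d_in, width)   # expand into the huge sparse space
        self.dec = nn.Linear(width, d_in)   # project back down
        self.k = k

    def forward(self, x: torch.Tensor):
        pre = torch.relu(self.enc(x))
        # Keep only the k largest activations; everything else is exactly zero,
        # so each input touches a tiny, individually labelable subset of units.
        vals, idx = torch.topk(pre, self.k, dim=-1)
        sparse = torch.zeros_like(pre).scatter_(-1, idx, vals)
        return self.dec(sparse), idx   # idx records which "labeled" units fired
```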
As for whether it’s computationally tractable… there’s good reason to think it is. The brain is basically a very wide, sparse net that’s quite computationally efficient. Here’s a recent interview from Yannic Kilcher on the subject:
My view is slightly different, in that I don’t think we should prune the networks down and leave them pruned. I think we want absurdly huge networks with clear labels. I’m currently imagining something like a mixture of experts implemented within this giant wide network, but with the experts overlapping significantly with each other. So maybe we build this up with a series of learn-prune-learn-prune-learn cycles, producing an increasingly complex, very sparse space.
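Something like the following crude sketch of one cycle (the `train_for_a_while` helper and the magnitude-based pruning rule are placeholders I'm assuming, not a worked-out method):

```python
import torch
import torch.nn as nn

def learn_prune_cycles(model: nn.Module, train_for_a_while, n_cycles: int = 3,
                       prune_frac: float = 0.5):
    """Crude sketch of a learn-prune-learn-prune loop. `train_for_a_while`
    is a hypothetical user-supplied training function; magnitude pruning is
    just a stand-in for whatever criterion actually gets used."""
    for _ in range(n_cycles):
        train_for_a_while(model)                      # learn
        with torch.no_grad():
            for p in model.parameters():
                if p.dim() < 2:                       # leave biases alone
                    continue
                k = max(1, int(p.numel() * prune_frac))
                # magnitude of the k-th smallest weight = pruning threshold
                threshold = p.abs().flatten().kthvalue(k).values
                p.mul_((p.abs() > threshold).to(p.dtype))   # prune
        train_for_a_while(model)                      # learn again, sparser
    return model
```

Because no mask is kept between cycles, pruned weights are free to regrow during the next learn phase, which is the "don't leave them pruned" part.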
If we can get the unwanted cognition/behaviors to sit entirely in their own section of weights, we can then ablate the unwanted behaviors without losing wanted capability. That’s my hope anyway.
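In code terms, the hoped-for end state is something as simple as this (again just a sketch; it assumes the labeling problem is already solved and that the unwanted circuit really is confined to the listed units):

```python
import torch

def ablate_labeled_units(w_in: torch.Tensor, w_out: torch.Tensor,
                         unwanted_units: list[int]) -> None:
    """Toy sketch: if the unwanted cognition were confined to a known set of
    units in the wide layer, ablation is just zeroing that slice.
    `unwanted_units` is a hypothetical list produced by whatever labeling
    process identified those units; w_in / w_out are the layer's input and
    output weight matrices (shapes [width, d_in] and [d_out, width])."""
    with torch.no_grad():
        w_in[unwanted_units, :] = 0.0    # cut the inputs to the unwanted units
        w_out[:, unwanted_units] = 0.0   # cut their influence downstream
```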
I agree that reducing superposition is probably valuable even if it requires a significantly larger network. I still don’t understand why the transition from float to binary would cause a dramatic reduction in superposition capacity. But if it does prevent superposition, great! I’ll just give the network more parameters as needed. If we still get superposition, though, I’ll need to apply other techniques to make it stop.
(I have not yet finished my closer re-read of Toy Models of Superposition after my initial skimming. Perhaps once I do I will understand better.)
Hopefully in a few months I will have empirical data on how many more neurons we need. Then I can stop hand-waving about vague intuitions.
My thoughts and hope as well.