sudhanshu_kasewa comments on Neuronpedia

sudhanshu_kasewa 28 Jul 2023 11:35 UTC
2 points
Very cool. Thanks for putting this together.
Half-baked, possibly off-topic: I wonder whether some kind of data collection could be used to train polysemanticity out of a model by fine-tuning.
e.g.:
- Show 3 examples (just like in this game), and have the user pick the odd-one-out
  - User can say “they are all the same”; if so, remove one at random and replace it with a new example
- Tag the (neuron, positive example) pairs with a numerical label of 1, and the odd-one-out with 0
- Fine-tune with next-word prediction and an auxiliary loss using this newly collected dataset
  - Can probably use some automated (e.g. semantic similarity) labelling method to cluster labelled+unlabelled instances, to increase the size of the dataset
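A rough sketch of the steps above, in plain Python so it stays self-contained. Everything here is hypothetical: the names `LabelledPair`, `record_vote`, `replace_one_at_random`, `auxiliary_loss` and `total_loss` are made up for illustration and are not part of Neuronpedia, and the choice of binary cross-entropy on the neuron's activation (with an assumed weight `aux_weight`) is just one possible form of auxiliary loss, not something the idea above specifies.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledPair:
    neuron_id: str
    example: str
    label: int  # 1 = fits the neuron's shared concept, 0 = odd one out


def record_vote(neuron_id, examples, odd_index):
    """Turn one odd-one-out vote into labelled (neuron, example) pairs."""
    return [
        LabelledPair(neuron_id, text, 0 if i == odd_index else 1)
        for i, text in enumerate(examples)
    ]


def replace_one_at_random(examples, pool, rng=random):
    """Handle "they are all the same": swap one shown example for a new one."""
    shown = list(examples)
    candidates = [p for p in pool if p not in shown]
    if not candidates:
        return shown  # nothing new to show
    shown[rng.randrange(len(shown))] = rng.choice(candidates)
    return shown


def auxiliary_loss(activations, labels):
    """Assumed form: binary cross-entropy between sigmoid(activation) and label."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for a, y in zip(activations, labels):
        p = 1.0 / (1.0 + math.exp(-a))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(labels)


def total_loss(next_word_loss, aux_loss, aux_weight=0.1):
    """Next-word-prediction loss plus a weighted auxiliary term."""
    return next_word_loss + aux_weight * aux_loss
```

In a real fine-tuning run the activations would come from the model's forward pass and the loss would be computed in whatever framework the model uses; this only shows the shape of the data and the combined objective.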
The Neuronpedia interface/codebase could probably be forked to do this kind of data collection very easily.
- Johnny Lin 28 Jul 2023 17:57 UTC
  2 points
  I love these variations on the game. Yes, the idea is to build a scalable backend and then we can build various different games on top of it. Initially there was no game and it was just browsing random neurons manually! Then the game was built on top of it. Internally I call the current Neuronpedia game “Vote Mode”.
  Would love to have you on our Discord even if just to lurk and occasionally pitch an idea. Thanks for playing!