Very cool. Thanks for putting this together.
Half-baked, possibly off-topic: I wonder if there’s some kind of data collection that could be used to train polysemanticity out of a model by fine-tuning.
e.g.:
Show 3 examples (just like in this game), and have the user pick the odd-one-out
The user can say “they are all the same”; if so, remove one example at random and replace it with a new one
Tag each (neuron, positive example) pair with the numerical label 1, and the odd-one-out with 0
Fine-tune with next-word prediction plus an auxiliary loss computed on this newly collected dataset
Can probably use some automated (e.g. semantic similarity) labelling method to cluster labelled+unlabelled instances, to increase the size of the dataset
Neuronpedia interface/codebase could probably be forked to do this kind of data collection very easily.
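To make the steps above a bit more concrete, here is a rough Python sketch of the labelling rounds and one possible auxiliary loss. Everything in it is an assumption for illustration: the names (`LabeledPair`, `label_round`, `refresh_round`, `auxiliary_loss`, `total_loss`), the choice of binary cross-entropy on the neuron's activation, and the `aux_weight` value are all hypothetical, and none of it is taken from the Neuronpedia codebase.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LabeledPair:
    neuron_id: str
    example: str
    label: int  # 1 = consistent with the others, 0 = odd one out


def label_round(neuron_id, shown, odd_index):
    """Turn one user answer into (neuron, example, label) rows.

    odd_index is the position the user picked, or None for
    "they are all the same" (no rows are produced in that case).
    """
    if odd_index is None:
        return []
    return [
        LabeledPair(neuron_id, ex, 0 if i == odd_index else 1)
        for i, ex in enumerate(shown)
    ]


def refresh_round(shown, pool, rng):
    """For "they are all the same": swap one random example for a new one."""
    replaced = list(shown)
    candidates = [ex for ex in pool if ex not in shown]
    if candidates:
        replaced[rng.randrange(len(replaced))] = rng.choice(candidates)
    return replaced


def auxiliary_loss(activation, label):
    """Assumed form: binary cross-entropy of sigmoid(activation) vs. the 0/1 label."""
    p = 1.0 / (1.0 + math.exp(-activation))
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def total_loss(lm_loss, activations, labels, aux_weight=0.1):
    """Next-word-prediction loss plus the weighted mean auxiliary loss."""
    aux = sum(auxiliary_loss(a, y) for a, y in zip(activations, labels)) / len(labels)
    return lm_loss + aux_weight * aux
```

In a real fine-tuning run, `lm_loss` and the activations would come from the model's forward pass in whatever framework is being used; plain floats stand in for them here so the sketch runs on its own.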
I love these variations on the game. Yes, the idea is to build a scalable backend and then we can build various different games on top of it. Initially there was no game and it was just browsing random neurons manually! Then the game was built on top of it. Internally I call the current Neuronpedia game “Vote Mode”.
Would love to have you on our Discord even if just to lurk and occasionally pitch an idea. Thanks for playing!