A fascinating paper.
An interesting research direction here would be to carry out enough case studies to form a sizeable dataset, making sure the approaches involved give good coverage of the possibilities, and then attempt to few-shot learn and/or fine-tune the ability of an LLM-powered autonomous agent/cognitive architecture to reproduce the results of individual case studies, i.e. to attempt to automate this form of mechanistic interpretability, given a suitably labeled input set or a reliable means of labeling one.
It would also be interesting to go the other way: take a specific, randomly selected neuron, look at its activation patterns across the entire corpus, and figure out whether it’s a monosemantic neuron, and if so for what; or else look at its activation correlations with other neurons in the same layer and determine which superpositions, for which k-values, it forms part of and what each of them represents. Using an LLM, or semantic search, to look at a large set of high-activation contexts and try to come up with plausible descriptions for the neuron might be quite helpful here; a rough sketch of what the first steps could look like is below.
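To make that concrete, here is a minimal sketch of the mechanical part, assuming GPT-2 small via HuggingFace `transformers`; the layer index, neuron index, and tiny stand-in corpus are arbitrary placeholders, and a real run would use many thousands of documents. It hooks one MLP layer, collects the chosen neuron’s post-GELU activations, surfaces its top-activating tokens/contexts (the material you would hand to an LLM and ask for a common theme), and lists the most strongly correlated neighbouring neurons as a hint about possible superpositions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER, NEURON = 5, 123             # arbitrary choices for illustration
texts = [                          # placeholder corpus; use a large document set in practice
    "The Eiffel Tower is in Paris.",
    "def add(a, b): return a + b",
    "Stock prices fell sharply on Friday.",
]

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

acts = []                          # filled by the hook: one (seq_len, 3072) tensor per forward pass
hook = model.transformer.h[LAYER].mlp.act.register_forward_hook(
    lambda module, inp, out: acts.append(out.squeeze(0).detach())
)

all_acts = []                      # per-token activations for every neuron in the layer
records = []                       # (activation, token string, surrounding text) for the chosen neuron
with torch.no_grad():
    for text in texts:
        ids = tok(text, return_tensors="pt")
        model(**ids)
        a = acts.pop()             # (seq_len, 3072) post-GELU activations for this text
        all_acts.append(a)
        for pos in range(a.shape[0]):
            records.append((a[pos, NEURON].item(),
                            tok.decode(ids.input_ids[0, pos].item()), text))
hook.remove()

# Top-activating tokens/contexts: these are what you'd show an LLM and ask for a plausible description.
for act, token, text in sorted(records, reverse=True)[:5]:
    print(f"{act:+.3f}  {token!r:12}  in: {text}")

# Which other neurons in the same layer co-fire with it? High correlations hint at
# the superpositions this neuron might participate in.
A = torch.cat(all_acts)                                  # (n_tokens, 3072)
corr = torch.nan_to_num(torch.corrcoef(A.T)[NEURON])     # correlation of every neuron with the chosen one
corr[NEURON] = 0.0                                       # ignore self-correlation
top = torch.topk(corr, 5)
print("Most correlated neighbours:",
      list(zip(top.indices.tolist(), [round(v, 3) for v in top.values.tolist()])))
```

The interpretive step (is it monosemantic, and for what?) is then a judgement call over those printed contexts, which is exactly where an LLM or semantic search could be slotted in.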
For my second paragraph above: it turns out from a blog post out today that this is not only feasible, but OpenAI have already experimented with doing it, and have now open-sourced the technology:
https://openai.com/research/language-models-can-explain-neurons-in-language-models
OpenAI were only looking at explaining single neurons, so combining their approach with the original paper’s sparse probing technique for superpositions (sketched roughly below) seems like the obvious next step.
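For anyone unfamiliar with the flavour of that technique, here is a toy sketch of k-sparse probing; this is not the paper’s exact method, just the general idea, and the data, neuron-selection heuristic, and feature labels are all synthetic placeholders. Given per-token activations X and binary labels y for some feature, pick the k neurons with the largest class-mean difference and fit a probe restricted to just those k; sweeping k shows whether the feature lives in a single neuron (k=1 suffices) or is spread across a small superposition (accuracy only recovers at k>1):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sparse_probe_accuracy(X: np.ndarray, y: np.ndarray, k: int) -> float:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    # heuristic neuron ranking: absolute difference of class-conditional means
    scores = np.abs(X_tr[y_tr == 1].mean(0) - X_tr[y_tr == 0].mean(0))
    top_k = np.argsort(scores)[-k:]
    # probe sees only the k selected neurons
    probe = LogisticRegression(max_iter=1000).fit(X_tr[:, top_k], y_tr)
    return probe.score(X_te[:, top_k], y_te)

# toy demo on synthetic data: the label is carried by a 3-neuron combination,
# so a k=1 probe underperforms and accuracy jumps once k reaches 3
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))                  # stand-in "activations"
y = (X[:, [7, 42, 99]].sum(1) > 0).astype(int)    # hidden 3-neuron feature
for k in (1, 2, 3, 8):
    print(f"k={k}: probe accuracy {sparse_probe_accuracy(X, y, k):.2f}")
```

Pairing something like this with OpenAI’s LLM-generated neuron explanations would let the explanation step target small sets of neurons rather than single ones.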