My current workflow to study the internal mechanisms of LLMs
This is a post to keep track of my research workflow for studying LLM internals. Since I am doing this in my spare time, I want to keep my pipeline as simple as possible.
Step 1: Formulate a question about the model's behavior to investigate.
Step 2: Find the influential layer for the behavior
Output across layers
https://github.com/jalammar/ecco
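For a quick first look, a logit-lens-style pass shows where the prediction emerges across layers. This is a minimal sketch using TransformerLens rather than Ecco's own API; the model name and prompt are placeholders.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model
prompt = "The Eiffel Tower is in the city of"
tokens = model.to_tokens(prompt)

with torch.no_grad():
    logits, cache = model.run_with_cache(tokens)

# Decode the residual stream after each layer through the final LayerNorm
# and the unembedding to get a per-layer guess at the next token.
for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][0, -1]  # last position's residual stream
    layer_logits = model.ln_final(resid) @ model.W_U
    top_token = layer_logits.argmax().item()
    print(f"layer {layer}: {model.to_single_str_token(top_token)!r}")
```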
Activation patching (ROME); see the sketch after the notebook examples below
Notebook examples:
https://colab.research.google.com/drive/1uFui2i40eU0G9kvbCNTFMgXFHSB7lL9i
https://colab.research.google.com/github/UFO-101/an-neuron/blob/main/an_neuron_investigation.ipynb
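A minimal layer-patching sketch in the spirit of these notebooks, assuming TransformerLens; the prompts and the " Paris" answer token are illustrative placeholders. It restores the clean run's residual stream (at the final position) into a corrupted run, one layer at a time, and checks how much of the clean answer's logit comes back.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
clean = model.to_tokens("The Eiffel Tower is in")
corrupt = model.to_tokens("The Statue of Liberty is in")
paris = model.to_single_token(" Paris")

_, clean_cache = model.run_with_cache(clean)

def patch_last_pos(resid, hook):
    # Overwrite the corrupted run's residual stream at the final position
    # with the clean run's; real experiments usually sweep positions too.
    resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
    return resid

for layer in range(model.cfg.n_layers):
    patched = model.run_with_hooks(
        corrupt,
        fwd_hooks=[(f"blocks.{layer}.hook_resid_post", patch_last_pos)],
    )
    print(f"layer {layer}: logit(' Paris') = {patched[0, -1, paris]:.2f}")
```

Layers where patching sharply raises the clean answer's logit are the influential ones.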
Step 3: Locate the influential neuron
Activation patching for individual neurons, as sketched below
Use Neuroscope to see what the candidate neurons activate on: https://neuroscope.io/
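Once a layer stands out, the same trick narrows down to single neurons. A minimal zero-ablation sketch, assuming TransformerLens; the layer/neuron indices and prompt are hypothetical placeholders.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in")
paris = model.to_single_token(" Paris")
LAYER, NEURON = 8, 123  # hypothetical neuron picked from the layer sweep

def ablate_neuron(mlp_post, hook):
    # Zero out one neuron's post-activation at every position; patching in
    # a clean run's value instead works the same way.
    mlp_post[:, :, NEURON] = 0.0
    return mlp_post

baseline = model(tokens)[0, -1]
ablated = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{LAYER}.mlp.hook_post", ablate_neuron)],
)[0, -1]
print(f"logit(' Paris'): {baseline[paris]:.2f} -> {ablated[paris]:.2f}")
```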
Step 4: Visualize the neuron activation
Interactive Neuroscope
https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/Interactive_Neuroscope.ipynb#scrollTo=Aa74dGVpF8lD
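Under the hood, the notebook's heatmap is just the chosen neuron's activation on each token. A minimal sketch of pulling those numbers out, with hypothetical indices and text:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER, NEURON = 8, 123  # hypothetical neuron from step 3
text = "An apple fell from the tree."

_, cache = model.run_with_cache(model.to_tokens(text))
acts = cache[f"blocks.{LAYER}.mlp.hook_post"][0, :, NEURON]

# Print one line per token; the notebook renders this as an HTML heatmap.
for tok, act in zip(model.to_str_tokens(text), acts):
    print(f"{tok!r:>12}  {act.item():+.3f}")
```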
References:
We Found An Neuron in GPT-2
Interfaces for Explaining Transformer Language Models
200 COP in MI: Studying Learned Features in Language Models