I was trying to reconstruct the neuronal locations, but ended up analysing activations instead. My current stance is that the whole automated interpretability code/folder OpenAI shared seems unusable if we can’t reverse engineer what they claim are the neuronal locations. I mean, the models are open-sourced, so we could technically reverse engineer it if they just provided the code.