MiguelDev comments on Robustness of Model-Graded Evaluations and Automated Interpretability

MiguelDev 16 Jul 2023 2:41 UTC
1 point
I was trying to reconstruct the neuronal locations but ended up analysing activations instead. My current stance is that the whole automated interpretability code/folder OpenAI shared seems unusable if we can’t reverse engineer what they claim as neuronal locations. I mean, the models are open-sourced, so we could technically reverse engineer it if they just provided the code.