Their approach would be a lot more transparent if they actually provided a demo where you forward data through a model of your choice and measure the activations. Instead, they only have this activation dump on Azure. On top of that, they have 6 open issues on their repo dating back to May, and the notebook demos don’t work at all.
https://github.com/openai/automated-interpretability/issues/8
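For what it’s worth, here is a minimal sketch of the kind of demo I mean, assuming a HuggingFace GPT-2 model rather than anything from their repo; the hook placement and the manual GELU are my own choices, not OpenAI’s code:

```python
# Sketch: forward text through a model of your choice and record per-neuron
# MLP activations with forward hooks. Assumes GPT-2 via HuggingFace
# transformers; this is NOT OpenAI's automated-interpretability code.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, GPT2Model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

activations = {}  # layer index -> tensor of shape (batch, seq_len, intermediate_size)

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # c_fc output is pre-activation; apply GPT-2's tanh-approximate GELU
        # to get one post-activation value per "neuron" per token.
        activations[layer_idx] = F.gelu(output, approximate="tanh").detach()
    return hook

for i, block in enumerate(model.h):
    block.mlp.c_fc.register_forward_hook(make_hook(i))

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

print({layer: acts.shape for layer, acts in activations.items()})
```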
I was trying to reconstruct the neuronal locations but ended up analysing activations instead. My current stance is that the whole automated-interpretability code/folder OpenAI shared seems unusable if we can’t reverse engineer what they claim as neuronal locations. The models are open-sourced, so we could technically reverse engineer it if they just provided the code.
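Continuing the sketch above, and with the caveat that this is my guess at what a “neuronal location” means, i.e. a (layer, neuron index) coordinate into the MLP activations, reading off one neuron’s per-token activations is then just indexing:

```python
# Assumes the `activations`, `tokenizer`, and `inputs` from the sketch above.
# A "neuronal location" is taken here to be a (layer, neuron_index) pair into
# the MLP activation tensors -- my assumption, not OpenAI's documented format.
def neuron_activations(activations, layer, neuron_index):
    # One activation value per token for the chosen neuron (batch item 0).
    return activations[layer][0, :, neuron_index]

layer, neuron_index = 5, 131  # hypothetical location, purely for illustration
values = neuron_activations(activations, layer, neuron_index)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, value in zip(tokens, values.tolist()):
    print(f"{token!r}: {value:.3f}")
```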