MiguelDev comments on Robustness of Model-Graded Evaluations and Automated Interpretability

MiguelDev 15 Jul 2023 23:29 UTC
2 points
0
Explainer Model

I was trying to work out from OpenAI’s code how they derived the neurons used in the tools, but it kept pointing me to an Azure server that holds the information. I wish they had also shared how they were able to extract the neurons.
- Simon Lermen 16 Jul 2023 2:37 UTC
  2 points
  0
  Parent
  Their approach would be a lot more transparent if they actually had a demo where you forward data and measure the activations for a model of your choice. Instead, they only have this activation dump on Azure. Also, their repo has had 6 open issues since May, and the notebook demos don’t work at all.
  https://github.com/openai/automated-interpretability/issues/8
  - MiguelDev 16 Jul 2023 2:41 UTC
    1 point
    0
    Parent
    I was trying to reconstruct the neuronal locations and ended up analysing activations instead. My current stance is that the whole automated interpretability code/folder OpenAI shared seems unusable if we can’t reverse engineer what they claim are the neuronal locations. I mean, the models are open-sourced, so we could technically reverse engineer it if they just provided the code.
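
For readers unfamiliar with the kind of demo discussed in this thread (forward data through a model and record activations per neuron location), here is a minimal sketch. It is a toy, pure-Python MLP with random weights, not OpenAI's pipeline and not how their Azure activation dump was produced. Every name here (`make_mlp`, `forward_with_activations`, `top_activating_inputs`) is hypothetical, and addressing a neuron as a `(layer_index, neuron_index)` pair is an assumption made for illustration.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Build a toy MLP as a list of (weights, biases) per layer (random weights)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward_with_activations(layers, x):
    """Run x through the network and record every neuron's activation.

    Returns (output, record), where record maps the assumed location
    (layer_index, neuron_index) to the post-tanh activation for this input.
    """
    record = {}
    h = x
    for layer_idx, (weights, biases) in enumerate(layers):
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        h = [math.tanh(p) for p in pre]
        for neuron_idx, a in enumerate(h):
            record[(layer_idx, neuron_idx)] = a
    return h, record

def top_activating_inputs(layers, inputs, layer_idx, neuron_idx, k=2):
    """Rank inputs by how strongly they activate one (layer, neuron) location."""
    scored = []
    for x in inputs:
        _, record = forward_with_activations(layers, x)
        scored.append((record[(layer_idx, neuron_idx)], x))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy usage: 3 inputs -> 4 hidden neurons -> 2 outputs.
mlp = make_mlp([3, 4, 2])
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
output, record = forward_with_activations(mlp, inputs[0])
top = top_activating_inputs(mlp, inputs, layer_idx=0, neuron_idx=1)
```

With a real open-source model, the same idea would normally be done with the framework's own mechanism for capturing intermediate outputs (for example, forward hooks in PyTorch) rather than a hand-written forward pass.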