Logan Zoellner comments on How useful is mechanistic interpretability?

Logan Zoellner 4 Dec 2023 14:29 UTC
3 points
1
“We reverse engineered every single circuit and can predict exactly what the model will do using our hand-crafted code” seems like it’s setting the bar way too high for MI.
Instead, aim for stuff like the AI Lie Detector, which both 1) works and 2) is obviously useful.
To mount a side-channel attack on a system, you don’t have to explain every detail of the system (or even know any physics). You just have to find some observable in the system that is correlated with the thing you care about.
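To make the side-channel point concrete, here is a toy sketch, not a real attack and not the lie detector method. Everything in it is made up for illustration: `observe_system` is a hypothetical black box whose "timing" happens to run longer when a hidden bit is 1, and the noise level and sample sizes are arbitrary. The attacker never models the internals; they fit one threshold on a correlated signal.

```python
import random
import statistics


def observe_system(secret_bit, rng):
    """Hypothetical black box: returns a noisy 'timing' that leaks the hidden bit.

    The 0.5 offset and 0.1 noise are illustrative assumptions, not measurements.
    """
    base = 1.0 + 0.5 * secret_bit
    return base + rng.gauss(0.0, 0.1)


def fit_threshold(samples):
    """Fit a decision threshold: the midpoint between the two class means.

    `samples` is a list of (timing, bit) pairs with both bit values present.
    """
    mean0 = statistics.mean(t for t, b in samples if b == 0)
    mean1 = statistics.mean(t for t, b in samples if b == 1)
    return (mean0 + mean1) / 2.0


def predict(timing, threshold):
    """Guess the hidden bit from the timing alone (assumes bit 1 is slower)."""
    return 1 if timing > threshold else 0


def run_demo(seed=0, n_train=200, n_test=200):
    """Calibrate on labeled observations, then score on fresh ones."""
    rng = random.Random(seed)

    # Calibration: alternate bits so both classes are represented.
    train = []
    for i in range(n_train):
        bit = i % 2
        train.append((observe_system(bit, rng), bit))
    threshold = fit_threshold(train)

    # Evaluation: random hidden bits, predicted from timing only.
    correct = 0
    for _ in range(n_test):
        bit = rng.randint(0, 1)
        if predict(observe_system(bit, rng), threshold) == bit:
            correct += 1
    return threshold, correct / n_test


if __name__ == "__main__":
    threshold, accuracy = run_demo()
    print(f"threshold={threshold:.3f} accuracy={accuracy:.3f}")
```

Nothing in `fit_threshold` or `predict` knows why the timing differs, only that it does. That is the sense in which a correlate can be useful without a full mechanistic explanation.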
- ryan_greenblatt 4 Dec 2023 15:46 UTC
  6 points
  0
  Notably, the AI lie detector work isn’t mech interp under the definition I provided.