“We reverse engineered every single circuit and can predict exactly what the model will do using our hand-crafted code” seems like it’s setting the bar way too high for MI.
Instead, aim for stuff like the AI Lie Detector, which both 1) works and 2) is obviously useful.
To do a Side-Channel Attack on a system, you don’t have to explain every detail of the system (or even know physics at all). You just have to find something in the system that is correlated with the thing you care about.
“We reverse engineered every single circuit and can predict exactly what the model will do using our hand-crafted code” seems like it’s setting the bar way too high for MI.
Instead, aim for stuff like the AI Lie Detector, which both 1) works and 2) is obviously useful.
To do a Side-Channel Attack on a system, you don’t have to explain every detail of the system (or even know physics at all). You just have to find something in the system that is correlated with the thing you care about.
Notably, the ai lie detector work isn’t mech interp under the definition I provided.