Context: I have been working in IT Security with reverse engineering as my main interest for nearly 7 years, been interested in it for ~8.5 years, and have been working on automated reverse engineering at a research institute for 3 years. I have been vaguely interested in mechanistic interpretability for the last months and am strongly considering switching to it as a field of research, depending on how much of my RE skillset/experience transfers.
I think one topic that the article/viewpoint/Itay Yona are missing is how much of modern RE is only feasible because we have a few decades worth of tooling taking care of the aspects that are already possible to automate. The main industry tools IDA/Ghidra/Binary Ninja all have a roughly similar processing pipeline that consists of various steps. This is typically just called “auto analysis”, and this can easily take a few hours for larger binaries, sometimes even more. I think the lessons and approaches from automated RE are much more important than the ones from manual RE, but to some extent this is personal bias, because I like automated RE a lot. But in any case manual, RE would not be feasible without some underlying automated RE that takes care of as much work as possible, to allow the human to focus on what is relevant, in a form that is as close to their level of abstraction as possible, while also being transparent, i.e. make it easily possible to switch to the underling levels of abstraction again, or even show them side by side.
Much of the work is generating various graph representations of the program, e.g. the Control Flow Graph, Data Flow Graphs, analysis based on Abstract Interpretation over some domain (e.g. types) and generating Abstract Syntax Trees that are more similar to C code in terms of the level of abstraction over the actual assembly, which makes it far easier for a human to quickly understand them. Those are then as the basis for further analysis to recover more high level constructs, and to aggregate and propagate information from various parts of the program.
I have the impression that some intermediate step/layer in this process might be a similar level of abstraction to “NNs as computational graphs”, and that the steps that follow in (automated RE) might be fairly applicable to mechanistic interpretability, while the steps before this layer are far less directly useful for NNs.
Those next steps are then roughly to find patterns inside this graph that can be replaced with larger conceptual nodes, e.g. a complicated looking piece of instructions might simply be in-lined and highly optimized strcpy
, i.e. copies a string from one memory location to another. The nice thing about software RE is that compilers will typically not inline functions to save space, but a NN will most likely not give us that luxury. “Induction Heads” might be the first step in exactly that direction, but I have to admit that I haven’t had the time/energy to really look into this, so I can’t confidently say this, and I might just be interpreting them in the mental framework I am already used to.
Overall, focusing on tools and automated RE is already a niche inside the somewhat niche field of RE, but I think it’s crucial for mechanistic interpretability because it has to be scalable. A human looking small toy models is only interesting because we hope that larger models exhibit the same patterns/structure, that we can then automatically recover, to find higher level patterns that only emerge in a larger model, which can only be spotted as a pattern in the intermediate abstraction.
There are many different graph representations of code, some of them are crucial for automated decompiling, others probably aren’t. So I’m not sure which one you are referring to here. And in the end, the result of the decompilation process is a graph (either a tree like the AST of the C-Code, but I admit that it is kinda nitpicky to call that a “Graph representation”[0]), or more of a true graph like Binary Ninjas High Level Intermediate Language (if displayed as a CFG with blocks consisting of High Level Intermediate Language instructions)
I meant this is the sense of “modern RE is implicitly built on a few decades of research and engineering”. Of course you could always use some kinds of graphs, but having good tools to do this for you massively accelerates the RE process. There is a reason IDA could just charge whatever they wanted before Ghidra and Binary Ninja came out: A good auto analysis and decompiler is a massive benefit that allows someone without much experience to do basic RE, and for an expert it still saves them a lot of time, because they can focus on higher level abstractions.
[0] Though you really start appreciating the difference in how those trees are internally represented, even if they look the same as Pseudo Code, when you want to work with them. E.g. the IDA C AST is a true AST that is useful for further transformations and analysis while the Ghidra Syntax Tree isn’t even an Abstract Syntax Tree because it still contains information like white space tokens. So you would have to reparse this to a C AST before you can do anything useful with it, and this is a whole other rant arising from a side project of mine...