Thanks, that’s a good insight. The graph representation of code is very different than automated decompiling like hex-rays in my opinion. I agree that graph representation is probably the most critical step towards a more high-level analysis and understanding. I am not sure why you claim it required decades of tools because since the dawn of computer-science turing-machines were described with graphs.
In any case this is an interesting point as it suggest we might want to focus on finding graph-like concepts which will be useful for describing the different states of a neural network computation, and later developing IDA-like tool :)
since we share similar backgrounds and aspiration feel free to reach out:
The graph representation of code is very different than automated decompiling like hex-rays in my opinion.
There are many different graph representations of code, some of them are crucial for automated decompiling, others probably aren’t. So I’m not sure which one you are referring to here. And in the end, the result of the decompilation process is a graph (either a tree like the AST of the C-Code, but I admit that it is kinda nitpicky to call that a “Graph representation”[0]), or more of a true graph like Binary Ninjas High Level Intermediate Language (if displayed as a CFG with blocks consisting of High Level Intermediate Language instructions)
I am not sure why you claim it required decades of tools
I meant this is the sense of “modern RE is implicitly built on a few decades of research and engineering”. Of course you could always use some kinds of graphs, but having good tools to do this for you massively accelerates the RE process. There is a reason IDA could just charge whatever they wanted before Ghidra and Binary Ninja came out: A good auto analysis and decompiler is a massive benefit that allows someone without much experience to do basic RE, and for an expert it still saves them a lot of time, because they can focus on higher level abstractions.
[0] Though you really start appreciating the difference in how those trees are internally represented, even if they look the same as Pseudo Code, when you want to work with them. E.g. the IDA C AST is a true AST that is useful for further transformations and analysis while the Ghidra Syntax Tree isn’t even an Abstract Syntax Tree because it still contains information like white space tokens. So you would have to reparse this to a C AST before you can do anything useful with it, and this is a whole other rant arising from a side project of mine...
Thanks, that’s a good insight. The graph representation of code is very different than automated decompiling like hex-rays in my opinion. I agree that graph representation is probably the most critical step towards a more high-level analysis and understanding. I am not sure why you claim it required decades of tools because since the dawn of computer-science turing-machines were described with graphs.
In any case this is an interesting point as it suggest we might want to focus on finding graph-like concepts which will be useful for describing the different states of a neural network computation, and later developing IDA-like tool :)
since we share similar backgrounds and aspiration feel free to reach out:
https://www.linkedin.com/in/itay-yona-b40a7756/
There are many different graph representations of code, some of them are crucial for automated decompiling, others probably aren’t. So I’m not sure which one you are referring to here. And in the end, the result of the decompilation process is a graph (either a tree like the AST of the C-Code, but I admit that it is kinda nitpicky to call that a “Graph representation”[0]), or more of a true graph like Binary Ninjas High Level Intermediate Language (if displayed as a CFG with blocks consisting of High Level Intermediate Language instructions)
I meant this is the sense of “modern RE is implicitly built on a few decades of research and engineering”. Of course you could always use some kinds of graphs, but having good tools to do this for you massively accelerates the RE process. There is a reason IDA could just charge whatever they wanted before Ghidra and Binary Ninja came out: A good auto analysis and decompiler is a massive benefit that allows someone without much experience to do basic RE, and for an expert it still saves them a lot of time, because they can focus on higher level abstractions.
[0] Though you really start appreciating the difference in how those trees are internally represented, even if they look the same as Pseudo Code, when you want to work with them. E.g. the IDA C AST is a true AST that is useful for further transformations and analysis while the Ghidra Syntax Tree isn’t even an Abstract Syntax Tree because it still contains information like white space tokens. So you would have to reparse this to a C AST before you can do anything useful with it, and this is a whole other rant arising from a side project of mine...