Thanks, that’s a good insight. In my opinion, the graph representation of code is very different from automated decompiling like Hex-Rays. I agree that graph representation is probably the most critical step toward more high-level analysis and understanding. I am not sure why you claim it required decades of tooling, since Turing machines have been described with graphs since the dawn of computer science.
In any case, this is an interesting point, as it suggests we might want to focus on finding graph-like concepts that will be useful for describing the different states of a neural network computation, and later on developing an IDA-like tool :)
Since we share similar backgrounds and aspirations, feel free to reach out:
The true rank is revealed because the output dimensionality is vocab_size, which is >> hidden_dim, so the logits can have rank at most hidden_dim. It is unclear how to get something equivalent to that from the cortex. It is possible to record multiple neurons (a population) and use dimensionality reduction (usually some sort of manifold learning) to estimate the true dimensionality of the population. This has been useful in some areas of the brain, such as the hippocampal formation.
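A toy sketch of both ideas, with entirely made-up numbers: a "population" whose activity is a linear readout of a low-dimensional latent state has rank bounded by the latent dimension (analogous to the logits having rank at most hidden_dim), and a PCA-style spectrum on the recorded activity recovers that dimensionality. The neuron counts, noise level, and threshold below are hypothetical choices, not anything from real recordings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 100 "neurons" whose activity is a linear
# readout of a 5-dimensional latent state, plus a little noise.
n_neurons, latent_dim, n_samples = 100, 5, 2000
latents = rng.standard_normal((n_samples, latent_dim))
readout = rng.standard_normal((latent_dim, n_neurons))
activity = latents @ readout + 0.01 * rng.standard_normal((n_samples, n_neurons))

# PCA via SVD of the mean-centered activity matrix: the population
# lives (almost) in a 5-d subspace, so ~5 singular values dominate.
centered = activity - activity.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
variance_fraction = singular_values**2 / (singular_values**2).sum()
estimated_dim = int((variance_fraction > 1e-3).sum())
print(estimated_dim)  # → 5
```

Real population recordings are far messier (nonlinear manifolds, trial-to-trial variability), which is why manifold-learning methods are typically used instead of plain PCA, but the rank intuition is the same.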