Current NN weight matrices are dense and continuous-valued. A significant part of the difficulty of interpretability is that they have all-to-all connections; it is difficult to verify that one activation does or does not affect another.
However, we can quantize the weights to 3 bits, and then we can probably melt the whole thing into pure combinational logic. While I am not entirely confident that this form is strictly better from an interpretability perspective, it is differently difficult.
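As a rough sketch of the kind of quantization I mean (the symmetric, uniformly spaced levels here are an illustrative assumption, not a specific scheme):

```python
import numpy as np

def quantize_3bit(w: np.ndarray) -> np.ndarray:
    """Snap continuous weights onto 8 levels (3 bits per weight).

    Any fixed, finite codebook would do for the purpose of melting the
    network into logic; uniform symmetric levels are just the simplest.
    """
    scale = float(np.max(np.abs(w))) or 1.0
    levels = np.linspace(-scale, scale, 8)          # 2**3 = 8 levels
    idx = np.abs(w[..., None] - levels).argmin(-1)  # nearest level per weight
    return levels[idx]
```

Once every weight is one of 8 fixed values, each multiply-accumulate becomes a fixed Boolean function of its input bits and can in principle be compiled down to gates.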
“Giant inscrutable matrices” are probably not the final form of current NNs; we can potentially turn them into a different and nicer form.
I’ve been working on pure combinational logic LLMs for the past few years, and have a (fairly small) byte-level pure combinational logic FSM RNN language model quantized to And-Inverter Graph (AIG) form. I’m currently building the tooling to simplify the logic DAG and analyze it.
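For concreteness, here is a minimal sketch of the AIG representation and the standard local simplifications such tooling applies (constant propagation, idempotence, x AND NOT x, structural hashing); the class names and API are purely illustrative, not my actual code:

```python
from dataclasses import dataclass

# In an And-Inverter Graph every gate is a two-input AND and negation
# lives on the edges, so the whole netlist is ANDs plus inversion bits.

@dataclass(frozen=True)
class Lit:
    node: int   # index into the node table; node 0 is constant FALSE
    neg: bool   # True if this edge is inverted

class AIG:
    def __init__(self):
        self.fanins: list[tuple[Lit, Lit] | None] = [None]  # node 0 = FALSE
        self.strash: dict[tuple, Lit] = {}  # structural hashing table

    def input(self) -> Lit:
        self.fanins.append(None)  # primary inputs have no fanin
        return Lit(len(self.fanins) - 1, False)

    def not_(self, a: Lit) -> Lit:
        return Lit(a.node, not a.neg)

    def and_(self, a: Lit, b: Lit) -> Lit:
        FALSE, TRUE = Lit(0, False), Lit(0, True)
        if a == FALSE or b == FALSE:
            return FALSE                    # x AND 0 = 0
        if a == TRUE:
            return b                        # 1 AND x = x
        if b == TRUE:
            return a
        if a == b:
            return a                        # x AND x = x
        if a.node == b.node:                # a != b here, so only neg differs
            return FALSE                    # x AND NOT x = 0
        key = tuple(sorted([(a.node, a.neg), (b.node, b.neg)]))
        if key in self.strash:              # reuse an identical existing gate
            return self.strash[key]
        self.fanins.append((a, b))
        lit = Lit(len(self.fanins) - 1, False)
        self.strash[key] = lit
        return lit

# Example: XOR(x, y) built purely from ANDs and inverters.
g = AIG()
x, y = g.input(), g.input()
xor = g.and_(g.not_(g.and_(x, y)), g.not_(g.and_(g.not_(x), g.not_(y))))
```

Simplifying the logic DAG then amounts to applying local rewrites like these (plus larger-window ones) until a fixed point.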
Are you, or others, interested in talking with me about it?