I think logic gate networks are not substantially more interpretable than neural networks, simply because of their size. Both are complex networks with millions of nodes. Interpretability approaches have to work at a higher level of abstraction in either case.
Regarding language models: the original paper presents a simple feedforward network. The follow-up paper, by mostly the same authors, came out a few months ago and extends DLGNs to convolutions, analogous to CNNs. This means they have not yet been extended to more complex architectures like transformers, so language models are not yet possible, even setting aside the training compute cost.
In the follow-up paper they also discuss various efficiency improvements, not directly related to convolutions, made since the original paper. These speed up training relative to the original implementation and enable much deeper networks (the original implementation was limited to around six layers). But they don't discuss how much slower training still is compared to conventional neural networks. The inference speed-up, though, is extreme: they report improvements of up to 160x on one benchmark and up to 1900x on another, over the previously fastest neural networks at equivalent accuracy. On another benchmark they report models 29x to 56x smaller (in terms of required logic gates) than the previously smallest models with similar accuracy. At that size the models could more realistically be implemented as an ASIC, which would probably yield another order of magnitude in inference speed.
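The reason inference gets so fast is the core DLGN trick: during training, each node is a differentiable mixture over relaxed Boolean operations; at inference, the mixture collapses to the single most-weighted gate, leaving a pure logic circuit. A minimal sketch of one such neuron (the function names and the reduced op set are my own; the papers use all 16 two-input Boolean ops):

```python
import numpy as np

def soft_ops(a, b):
    """Real-valued relaxations of a few two-input Boolean ops,
    valid for a, b in [0, 1] and exact on {0, 1} inputs."""
    return np.array([
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1.0 - a * b,        # NAND
    ])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_gate(a, b, logits):
    """Training-time forward pass: differentiable softmax-weighted
    mixture of the relaxed ops, so gradients can select a gate."""
    return softmax(logits) @ soft_ops(a, b)

def hard_gate(a, b, logits):
    """Inference-time forward pass: keep only the argmax op, so the
    neuron becomes a single fixed logic gate."""
    return soft_ops(a, b)[np.argmax(logits)]

# Hypothetical logits as they might look after training (favoring OR):
logits = np.array([0.1, 2.0, -0.5, 0.3])
y_soft = soft_gate(0.9, 0.2, logits)   # smooth value in [0, 1]
y_hard = hard_gate(1.0, 0.0, logits)   # OR(1, 0) = 1.0
```

Since `hard_gate` needs no multiplications or exponentials, a whole trained network reduces to a netlist of two-input gates, which is exactly what maps cheaply onto an FPGA or ASIC.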
But again, they don't really discuss how much slower these networks are to train than conventional neural networks, which is likely crucial for whether they will be employed in future frontier LLMs, assuming DLGNs are extended to transformers. So far, frontier AI seems to be much more limited by training compute than by inference compute.