I strongly agree that interpretability work should make more effort to test and report on alternative architectures. There are a number of close competitors to GPT-2-style transformers.
Even just trying it and reporting that your technique doesn't work as well on an alternative architecture would be valuable info for safety.
If you don't know what I mean about alternative architectures, try googling some of these terms:
state-space models (Mamba, RWKV, Aaren), NLP diffusion models (e.g. https://arxiv.org/html/2408.04220v1), recursive looping models, reservoir computing models, next-generation reservoir computing models, spiking neural nets, Kolmogorov-Arnold networks, FunSearch
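For readers who haven't seen these before: the common thread in the state-space family (Mamba, RWKV) is replacing attention with a linear recurrence over a fixed-size hidden state, which is exactly why attention-head-centric interpretability tools may not transfer. Here's a minimal toy sketch of that recurrence (my own illustration, not code from any of these papers; the function name and shapes are made up for the example):

```python
import numpy as np

# Core SSM idea: per-token linear state update instead of attention
# over all previous tokens.
#   h_t = A * h_{t-1} + B @ x_t   (state update)
#   y_t = C @ h_t                 (readout)

def ssm_scan(x, A, B, C):
    """Run a diagonal linear state-space model over a sequence.

    x: (seq_len, d_in) input sequence
    A: (d_state,) diagonal transition coefficients (|A| < 1 for stability)
    B: (d_state, d_in) input projection
    C: (d_out, d_state) output projection
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B @ x_t   # O(1) work per token, no attention matrix
        ys.append(C @ h)      # output read from the compressed state
    return np.stack(ys)

# Example: 10-token sequence, 4-dim inputs, 8-dim state
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
A = rng.uniform(0.5, 0.99, size=8)   # stable decay per state channel
B = rng.normal(size=(8, 4)) * 0.1
C = rng.normal(size=(4, 8)) * 0.1
print(ssm_scan(x, A, B, C).shape)    # (10, 4)
```

Real Mamba-style models make A, B, C input-dependent and run the scan in parallel, but even this toy version makes the point: there are no attention heads or QK circuits to probe, so any technique built around those needs to be re-validated here.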