It seems like retrieval-based transformers like RETRO is “obviously” the way to go—(1) there’s just no need to store all the factual information as fixed weights, (2) and it uses much less parameter/memory. Maybe mechanistic interpretability should start paying more attention to these type of architectures, especially since they’re probably going to be a more relevant form of architecture.
They might also be easier to interpret thanks to specialization!
It seems like retrieval-based transformers like RETRO is “obviously” the way to go—(1) there’s just no need to store all the factual information as fixed weights, (2) and it uses much less parameter/memory. Maybe mechanistic interpretability should start paying more attention to these type of architectures, especially since they’re probably going to be a more relevant form of architecture.
They might also be easier to interpret thanks to specialization!