Dalcy comments on Dalcy’s Shortform

Dalcy 1 Jan 2023 4:57 UTC
1 point
0
It seems like retrieval-based transformers like RETRO is “obviously” the way to go—(1) there’s just no need to store all the factual information as fixed weights, (2) and it uses much less parameter/memory. Maybe mechanistic interpretability should start paying more attention to these type of architectures, especially since they’re probably going to be a more relevant form of architecture.
They might also be easier to interpret thanks to specialization!