Nathan Helm-Burger comments on LLM Modularity: The Separability of Capabilities in Large Language Models

Nathan Helm-Burger 11 Apr 2023 21:15 UTC
2 points
This is great. My hunch is that modularity could be greatly improved with little loss of capabilities if we trained with a loss function that weakly prioritized modularity of skills.
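As a rough illustration of what such a loss might look like (a sketch under stated assumptions, not something I have tested): add a small penalty to the task loss that grows when different skills rely on the same units. The names `overlap_penalty`, `total_loss`, `importance`, and `lam` are all hypothetical, and the per-skill importance scores are assumed to be given; in real training they would have to be computed differentiably from the model.

```python
import math


def overlap_penalty(importance):
    """Mean pairwise cosine similarity between per-skill importance vectors.

    importance[s][u] is a hypothetical non-negative score for how much
    unit u matters to skill s. Returns 0.0 when skills use disjoint units
    and approaches 1.0 when they rely on the same units.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0

    n = len(importance)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cosine(importance[i], importance[j]) for i, j in pairs) / len(pairs)


def total_loss(task_loss, importance, lam=0.01):
    # A small lam (assumed value) keeps modularity a weak preference,
    # so the task loss still dominates the objective.
    return task_loss + lam * overlap_penalty(importance)
```

With a small `lam`, the penalty only nudges skills toward separate units rather than forcing them apart, which is the "weakly prioritized" part of the idea.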
I tried some experiments on this idea of separability of skills in transformers last year but didn’t get very far, partly because I was less thorough than you and partly because I was using smaller models and trying to separate more entangled skills (toxic internet comments vs. Wikipedia entries).