Fascinating. I would love to see follow-up work on whether it harms generalisation, because if we were able to train more interpretable models without damaging generalisation, that would be amazing.
I’d love to see other research along these lines. For example: what if we could use interpretability to figure out what a circuit does, replace it with something more principled/transparent, then train for a bit longer with the new circuit in place?
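Rough sketch of what I mean (toy PyTorch, everything here is made up for illustration — the "circuit" is just a linear layer and the transparent replacement is a hard-coded previous-token copy):

```python
import torch
import torch.nn as nn

class TransparentHead(nn.Module):
    """Hand-written stand-in for a circuit we think we understand,
    e.g. a head that just copies the previous token's residual stream."""
    def forward(self, x):
        # Shift the sequence right by one position ("previous-token" behaviour).
        return torch.roll(x, shifts=1, dims=1)

class ToyModel(nn.Module):
    def __init__(self, d_model=32):
        super().__init__()
        self.head = nn.Linear(d_model, d_model)  # the learned circuit we'll swap out
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, x):
        return self.mlp(x + self.head(x))

model = ToyModel()

# Step 1: replace the learned circuit with the transparent version.
model.head = TransparentHead()

# Step 2: train a bit longer so the rest of the network adapts around it.
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
x = torch.randn(8, 16, 32)       # (batch, seq, d_model) dummy data
target = torch.randn(8, 16, 32)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    opt.step()
```

The interesting question is whether the surrounding weights recover the lost performance, and whether the transparent circuit stays faithful to what the original one was doing.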