This is great! Maybe you’d get better results if you “distill” GPT2-LN into GPT2-noLN by fine-tuning it to match GPT2-LN’s full next-token probability distribution on OpenWebText.
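For concreteness, a minimal sketch of the distillation loss this would use: a KL divergence from the teacher’s (GPT2-LN’s) full next-token distribution to the student’s (GPT2-noLN’s), rather than cross-entropy on the one-hot next token. The function name and the `temperature` parameter here are just illustrative, not from the original post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence from the teacher's full next-token distribution
    to the student's, averaged with 'batchmean' reduction.

    Logits have shape (batch, seq_len, vocab). `temperature` is a
    standard distillation knob (assumption, not from the original post).
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t
```

In the suggested setup, one would run frozen GPT2-LN as the teacher and GPT2-noLN as the student over OpenWebText batches and minimize this loss, so the student matches the whole distribution instead of only the sampled tokens.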