Daniel Tan comments on You can remove GPT2’s LayerNorm by fine-tuning for an hour