Interesting stuff! I’m very curious as to whether removing layer norm damages the model in some measurable way.
One thing that comes to mind: previous work finds that the final LN mediates ‘confidence’ through ‘entropy neurons’. If you’ve trained for long enough, I’d expect none of those neurons to be present anymore, which raises the question of whether the model still exhibits this kind of self-confidence regulation.
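For anyone wanting to check this empirically, here’s a minimal sketch of the usual entropy-neuron diagnostic: measure how much of each final-MLP neuron’s output weight lies in the low-singular-value (“effective null”) directions of the unembedding, since entropy neurons act mainly through the final LN’s normalization rather than through any token direction. The tensor names, shapes, and the cutoff `k` below are placeholder assumptions, not any particular library’s API:

```python
import torch

# Placeholder shapes for a GPT-2-small-style model (assumption, not a real
# checkpoint): W_out is the final MLP's output projection, W_U the unembedding.
d_mlp, d_model, d_vocab = 3072, 768, 50257
W_out = torch.randn(d_mlp, d_model)   # stand-in for real trained weights
W_U = torch.randn(d_model, d_vocab)   # stand-in for real trained weights

# SVD of the unembedding; singular values come back sorted descending, so the
# last k left-singular vectors span the "effective null space" of W_U.
U, S, _ = torch.linalg.svd(W_U, full_matrices=False)
k = 30  # arbitrary cutoff for illustration
null_basis = U[:, -k:]  # (d_model, k)

# Fraction of each neuron's (unit-normalized) output direction that falls in
# the null space; entropy-neuron candidates score high here, meaning their
# logit effect is routed through the residual-stream norm via the final LN.
w = W_out / W_out.norm(dim=1, keepdim=True)
null_frac = (w @ null_basis).norm(dim=1) ** 2

candidates = torch.topk(null_frac, 10)
print(candidates.indices, candidates.values)
```

Running this on the LN-free model’s final checkpoint versus the original would show whether those high-null-fraction neurons actually disappear after fine-tuning.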