
cmathw

Karma: 76

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

8 Apr 2024 11:14 UTC
37 points
4 comments · 15 min read · LW link

Polysemantic Attention Head in a 4-Layer Transformer

9 Nov 2023 16:16 UTC
51 points
0 comments · 6 min read · LW link