Just to check I understand this correctly: from what I can gather, this shows that LayerNorm is monosemantic if your residual stream activation is just that direction. It doesn't show that it is monosemantic for the purposes of vector addition, where we want to stack multiple monosemantic directions at once. That is, if you want to represent other dimensions as well, these might push the LayerNormed vector into a different spline region. Am I correct here?
That said, maybe we can model the other dimensions as random jostling in such a way that it all cancels out if a lot of dimensions are activated?
Yeah I think we have the same understanding here (in hindsight I should have made this more explicit in the post / title).
I would be excited to see someone empirically try to answer the question you mention at the end. In particular, given some direction u and a LayerNormed vector v, one could try to quantify how smoothly rotating from v towards u changes the output of the MLP layer. This seems like a good test of whether the Polytope Lens is helpful / necessary for understanding the MLPs of Transformers: smooth changes would correspond to your 'random jostling cancels out' picture, i.e. to not needing to worry about Polytope Lens style issues.
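Concretely, I'm imagining something like the following (a rough sketch, assuming PyTorch; `ln`, `mlp`, `v`, `u`, and the step count are placeholders of my own, not anything from the post, and u is assumed not to be parallel to v):

```python
import torch

def mlp_output_along_rotation(ln, mlp, v, u, n_steps=100):
    """Rotate from v towards u in their shared plane (keeping the norm of v)
    and record how much the MLP output changes between consecutive steps,
    as a rough smoothness measure."""
    v_hat = v / v.norm()
    # Component of u orthogonal to v, so (v_hat, w_hat) spans the rotation plane.
    w = u - (u @ v_hat) * v_hat
    w_hat = w / w.norm()
    total_angle = torch.arccos(torch.clamp((u / u.norm()) @ v_hat, -1.0, 1.0))

    outputs = []
    for t in torch.linspace(0, 1, n_steps):
        theta = t * total_angle
        x = v.norm() * (torch.cos(theta) * v_hat + torch.sin(theta) * w_hat)
        outputs.append(mlp(ln(x)))
    outputs = torch.stack(outputs)
    # Norm of the change in MLP output per rotation step.
    return (outputs[1:] - outputs[:-1]).norm(dim=-1)
```

Large spikes in the returned per-step differences would suggest crossing polytope boundaries, while a flat profile would match the 'random jostling cancels out' picture.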
Also: it seems like there would be an easier way to get the observation that this post makes, i.e. directly showing that kV and V get mapped to the same point by LayerNorm (excluding the epsilon).
Don't get me wrong, the circle is cool, but it seems like a bit of a detour.
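For what it's worth, a minimal numerical check of the kV vs V claim (a sketch assuming PyTorch; the dimension, the eps=0 LayerNorm, and the values of k are arbitrary choices of mine):

```python
import torch

d_model = 768
ln = torch.nn.LayerNorm(d_model, eps=0.0)  # drop epsilon so the map is exactly scale-invariant
V = torch.randn(d_model)

for k in [0.5, 2.0, 100.0]:
    # LayerNorm centres and then divides by the standard deviation, so any
    # positive rescaling of V is undone and kV lands on the same point as V.
    assert torch.allclose(ln(k * V), ln(V), atol=1e-4)
```

(Note this only holds for positive k; a negative k flips the normalized vector.)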