Just to check I understand this correctly: from what I can gather, this shows that LayerNorm is monosemantic if your residual stream activation is just that direction. It doesn't show that it is monosemantic for the purposes of vector addition, where we want to stack multiple monosemantic directions at once. That is, if you want to represent other dimensions as well, these might push the LayerNormed vector into a different spline region. Am I correct here?
That said, maybe we can model the other dimensions as random jostling in such a way that it all cancels out if a lot of dimensions are activated?
Yeah I think we have the same understanding here (in hindsight I should have made this more explicit in the post / title).
I would be excited to see someone empirically try to answer the question you mention at the end. In particular, given some direction u and a LayerNormed vector v, one could try to quantify how smoothly rotating from v towards u changes the output of the MLP layer. This seems like a good test of whether the Polytope Lens is helpful / necessary for understanding the MLPs of Transformers: smooth changes would correspond to your 'random jostling cancels out' picture, i.e. to not needing to worry about Polytope Lens style issues.
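Concretely, I'm imagining something like the following (a rough sketch, assuming PyTorch; `ln`, `mlp`, `v`, `u`, and the step count are placeholders of my own, not anything from the post, and u is assumed not to be parallel to v):

```python
import torch

def mlp_output_along_rotation(ln, mlp, v, u, n_steps=100):
    """Rotate from v towards u in their shared plane (keeping the norm of v)
    and record how much the MLP output changes between consecutive steps,
    as a rough smoothness measure."""
    v_hat = v / v.norm()
    # Component of u orthogonal to v, so (v_hat, w_hat) spans the rotation plane.
    w = u - (u @ v_hat) * v_hat
    w_hat = w / w.norm()
    total_angle = torch.arccos(torch.clamp((u / u.norm()) @ v_hat, -1.0, 1.0))

    outputs = []
    for t in torch.linspace(0, 1, n_steps):
        theta = t * total_angle
        x = v.norm() * (torch.cos(theta) * v_hat + torch.sin(theta) * w_hat)
        outputs.append(mlp(ln(x)))
    outputs = torch.stack(outputs)
    # Norm of the change in MLP output per rotation step.
    return (outputs[1:] - outputs[:-1]).norm(dim=-1)
```

Large spikes in the returned per-step differences would suggest crossing polytope boundaries, while a flat profile would match the 'random jostling cancels out' picture.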
Also: it seems like there would be an easier way to get the observation that this post makes, i.e. directly showing that kV and V get mapped to the same point by LayerNorm (excluding the epsilon).
Don't get me wrong, the circle is cool, but it seems like a bit of a detour.
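For what it's worth, a minimal numerical check of the kV vs V claim (a sketch assuming PyTorch; the dimension, the eps=0 LayerNorm, and the values of k are arbitrary choices of mine):

```python
import torch

d_model = 768
ln = torch.nn.LayerNorm(d_model, eps=0.0)  # drop epsilon so the map is exactly scale-invariant
V = torch.randn(d_model)

for k in [0.5, 2.0, 100.0]:
    # LayerNorm centres and then divides by the standard deviation, so any
    # positive rescaling of V is undone and kV lands on the same point as V.
    assert torch.allclose(ln(k * V), ln(V), atol=1e-4)
```

(Note this only holds for positive k; a negative k flips the normalized vector.)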