Also: it seems like there would be an easier way to get the observation this post makes, i.e., directly showing that kV and V get mapped to the same point by layer norm (excluding the epsilon).
Don’t get me wrong, the circle is cool, but it seems like a bit of a detour.
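
To make the direct route concrete, here is a minimal sketch of the check I have in mind. It assumes layer norm without the learned gain and bias, a positive scalar k, and epsilon set to 0. The function `layer_norm` and the example values are mine for illustration, not from the post. The idea is that scaling V by k scales both the centered vector and its standard deviation by k, so the factor cancels.

```python
import math

def layer_norm(v, eps=0.0):
    """Plain layer norm (no learned gain/bias): center, then divide by std."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    var = sum(c * c for c in centered) / len(v)
    return [c / math.sqrt(var + eps) for c in centered]

# Illustrative values (assumptions, not taken from the post).
V = [1.0, -2.0, 0.5, 3.0]
k = 7.0  # assumed positive; a negative k would flip the sign of the output
kV = [k * x for x in V]

out_v = layer_norm(V)
out_kv = layer_norm(kV)

# With eps = 0 the two outputs agree up to floating-point error.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(out_v, out_kv))
print(same)
```

With a nonzero epsilon the cancellation is only approximate, which I take to be why the epsilon has to be excluded.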