Thanks for the great writeup.
One point of confusion for me about the fundamentals, what does it mean for ‘GPT-J token embeddings’ to ‘inhabit a zone’ ?
e.g. Is it a discrete point?, a probability cloud?, ‘smeared’ somehow across multiple dimensions?, etc...
The intended meaning was that the set of points in embedding space corresponding to the 50257 tokens are contained in a particular volume of space (the intersection of two hyperspherical shells).
Thanks for the great writeup.
One point of confusion for me about the fundamentals, what does it mean for ‘GPT-J token embeddings’ to ‘inhabit a zone’ ?
e.g. Is it a discrete point?, a probability cloud?, ‘smeared’ somehow across multiple dimensions?, etc...
The intended meaning was that the set of points in embedding space corresponding to the 50257 tokens are contained in a particular volume of space (the intersection of two hyperspherical shells).