This mockup conveys actively incorrect spatial intuitions. The observed radii are exactly what you’d expect if there’s only a single Gaussian.
Let’s say we look at a 1000-dimensional Gaussian centered on some point away from the origin:
We get what appears to be a shell at radius 100.
Then, we plot the distribution of distances to the centroid:
Suddenly it looks like a much smaller shell! But really there is only one Gaussian (high-dimensional Gaussians have their mass concentrated almost entirely in a thin shell), centered on some point away from the origin. There is no weird torus.
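For anyone who wants to reproduce the two histograms, here is a minimal sketch. The specific numbers are my own assumptions rather than the original setup: a unit-variance Gaussian in 1000 dimensions whose mean has every coordinate equal to 3, chosen so that the expected squared norm is 9000 + 1000 = 10000 and the norms land near 100. The distances to the centroid should then land near sqrt(1000), about 31.6.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 2000  # dimension and sample count (both assumed for illustration)

# Assumed setup: isotropic unit-variance Gaussian with mean (3, 3, ..., 3),
# so ||mu||^2 = 9 * 1000 = 9000.
mu = np.full(d, 3.0)
x = mu + rng.standard_normal((n, d))

# Distance of every sample from the origin.
norms = np.linalg.norm(x, axis=1)

# Distance of every sample from the empirical centroid.
centroid = x.mean(axis=0)
dists = np.linalg.norm(x - centroid, axis=1)

print(f"norm from origin:     mean={norms.mean():.2f}, std={norms.std():.2f}")
print(f"distance to centroid: mean={dists.mean():.2f}, std={dists.std():.2f}")
```

Both quantities come out in a tight band (standard deviation on the order of 1), even though every sample is drawn from the same single Gaussian.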
Why would it not be centered at the origin?
Thanks for the elucidation! This is really helpful and interesting, but I’m still left somewhat confused.
Your concise demonstration immediately convinced me that any Gaussian distributed around a point some distance from the origin in high-dimensional Euclidean space would have the property I observed in the distribution of GPT-J embeddings, i.e. their norms will be normally distributed in a tight band, while their distances-from-centroid will also be normally distributed in a (smaller) tight band. So I can concede that this has nothing to do with where the token embeddings ended up as a result of training GPT-J (as I had imagined) and is instead a general feature of Gaussian distributions in high dimensions.
However, I’m puzzled by “Suddenly it looks like a much smaller shell!”
Don’t these histograms unequivocally indicate the existence of two separate shells with different centres and radii, both of which contain the vast bulk of the points in the distribution? Yes, there’s only one distribution of points, but it still seems like it’s almost entirely contained in the intersection of a pair of distinct hyperspherical shells.
The distribution lies within infinitely many hyperspherical shells. There was nothing special about the first shell being centered at the origin: the same phenomenon appears when measuring the distance from any point. High-dimensional space is weird.
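To make "any point" concrete, here is a rough sketch that measures distances from a few reference points. The Gaussian (unit variance, mean with every coordinate equal to 3) and the reference points are hypothetical choices of mine. For an isotropic unit-variance Gaussian with mean mu, the expected squared distance to a point p is ||mu - p||^2 + d, so the predicted shell radius is the square root of that.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 2000  # dimension and sample count (assumed for illustration)

# Assumed setup: isotropic unit-variance Gaussian with mean (3, 3, ..., 3).
mu = np.full(d, 3.0)
x = mu + rng.standard_normal((n, d))

# Hypothetical reference points to measure distances from.
refs = {
    "origin": np.zeros(d),
    "true mean": mu,
    "random far point": 10.0 * rng.standard_normal(d),
}

results = {}
for name, p in refs.items():
    r = np.linalg.norm(x - p, axis=1)
    # Predicted radius from E||x - p||^2 = ||mu - p||^2 + d (unit variance).
    predicted = np.sqrt(np.sum((mu - p) ** 2) + d)
    results[name] = (r.mean(), r.std(), predicted)
    print(f"{name:>16}: mean={r.mean():.2f}, std={r.std():.2f}, predicted={predicted:.2f}")
```

Each reference point sees its own thin "shell": the radius changes with the distance between the point and the mean, while the spread stays on the order of 1.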
Thanks! I’m starting to get the picture (insofar as that’s possible).