Did you try taking the centroid of all words, rather than all tokens? The set of tokens will contain a lot of nonsense subword fragments.
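Here is a minimal sketch of the difference. It assumes each word is embedded by averaging the vectors of its subword tokens, which is one common choice but not the only one. `tokenize`, the `##` fragment names, and the toy vectors are hypothetical stand-ins for a real tokenizer and embedding table.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def word_centroid(
    words: Sequence[str],
    tokenize: Callable[[str], List[str]],  # hypothetical tokenizer: word -> subword tokens
    token_vectors: Dict[str, Vector],
) -> Vector:
    """Centroid over whole words: each word vector is the mean of its
    subword token vectors (an assumption), then the word vectors are averaged."""
    word_vecs = [mean([token_vectors[t] for t in tokenize(w)]) for w in words]
    return mean(word_vecs)


def token_centroid(token_vectors: Dict[str, Vector]) -> Vector:
    """Centroid over every vocabulary entry, stray fragments included."""
    return mean(list(token_vectors.values()))


# Toy data (hypothetical): "##xq" is a fragment that never forms a real word here.
token_vectors = {
    "play": [1.0, 0.0],
    "##ing": [0.0, 4.0],
    "run": [3.0, 0.0],
    "##xq": [0.0, 8.0],
}
toy_splits = {"playing": ["play", "##ing"], "run": ["run"]}

print(word_centroid(["playing", "run"], lambda w: toy_splits[w], token_vectors))  # [1.75, 1.0]
print(token_centroid(token_vectors))  # [1.0, 3.0]
```

In the toy data, the unused fragment `##xq` pulls the token centroid toward its direction. The word centroid only sees tokens that actually occur inside real words, so it is not affected.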