Did you try getting the centroid of all words, rather than all tokens? The set of tokens will contain a lot of nonsense fragments.