Speculative guess about the semantic richness: embeddings at distances of roughly 5-10 are typical of concepts that are usually represented by multi-token strings. E.g. “spotted salamander” is 5 tokens.

If one manufactured extreme embeddings at a distance of 5-10, would they decode to completions of tokens that implied a long phrase beforehand?
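The experiment hinted at above could be prototyped roughly as follows. This is a toy sketch with a random matrix standing in for a real model's embedding/unembedding weights; the helper names (`manufacture_at_distance`, `decode`) and the choice of dot-product readout are assumptions, not anything from the original note. With a real model one would substitute the LM head weights for `E` and inspect which tokens the manufactured vector prefers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a model's unembedding matrix: 1000 "tokens" in 64 dims.
# In a real run, E would be the language model's embedding / LM-head weights.
vocab, dim = 1000, 64
E = rng.normal(size=(vocab, dim))

centroid = E.mean(axis=0)

def manufacture_at_distance(direction, distance):
    """Place a vector at an exact Euclidean distance from the centroid,
    along a chosen direction -- a 'manufactured extreme embedding'."""
    unit = direction / np.linalg.norm(direction)
    return centroid + distance * unit

def decode(vec, k=5):
    """Read out the k nearest tokens by dot product (logit-lens style)."""
    logits = E @ vec
    return np.argsort(logits)[::-1][:k]

# Push the same direction out to distances 1, 5, and 10 and see
# whether the decoded token set shifts as the vector becomes extreme.
direction = rng.normal(size=dim)
for d in (1.0, 5.0, 10.0):
    v = manufacture_at_distance(direction, d)
    print(d, decode(v))
```

The interesting question would then be whether, in a real model, the tokens preferred at distance 5-10 are mid-phrase continuations (implying a long preceding string) rather than phrase-initial tokens.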