One reason the neuron is congruent with multiple variants of the same token may be that those token embeddings are similar (you can test this by checking their cosine similarities).
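A minimal sketch of that cosine-similarity check, using made-up 4-d vectors for illustration (for the real test you would load GPT-2's token embedding matrix, e.g. `GPT2Model.from_pretrained("gpt2").wte.weight` with Hugging Face transformers, assuming that API; the token labels in the comments are hypothetical):

```python
import numpy as np

def cosine_similarity_matrix(E):
    """Rows of E are token embeddings; returns the pairwise cosine-similarity matrix."""
    E_normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E_normed @ E_normed.T

# Toy stand-ins for the embeddings of variants of the "same" token,
# plus one unrelated token. Real GPT-2 embeddings are 768-d (for small).
E = np.array([
    [1.0, 0.0, 0.5, 0.0],  # hypothetical " cat" (leading space)
    [0.9, 0.1, 0.5, 0.0],  # hypothetical "cat"
    [0.0, 1.0, 0.0, 0.5],  # hypothetical unrelated token
])

sims = cosine_similarity_matrix(E)
print(np.round(sims, 2))
```

If the token variants really do have similar embeddings, their off-diagonal entries should be close to 1 while entries against unrelated tokens stay much lower.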
Yup! I think that’d be quite interesting. Is there any work on characterizing the embedding space of GPT-2?
Adam Scherlis did some preliminary exploration here:
https://www.lesswrong.com/posts/BMghmAxYxeSdAteDc/an-exploration-of-gpt-2-s-embedding-weights
Here’s a more thorough investigation of the overall shape of said embeddings with interactive figures:
https://bert-vs-gpt2.dbvis.de/
There’s also a lot of academic work on the geometry of LM embeddings, e.g.:
https://openreview.net/forum?id=xYGNO86OWDH (BERT, ERNIE)
https://arxiv.org/abs/2209.02535 (GPT-2-medium)
(Plus a mountain more on earlier text/token embeddings like Word2Vec.)
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation is also related to the embedding space.