“▁king” − “▁man” + “▁woman” ≠ “▁queen” (for LLaMa2 7B token embeddings)
I tried to replicate the famous “king” − “man” + “woman” = “queen” result from word2vec using LLaMa2 token embeddings. To my surprise, it did not work.
That is, if I look for the token with the biggest cosine similarity to “▁king” − “▁man” + “▁woman”, it is not “▁queen”.
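(For concreteness, here is a minimal sketch of this kind of lookup. It is not the exact code I ran, and it assumes access to the meta-llama/Llama-2-7b-hf weights on Hugging Face; the nearest-token search is just cosine similarity against every row of the input embedding matrix.)

```python
# Minimal sketch: nearest tokens (by cosine similarity) to a combination of
# LLaMa2 7B input-embedding vectors. Assumes access to the gated
# meta-llama/Llama-2-7b-hf checkpoint on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
E = model.get_input_embeddings().weight.detach()  # shape (32000, 4096)

def tok_vec(token: str) -> torch.Tensor:
    # "▁king", "▁man", etc. are single tokens in the LLaMa2 vocabulary.
    return E[tokenizer.convert_tokens_to_ids(token)]

def top_tokens(v: torch.Tensor, k: int = 10) -> list[str]:
    # Cosine similarity of v against every row of the embedding matrix.
    sims = torch.nn.functional.cosine_similarity(v.unsqueeze(0), E, dim=-1)
    return tokenizer.convert_ids_to_tokens(sims.topk(k).indices.tolist())

query = tok_vec("▁king") - tok_vec("▁man") + tok_vec("▁woman")
print(top_tokens(query))  # "▁queen" is not the top match
```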
Top ten cosine similarity matches for
“▁king” − “▁man” + “▁woman”
is [‘▁king’, ‘▁woman’, ‘▁King’, ‘▁queen’, ‘▁women’, ‘▁Woman’, ‘▁Queen’, ‘▁rey’, ‘▁roi’, ‘peror’]
“▁king” + “▁woman”
is [‘▁king’, ‘▁woman’, ‘▁King’, ‘▁Woman’, ‘▁women’, ‘▁queen’, ‘▁man’, ‘▁girl’, ‘▁lady’, ‘▁mother’]
“▁king”
is [‘▁king’, ‘▁King’, ‘▁queen’, ‘▁rey’, ‘peror’, ‘▁roi’, ‘▁prince’, ‘▁Kings’, ‘▁Queen’, ‘▁König’]
“▁woman”
is [‘▁woman’, ‘▁Woman’, ‘▁women’, ‘▁man’, ‘▁girl’, ‘▁mujer’, ‘▁lady’, ‘▁Women’, ‘oman’, ‘▁female’]
projection of “▁queen” on span(“▁king”, “▁man”, “▁woman”)
is [‘▁king’, ‘▁King’, ‘▁woman’, ‘▁queen’, ‘▁rey’, ‘▁Queen’, ‘peror’, ‘▁prince’, ‘▁roi’, ‘▁König’]
“▁queen” is the closest match only if you exclude every version of “king” and “woman”. But this seems to be only because “▁queen” is already the second-closest match for “▁king” on its own. Involving “▁man” and “▁woman” only makes things worse.
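(The “projection onto span” above is the orthogonal projection, i.e. the closest point to “▁queen” in the subspace spanned by the three other vectors. A sketch of one way to compute it, reusing the tok_vec and top_tokens helpers from the snippet above:)

```python
# Orthogonal projection of "▁queen" onto span("▁king", "▁man", "▁woman"),
# i.e. the least-squares fit of "▁queen" by a linear combination of the three.
A = torch.stack([tok_vec("▁king"), tok_vec("▁man"), tok_vec("▁woman")], dim=1)  # (4096, 3)
q = tok_vec("▁queen")

coeffs = torch.linalg.lstsq(A, q.unsqueeze(1)).solution  # best-fit coefficients, (3, 1)
proj = (A @ coeffs).squeeze(1)                           # the projected vector, (4096,)
print(top_tokens(proj))
```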
I then tried looking up exactly what the word2vec result is, and I’m still not sure.
Wikipedia cites Mikolov et al. (2013). That paper is about embeddings from RNN language models, not word2vec, which is ok for my purposes, because I’m also not using word2vec. More problematic is that I don’t know how to interpret how strong their results are. I think the relevant result is this:
“We see that the RNN vectors capture significantly more syntactic regularity than the LSA vectors, and do remarkably well in an absolute sense, answering more than one in three questions correctly.”
which doesn’t seem very strong. Also, I can’t find any explanation of what LSA is.
I also found this other paper, which is about word2vec embeddings and has this promising figure.
But the caption is just a citation to this third paper, which doesn’t have that figure!
I’ve not yet read the last two papers in detail, and I’m not sure if or when I’ll get back to this investigation.
If someone knows more about exactly what the word2vec embedding results are, please tell me.
I have two hypotheses for what is going on. I’m leaning towards 1, but I’m very unsure.
1)
king − man + woman = queen
is true for word2vec embeddings but not for LLaMa2 7B embeddings, because word2vec has far fewer embedding dimensions.
LLaMa2 7B has 4096 embedding dimensions.
This paper uses word2vec variants with 50, 150, and 300 embedding dimensions.
Possibly, when you have thousands of embedding dimensions, those dimensions encode lots of different connotations of these words. These connotations will probably not line up with the simple relation [king − man + woman = queen], and therefore we get [king − man + woman ≠ queen] for high-dimensional embeddings.
2)
king − man + woman = queen
isn’t true for word2vec either. If you do the same thing with word2vec embeddings, you get more or less the same result I got with LLaMa2 7B.
(As I’m writing this, I’m realising that just getting my hands on some word2vec embeddings and testing this for myself seems much easier than decoding what the papers I found are actually saying.)
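(A sketch of that check, in case someone wants to run it; I have not run this myself. gensim ships the pretrained Google News word2vec vectors, and its most_similar() does exactly this kind of vector arithmetic.)

```python
# Untested sketch: check the analogy directly with pretrained word2vec
# (Google News, 300 dimensions) via gensim.
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")  # roughly a 1.6 GB download
print(w2v.most_similar(positive=["king", "woman"], negative=["man"], topn=10))
# Caveat: most_similar() excludes the input words ("king", "man", "woman")
# from the candidates, so "king" itself can never show up as the answer.
```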
It isn’t true for word2vec either. This article from 2019 describes exactly what you found: King − Man + Woman = King?
Thanks!