Soon the two are lost in a maze of words defined in terms of other words, the problem Stevan Harnad once described as trying to learn Chinese from a Chinese/Chinese dictionary.
Of course, it turned out that LLMs do this just fine, thank you.
I don’t think LLMs do the equivalent of that. It’s more like learning Chinese from a Chinese/Chinese dictionary stapled to a Chinese encyclopedia.
It is not obvious to me that a Chinese/Chinese dictionary, purged of example sentences, would let you learn, even in theory, the things that even a simple n-gram or word2vec model trained on a non-dictionary corpus learns and encodes into its embeddings. For example, would a Chinese/Chinese dictionary let you plot cities by longitude & latitude? (Most dictionaries do not try to list all proper names, leaving those to atlases or gazetteers; they are about the language, not a specific place like China, after all.)
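To make that probe concrete: the standard demonstration is a linear read-out from pretrained word vectors to coordinates. A minimal sketch, assuming gensim's downloadable GloVe vectors ("glove-wiki-gigaword-100") and a toy hand-typed city list as a stand-in for a real gazetteer:

```python
# Sketch: can a linear probe recover city coordinates from word vectors
# trained on an ordinary (non-dictionary) corpus? Toy illustration only.
import numpy as np
from gensim.downloader import load          # downloads pretrained vectors
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

vectors = load("glove-wiki-gigaword-100")   # GloVe, non-dictionary corpus

# Toy (city, latitude, longitude) triples; a real probe would use a gazetteer.
cities = [
    ("beijing", 39.9, 116.4), ("shanghai", 31.2, 121.5),
    ("tokyo", 35.7, 139.7),   ("london", 51.5, -0.1),
    ("paris", 48.9, 2.4),     ("moscow", 55.8, 37.6),
    ("sydney", -33.9, 151.2), ("cairo", 30.0, 31.2),
    ("toronto", 43.7, -79.4), ("mumbai", 19.1, 72.9),
]
X = np.array([vectors[name] for name, _, _ in cities])
y = np.array([(lat, lon) for _, lat, lon in cities])

# Cross-validated ridge regression: if held-out coordinates are decodable,
# the training corpus itself put the geography into the embeddings.
pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
print("mean absolute error (degrees):", np.abs(pred - y).mean(axis=0))
```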
Note that the various examples from machine translation you might think of, such as learning to translate with zero parallel sentences/translations, usually rely on corpora much richer than just an intra-language dictionary.
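For reference, the trick behind those zero-parallel-data translation results is roughly an orthogonal Procrustes alignment between two independently trained monolingual embedding spaces (the MUSE/vecmap line of work). A self-contained numpy sketch on synthetic data, since the real pipelines additionally need an adversarial or frequency-based initialization to find the first rough matching:

```python
# Sketch of the orthogonal-Procrustes step used in unsupervised
# bilingual-lexicon / MT work (e.g. MUSE): rotate one monolingual
# embedding space onto another. Synthetic data stands in for the two
# monolingual corpora here.
import numpy as np

def procrustes_align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||X @ W - Y||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
d = 50
W_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # hidden "true" rotation
X = rng.normal(size=(200, d))                       # source-language vectors
Y = X @ W_true + 0.01 * rng.normal(size=(200, d))   # noisy target space

W = procrustes_align(X, Y)
print("relative alignment error:",
      np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
```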
I don’t doubt that LLMs could do this, but has this exact thing actually been done somewhere?
I’ve not read the paper, but something like https://arxiv.org/html/2402.19167v1 seems like the appropriate experiment.