I don’t think LLMs do the equivalent of that. It’s more like learning Chinese from a Chinese/Chinese dictionary stapled to a Chinese encyclopedia.
It is not obvious to me that a Chinese/Chinese dictionary, purged of example sentences, would let you learn, even in theory, the things that a simple n-gram or word2vec model trained on a non-dictionary corpus learns and encodes into its embeddings. For example, would a Chinese/Chinese dictionary let you plot cities by longitude & latitude? (Most dictionaries do not try to list all proper names, leaving those to atlases or gazetteers, because they are about the language, not about a specific place like China, after all.)
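To make the longitude & latitude claim concrete: this is the standard linear-probe result on off-the-shelf word embeddings. A minimal sketch, assuming gensim with its downloadable GloVe vectors; the city list and coordinates are approximate and hard-coded purely for illustration, and with so few probe points the predictions are only rough:

```python
# Linear probe: do ordinary word embeddings (trained on a non-dictionary
# corpus) encode enough geography to roughly place cities on a map?
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import Ridge

cities = {  # (latitude, longitude), approximate, hard-coded for illustration
    "beijing":  ( 39.9, 116.4), "shanghai": ( 31.2, 121.5),
    "tokyo":    ( 35.7, 139.7), "london":   ( 51.5,  -0.1),
    "paris":    ( 48.9,   2.4), "moscow":   ( 55.8,  37.6),
    "cairo":    ( 30.0,  31.2), "sydney":   (-33.9, 151.2),
    "toronto":  ( 43.7, -79.4), "chicago":  ( 41.9, -87.6),
}

wv = api.load("glove-wiki-gigaword-50")  # 50-dim GloVe, lowercase vocab
names = list(cities)
X = np.array([wv[name] for name in names])
Y = np.array([cities[name] for name in names])

# Leave-one-out: fit a linear map from embedding space to (lat, lon),
# then predict the held-out city's coordinates from its embedding alone.
for i, name in enumerate(names):
    mask = np.arange(len(names)) != i
    model = Ridge(alpha=1.0).fit(X[mask], Y[mask])
    lat, lon = model.predict(X[i:i + 1])[0]
    print(f"{name:8s} true={cities[name]}  predicted=({lat:.0f}, {lon:.0f})")
```

The probe works on embeddings trained from ordinary text; whether it would work on embeddings trained only on a definition-to-definition dictionary graph is exactly the open question.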
Note that the examples from machine translation you might think of here, such as learning to translate with zero parallel sentences/translations, usually rely on corpora much richer than a bare intra-language dictionary.
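The core trick in that zero-parallel-data line of work is aligning two independently trained monolingual embedding spaces with a single orthogonal map, which only works because each corpus already encodes rich shared structure. A minimal sketch on simulated data; the 20-pair seed lexicon here is a hypothetical stand-in for what the fully unsupervised systems bootstrap adversarially or from distributional statistics:

```python
# Orthogonal Procrustes alignment: the core step behind unsupervised
# cross-lingual word translation. Finds the orthogonal W minimizing
# ||X @ W - Y||_F via the SVD of X^T Y.
import numpy as np

def procrustes_align(X, Y):
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
d, n = 50, 200
# Simulated "source-language" embeddings, and "target" embeddings that are
# a rotated, noisy copy -- the structural assumption this method relies on.
true_rot, _ = np.linalg.qr(rng.standard_normal((d, d)))
src = rng.standard_normal((n, d))
tgt = src @ true_rot + 0.01 * rng.standard_normal((n, d))

W = procrustes_align(src[:20], tgt[:20])  # tiny seed lexicon of 20 pairs
# Evaluate on held-out "words": if the two spaces really share geometry,
# the map learned from 20 pairs transfers to the rest.
err = np.linalg.norm(src[20:] @ W - tgt[20:]) / np.linalg.norm(tgt[20:])
print(f"relative error on held-out words: {err:.3f}")  # ~ noise level
```

The point for the dictionary thought-experiment: the method succeeds because both embedding spaces were trained on broad corpora with similar statistics, not because either side had anything like a dictionary.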