Soon the two are lost in a maze of words defined in terms of other words, the problem Stevan Harnad once described as trying to learn Chinese from a Chinese/Chinese dictionary.
Of course, it turned out that LLMs do this just fine, thank you.
I don’t think LLMs do the equivalent of that. It’s more like learning Chinese from a Chinese/Chinese dictionary stapled to a Chinese encyclopedia.
It is not obvious to me that a Chinese/Chinese dictionary, purged of example sentences, would let you learn, even in theory, the things that even a simple n-gram or word2vec model trained on a non-dictionary corpus learns and encodes into its embeddings. For example, would a Chinese/Chinese dictionary let you plot cities by longitude & latitude? (Most dictionaries do not try to list all proper names, leaving those to atlases or gazetteers; they are about the language, not a specific place like China, after all.)
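To make that probe concrete: the standard demonstration is a linear read-out from pretrained word vectors to coordinates. A minimal sketch, assuming gensim's downloadable GloVe vectors ("glove-wiki-gigaword-100") and a toy hand-typed city list as a stand-in for a real gazetteer:

```python
# Sketch: can a linear probe recover city coordinates from word vectors
# trained on an ordinary (non-dictionary) corpus? Toy illustration only.
import numpy as np
from gensim.downloader import load          # downloads pretrained vectors
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

vectors = load("glove-wiki-gigaword-100")   # GloVe, non-dictionary corpus

# Toy (city, latitude, longitude) triples; a real probe would use a gazetteer.
cities = [
    ("beijing", 39.9, 116.4), ("shanghai", 31.2, 121.5),
    ("tokyo", 35.7, 139.7),   ("london", 51.5, -0.1),
    ("paris", 48.9, 2.4),     ("moscow", 55.8, 37.6),
    ("sydney", -33.9, 151.2), ("cairo", 30.0, 31.2),
    ("toronto", 43.7, -79.4), ("mumbai", 19.1, 72.9),
]
X = np.array([vectors[name] for name, _, _ in cities])
y = np.array([(lat, lon) for _, lat, lon in cities])

# Cross-validated ridge regression: if held-out coordinates are decodable,
# the training corpus itself put the geography into the embeddings.
pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
print("mean absolute error (degrees):", np.abs(pred - y).mean(axis=0))
```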
Note that the various examples from machine translation you might think of, such as learning to translate with zero parallel sentences/translations, usually rely on corpora much richer than just an intra-language dictionary.
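For reference, the trick behind those zero-parallel-data translation results is roughly an orthogonal Procrustes alignment between two independently trained monolingual embedding spaces (the MUSE/vecmap line of work). A self-contained numpy sketch on synthetic data, since the real pipelines additionally need an adversarial or frequency-based initialization to find the first rough matching:

```python
# Sketch of the orthogonal-Procrustes step used in unsupervised
# bilingual-lexicon / MT work (e.g. MUSE): rotate one monolingual
# embedding space onto another. Synthetic data stands in for the two
# monolingual corpora here.
import numpy as np

def procrustes_align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||X @ W - Y||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
d = 50
W_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # hidden "true" rotation
X = rng.normal(size=(200, d))                       # source-language vectors
Y = X @ W_true + 0.01 * rng.normal(size=(200, d))   # noisy target space

W = procrustes_align(X, Y)
print("relative alignment error:",
      np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
```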
I don’t doubt that LLMs could do this, but has this exact thing actually been done somewhere?
I’ve not read the paper, but something like https://arxiv.org/html/2402.19167v1 seems like the appropriate experiment.