Testing “True” Language Understanding in LLMs: A Simple Proposal
The Core Idea
What if we could test whether language models truly understand meaning, rather than just matching patterns? Here’s a simple thought experiment:
Create two artificial languages (A and B) that bijectively map to the same set of basic concepts R’
Ensure these languages are designed independently (no parallel texts)
Test if an LLM can translate between them without ever seeing translations
If successful, this would suggest the model has learned the underlying meanings, not just statistical correspondences between languages. In theory, if Language A and Language B each have a bijective mapping (MA and MB) onto the same concept space R’, then translation from A to B is the composition MB^(-1) ∘ MA: first apply MA to go from Language A to concepts, then apply MB^(-1) to go from concepts to Language B. No parallel examples are needed at any point. Such emergent translation would be a strong indicator of genuine semantic understanding, because it requires the model to have internalized the relationship between symbols and meanings in each language independently.
Why This Matters
This approach could help distinguish between:
Surface-level pattern matching
Genuine semantic understanding
Internal concept representation
It’s like testing whether someone really understands two languages or has just memorized a translation dictionary.
Some Initial Thoughts
Potential Setup
Start with a small, controlled set of basic concepts (colors, numbers, simple actions)
Design Language A with one set of rules/structure
Design Language B with completely different rules/structure
Both languages should map clearly to the same concepts without ambiguity
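The setup above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the proposal itself: the concept inventory, the syllable sets, the seeds, and the two grammar rules (Language A puts the color first and joins with a hyphen, Language B puts the shape first and joins with a space) are all made up here to show what "independently designed" could look like.

```python
import random

# Hypothetical concept space: every (color, shape) pair is one concept.
CONCEPTS = {
    "color": ["red", "blue", "green"],
    "shape": ["circle", "square", "triangle"],
}
ATOMS = [atom for group in CONCEPTS.values() for atom in group]


def make_lexicon(seed, syllables, atoms):
    """Give each atomic concept a unique invented two-syllable word."""
    rng = random.Random(seed)
    lexicon, used = {}, set()
    for atom in atoms:
        word = rng.choice(syllables) + rng.choice(syllables)
        while word in used:  # redraw on collision so the mapping stays one-to-one
            word = rng.choice(syllables) + rng.choice(syllables)
        used.add(word)
        lexicon[atom] = word
    return lexicon


# Separate seeds and disjoint syllable sets keep the vocabularies unrelated.
LEX_A = make_lexicon(1, ["zi", "ko", "ra", "mu", "te"], ATOMS)
LEX_B = make_lexicon(2, ["na", "tu", "pe", "lo", "shi"], ATOMS)


def encode_a(color, shape):
    """Language A (assumed rule): color first, joined by a hyphen."""
    return f"{LEX_A[color]}-{LEX_A[shape]}"


def encode_b(color, shape):
    """Language B (assumed rule): shape first, joined by a space."""
    return f"{LEX_B[shape]} {LEX_B[color]}"


def monolingual_corpus(encode, seed, n):
    """Sample n sentences in one language; no aligned pairs are produced."""
    rng = random.Random(seed)
    return [
        encode(rng.choice(CONCEPTS["color"]), rng.choice(CONCEPTS["shape"]))
        for _ in range(n)
    ]


corpus_a = monolingual_corpus(encode_a, seed=10, n=100)
corpus_b = monolingual_corpus(encode_b, seed=20, n=100)
```

The two corpora are sampled with different seeds, so sentence i in one corpus is not a translation of sentence i in the other. A model trained on both would see each language only on its own.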
Example (Very Simplified)
Concept: “red circle”
Language A: “zix-kol” (where “zix” = red, “kol” = circle)
Language B: “nare-tup” (where “nare” = red, “tup” = circle)
Without ever showing the model that “zix-kol” = “nare-tup”, can it figure out the translation by understanding that both phrases refer to the same concept?
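To make the target behavior concrete, here is a minimal Python sketch of the "ground truth" translation for this example, going from Language A to concepts and then to Language B. The dictionaries use the words from the example; the function names are hypothetical. Note that this is the oracle the experimenter holds, not something the model is given: the question is whether the model reproduces this output without access to either mapping.

```python
# Word-to-concept mappings from the example (known only to the experimenter).
M_A = {"zix": "red", "kol": "circle"}
M_B = {"nare": "red", "tup": "circle"}


def invert(mapping):
    """Return the inverse of a mapping, failing if it is not one-to-one."""
    inverse = {concept: word for word, concept in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not a bijection")
    return inverse


def translate_a_to_b(phrase):
    """Language A -> concepts (via M_A) -> Language B (via the inverse of M_B)."""
    m_b_inverse = invert(M_B)
    concepts = [M_A[word] for word in phrase.split("-")]
    return "-".join(m_b_inverse[concept] for concept in concepts)


print(translate_a_to_b("zix-kol"))  # nare-tup
```

The `invert` check is where the no-ambiguity requirement shows up: if two words in a language meant the same concept, the inverse would not be well defined and the reference translation would no longer be unique.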
Open Questions
How do we ensure the languages are truly independent?
What’s the minimum concept space needed for a meaningful test?
How do we efficiently validate successful translations?
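On the validation question, one possible answer is that bijective languages make scoring cheap: every source phrase has exactly one correct translation, so exact-match accuracy against the experimenter's reference is enough. The sketch below assumes that approach. The gold pairs come from the example above, while the `predictions` dictionary is a made-up stand-in for model outputs, not real results.

```python
def exact_match_accuracy(gold, predictions):
    """Fraction of source phrases whose predicted translation equals the reference.

    gold:        dict mapping each source phrase to its unique correct translation
    predictions: dict mapping source phrases to model outputs (missing = wrong)
    """
    if not gold:
        return 0.0
    correct = sum(
        predictions.get(source, "").strip() == reference
        for source, reference in gold.items()
    )
    return correct / len(gold)


# Reference translations built from the example's words.
gold = {"zix-kol": "nare-tup", "zix": "nare", "kol": "tup"}

# Hypothetical model outputs, for illustration only.
predictions = {"zix-kol": "nare-tup", "zix": "nare", "kol": "nare"}

score = exact_match_accuracy(gold, predictions)  # 2 of 3 correct
```

A score would still need to be compared against a chance baseline, since with a very small vocabulary a model could guess some translations correctly.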
Limitations
As an undergraduate student outside the AI research community, I acknowledge:
This is an initial thought experiment
Implementation would require significant resources and expertise
Many practical challenges would need to be addressed
Call for Discussion
I’m sharing this idea in hopes that:
Researchers with relevant expertise might find it interesting
It could contribute to discussions about AI understanding
Others might develop or improve upon the concept
About Me
I’m an engineering student interested in AI understanding and alignment. While I may not have the resources to develop this idea fully, I hope sharing it might spark useful discussions or inspire more developed approaches.
Feedback Welcome
If you have thoughts, suggestions, or see potential in this idea, I’d love to hear from you. Please feel free to comment or reach out.
I believe this has been done in Google’s Multilingual Neural Machine Translation system (an extension of GNMT), which enables zero-shot translation (translating between language pairs without direct training examples for that pair). This system leverages shared representations across languages, allowing the model to infer translations for unseen language pairs.
I made basically the same proposal here, but phrased as a task of translating between a long alien message and human languages: https://www.lesswrong.com/posts/J3zA3T9RTLkKYNgjw/is-llm-translation-without-rosetta-stone-possible
See also the comments, which contain a reference to a paper with a related approach on unsupervised machine translation. Also this comment echoes your post: