I write a message in Klingon to a friend. You intercept it. You’ve never heard of the Klingon language before and have no information about it whatsoever. Is it possible to “decrypt” the message and produce an English translation, assuming that the message is long enough?
The answer to this is far from clear. Part of the question is more linguistic and philosophical than cryptanalytic. There are historical examples of languages that have been successfully deciphered. The two most famous are probably Egyptian hieroglyphics and Linear B. In the first case we had additional texts (especially the Rosetta Stone) that allowed us to connect it to an existing language; in the case of Linear B, there was no similar linking text. Linear B turned out to be a form of Greek, but this wasn’t used at all in the decipherment until the very last stages, when it had become very apparent (as I understand it, most of the researchers at the time thought that it was not a form of Greek). But there would probably be many words in Linear B that we would not be able to translate today if not for the fact that they have recognizable Greek cognates and counterparts.
But the case of Linear B is very different from the hypothetical case of “Klingon”. Actual Klingon is very similar to a variety of human languages. It isn’t obvious that a language belonging to a genuinely different species, with absolutely no cultural or physical context, would be decipherable in any useful way. Human languages generally share some basic aspects, and it isn’t obvious that those basics would be shared by a language from another species.
If one had more than just language, one might be able to decipher things based on the connections to physical reality. So for example, one might be able to recognize a version of the periodic table even if it were arranged in a very different fashion (humans have made a large number of different forms, some three-dimensional). And if the text contained material designed to assist in understanding, then the situation might be easier even if the language is radically different from anything humans have encountered before. For example, it might have something like “1 . 2 .. 3 ... 4 .... 5 .....” up to some large number, followed by something like “Primes 2 .. 3 ... 5 ..... 7 .......” going out to some distance. Note that here I’ve assumed that primes are a concept that another species would find interesting enough to consider. But many simple sequences would be reasonable starters, such as squares or powers of 2. This doesn’t address the situation you cared about, which was the cryptanalytic analogy between language and the universe. Presumably the universe isn’t trying to be deciphered. And the wording of your remark seems to imply that the message isn’t intended to be deciphered.
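To make the “counting primer” idea concrete, here is a rough sketch of how such a self-explaining preamble could be generated. The function names are made up for illustration, and the digits stand in for whatever numeral symbols the sender actually uses, which a real recipient would of course not know in advance; the only thing the recipient can rely on is the run of dots next to each symbol.

```python
def is_prime(n):
    """Trial-division primality check; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def tally(n):
    """Pair a numeral with n dots, e.g. 3 -> '3 ...'."""
    return f"{n} {'.' * n}"


def counting_primer(limit):
    """Hypothetical primer: every number from 1 to limit with its tally."""
    return " ".join(tally(n) for n in range(1, limit + 1))


def primes_primer(limit):
    """Hypothetical primer: only the primes up to limit, each with its tally."""
    return " ".join(tally(n) for n in range(2, limit + 1) if is_prime(n))


print(counting_primer(5))
print("Primes", primes_primer(7))
```

Swapping `is_prime` for a test for squares or powers of 2 would give the other starter sequences mentioned above.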
You mention a direct historical example which suggests that cryptanalysis of unknown languages can be very tough. In World War II, the United States employed so-called “code talkers” who used obscure languages, generally Native American ones, as secret codes. The use of Navajo in this fashion is the most famous, although some other languages were used as well. In this case, even when the Japanese knew towards the end of the war what languages were being used, they were unable to crack the codes. However, by the end of the war the codes were not just simple spoken Navajo but had (somewhat weak) cryptography added in, and the combination seems to have been what really created the trouble. Note also that in this case the Japanese had a large amount of physical context for the messages, since they knew that they were military messages and even knew (very roughly) which messages corresponded to which events.
So the bottom line is that there’s evidence both ways, and it might depend a lot on how different an alien language would be from human languages and on whether or not the text is intended to be deciphered.
By “Klingon” I was literally referring to the artificial language Klingon as invented by humans, but I really meant it as a stand-in for “any natural language you both don’t know and don’t have any reference texts for.”
Oh. Well, that renders most of my response irrelevant. Then the answer is “probably yes”. Getting the basics of the grammar won’t take much effort, so one can tentatively identify which words are verbs, nouns, adjectives, etc. Assuming that you aren’t going out of your way to be terse with your speech, there will be a fair bit of redundancy that should become obvious in a large enough text. For example, if we’ve identified how one says “and” in the language, or at least the version for nouns, then we might be able to identify the plural form for verbs, or something close to that. Moreover, if we see a word that frequently shows up near the word for “and”, we could tentatively guess that it is a word for two as a cardinal number. Similarly, one might be able to get three as a cardinal number. This gives us a handle on the number system.
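The “which words show up near the word for ‘and’” step is basically a co-occurrence count. A minimal sketch, assuming the text has already been split into words and that we already have a candidate for “and”; the tokens below are invented placeholders, not real Klingon or any other language:

```python
from collections import Counter


def neighbor_counts(tokens, target, window=1):
    """Count tokens occurring within `window` positions of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def most_frequent(tokens, n=5):
    """The n most common tokens; function words like 'and' tend to rank high."""
    return [word for word, _ in Counter(tokens).most_common(n)]


# Made-up toy "text"; suppose "zu" is our tentative guess for "and".
tokens = "ba zu ka ti ba zu mo ti ba".split()
print(most_frequent(tokens, 3))
print(neighbor_counts(tokens, "zu").most_common())
```

On a real text the raw counts would have to be compared against each word’s overall frequency, since very common words sit next to everything; this only shows the bookkeeping, not a decipherment method.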
In the direct context of Navajo, which you used as your other example, one also has a correspondence with physical events, which could potentially help a lot if one had that data.
So if a time-traveling mischief maker gave the NSA a copy of “The Klingon Hamlet” in 1980 (minus all the English text), would they have been able to “decrypt” it?
That’s a difficult case. It would depend on the resources thrown into it. If it were formatted as a play, it wouldn’t take long for someone to notice that, and the five acts would suggest Shakespeare to an English speaker. (This isn’t a hypothesis someone would immediately hit upon, but it is the sort of thing that someone would eventually think of.) At that point things would be much easier, since we would have an effective Rosetta Stone. However, note that even with the Rosetta Stone, the deciphering of Egyptian hieroglyphs took a very long time even after the main breakthroughs.
If the text were not the text of Hamlet but were a similar random text of the same length, then it is almost certainly not long enough to be decipherable in any reasonable span of time. I don’t, however, have any idea how much longer the text would need to be.