Occam’s Razor, Complexity of Verbal Descriptions, and Core Concepts
Occam’s razor, as it is popularly known, states that “the simplest answer is most likely to be correct”1. It has been noted in other discussion threads that the word “simplest” is somewhat misleading here, and that it actually means something along the lines of “easiest to express concisely using natural language”. Occam’s razor typically comes into play when we are trying to explain some observed phenomenon or, in model-building terms, when we are trying to come up with a model for our observations. The verbal complexity of a new model depends on the models that already exist in the observer’s mind, since, as humans, we express new ideas in terms of concepts with which we are already familiar.
Thus, when applied to natural language, Occam’s razor encourages descriptions that are most in line with the observer’s existing worldview, and discourages descriptions that seem implausible given the observer’s current worldview. Since our worldviews are typically very accurate2, this makes sense as a heuristic.
As an example, if a ship sank in the ocean, a simple explanation would be “a storm destroyed it”, and a complicated explanation would be “a green scaly sea-dragon with three horns destroyed it”. The first description is simple because we frequently experience storms, and so we have a word for them, whereas most of us never experience green scaly sea-dragons with three horns, and so we have to describe them explicitly. If the opposite were the case, we’d have some word for the dragons (maybe they’d be called “blicks”), and we would have to describe storms explicitly. Then the descriptions above could be reworded as “rain falling from the sky, accompanied by strong gusts of wind and possibly gigantic electrical discharges, destroyed the ship” and “a blick destroyed the ship”, respectively.
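The storm/blick example can be made concrete with a small sketch. It assumes, purely for illustration, that the complexity of an explanation is the number of words needed to state its cause, and that each observer has a vocabulary mapping concepts to phrases. The names `OUR_WORLD`, `BLICK_WORLD`, `explanation_length`, and `simplest_explanation` are hypothetical and not part of the original argument.

```python
# Hypothetical vocabularies: each maps a concept to the phrase an observer
# needs in order to express it. The phrases are taken from the example above.
OUR_WORLD = {
    "storm": "storm",
    "sea_dragon": "green scaly sea-dragon with three horns",
}
BLICK_WORLD = {
    "storm": ("rain falling from the sky, accompanied by strong gusts of wind "
              "and possibly gigantic electrical discharges"),
    "sea_dragon": "blick",
}


def explanation_length(cause, vocabulary):
    """Word count of the phrase this vocabulary uses for the given cause."""
    return len(vocabulary[cause].split())


def simplest_explanation(causes, vocabulary):
    """Return the cause with the shortest verbal description."""
    return min(causes, key=lambda cause: explanation_length(cause, vocabulary))


if __name__ == "__main__":
    causes = ["storm", "sea_dragon"]
    for name, vocab in [("our world", OUR_WORLD), ("blick world", BLICK_WORLD)]:
        print(name, "->", simplest_explanation(causes, vocab))
```

Under this crude word-count measure, the same two causes swap places depending on which vocabulary the observer starts with, which is the point of the example: the ranking is a property of the observer's language, not of the explanations alone.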
What I’m getting at is that the same explanation will have different complexities for different people: the complexity of a description to a person depends on that person’s collection of life experiences, and everyone has a different set of life experiences. This leads to an interesting question: are there concepts that are universally easy to describe? (By universally I mean cross-culturally.) It seems reasonable to claim that a concept C is easy to describe for a culture if that culture’s language contains a word that means C; it should be a fairly common word, and everyone in the culture should know what it means.
So are there concepts that every language has a word for? Apparently, yes, or at least approximately. The linguist Morris Swadesh compiled a list of core vocabulary terms (now known as the Swadesh list) chosen to be as independent of any particular culture as possible. Unsurprisingly from an information-theoretic perspective, the English versions of the words on this list are extremely short: most are one syllable, and the consonant clusters are small.
Presumably, if you wanted to communicate an idea to someone from a very different culture, and you could express that idea in terms of the core concepts, then you could explain your idea to that person. (Such an expression would likely require symbolism/metaphor/simile, but these are valid ways of expressing ideas.) Alternatively, imagine trying to explain a complicated idea to a small child; this would probably involve expressing the concept in terms of more concrete, fundamental ideas and objects.
Where does this core vocabulary come from? Is it just that these are the only concepts that basically all humans will be familiar with? Or is there a deeper explanation, like an a priori encoding of these concepts in our brains?
I bring all of this up because it is relevant to the question of whether we could communicate with an artificial intelligence if we built one, and whether this AI would understand the world similarly to how we do (I consider the latter a prerequisite for the former). Presumably an AI would reason about and attempt to model its environment, and presumably it would prefer models with simpler descriptions, if only because such models would be more computationally efficient to reason with. But an AI might have a different definition of “simple description” than we do as humans, and therefore it might come up with very different explanations and understandings of the world, or at the very least a different hierarchy of concepts. This would make communication between humans and AIs difficult.
If we encoded the core vocabulary a priori in the AI’s mind as some kind of basis of atomic concepts, would the AI develop an understanding of the world that was more in line with ours than it would if we didn’t? And would this make it easier for us to communicate intellectually with the AI?
1 Note that Occam’s razor does not say that the simplest answer is actually correct; it just gives us a prior distribution over models. If we want to build a model, we’ll be considering p(model|data), which by Bayes’ rule is equal to p(data|model)p(model)/p(data). Occam’s razor is one way of specifying the prior p(model). Apologies if this footnote is obvious, but I see this misinterpretation all over the place on the internet.
2 This may seem like a bold statement, but I’m talking about everyday sorts of things. If you blindfolded me, put me in front of a random tree during summer in my general geographic region, and asked me what it looked like, I could give you a description of that tree, and it would probably be very similar to the actual thing. This is because my worldview about trees is very accurate, i.e. my internal model of trees has very good predictive power.
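The Bayesian reading in footnote 1 can be sketched numerically. The sketch below assumes one particular way of turning Occam’s razor into a prior, namely p(model) proportional to 2^(-length), where length is the word count of the model’s verbal description. That choice, the likelihood values, and the function names `occam_prior` and `posterior` are all illustrative assumptions, not something the footnote specifies.

```python
def occam_prior(lengths):
    """Prior over models with p(model) proportional to 2 ** -length.

    This is one assumed way of encoding a preference for short descriptions.
    """
    weights = {model: 2.0 ** -length for model, length in lengths.items()}
    total = sum(weights.values())
    return {model: weight / total for model, weight in weights.items()}


def posterior(prior, likelihood):
    """Bayes' rule: p(model|data) = p(data|model) * p(model) / p(data)."""
    unnormalized = {model: likelihood[model] * prior[model] for model in prior}
    evidence = sum(unnormalized.values())  # p(data), summed over the models
    return {model: value / evidence for model, value in unnormalized.items()}


if __name__ == "__main__":
    # Description lengths in words for the two ship explanations in the post:
    # "storm" (1 word) and "green scaly sea-dragon with three horns" (6 words).
    lengths = {"storm": 1, "sea_dragon": 6}
    # Made-up likelihoods p(data|model), chosen so the dragon explains the
    # wreck somewhat better than the storm does.
    likelihood = {"storm": 0.5, "sea_dragon": 0.9}

    prior = occam_prior(lengths)
    print(posterior(prior, likelihood))
```

With these made-up numbers the storm keeps most of the posterior mass even though the dragon has the higher likelihood. The prior expresses a preference that the data can in principle override; it does not declare the simplest model correct.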