we could bring in ideas from coding theory, talk about Kraft’s inequality, et cetera
Could you expand on this? Following Wikipedia, Kraft’s inequality seems to say that if we’re translating a message from an alphabet with n symbols to an alphabet with r symbols by representing each symbol s_i of the first alphabet with a codeword of length ℓ_i spelled in the second alphabet, then in order for the message to be uniquely decodable, it must be the case that
$$\sum_{i=1}^{n} r^{-\ell_i} \leq 1$$
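For a quick sanity check on my reading, a toy calculation of my own (the codeword lengths below are made up, over a binary alphabet; nothing here comes from the article itself):

```python
# Toy check of Kraft's inequality: sum r^(-l) over the proposed codeword lengths.
# r = 2 means a binary target alphabet; the lengths are invented for illustration.

def kraft_sum(lengths, r=2):
    return sum(r ** -l for l in lengths)

# Lengths of an actual binary prefix code, {0, 10, 110, 111}:
print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> satisfies the inequality
# Lengths that no uniquely decodable binary code can have:
print(kraft_sum([1, 1, 2]))     # 1.25 -> violates the inequality
```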
However, I don’t understand how this is relevant to the question of whether some human choices of language are crazy. For example, when people object to the use of the word God in reference to what they would prefer to call a superintelligence, it’s not because they believe that using the word God would somehow violate Kraft’s inequality, thereby rendering the intended message ambiguous. There’s nothing information-theoretically wrong with the string God; rather, the claim is that that string is already taken to refer to a different concept. Do you agree, or have I misread you?
Hm hm hm, I’m having trouble sorting this out. The full idea I think I failed to correctly reference is that giving certain concepts short “description lengths”—where description length doesn’t mean number of letters, but something like semantic familiarity—in your language is equivalent to saying that the concepts signified by those words represent things-in-the-world that show up more often. But really the whole analogy is of course flawed from the start, because we need to talk about decision-theoretically important things-in-the-world, not probabilistically likely things-in-the-world, though in many cases the latter is the starting point for the former. Like, if we use a language that uses the concept of God a lot but not the concept of superintelligence—and here it’s not the length of the strings that matters, but something like the semantic length, i.e. how easy or hard it is to automatically induce the connotations of the word; that is the non-obvious and maybe just wrong part of the analogy—then that implies that you think God shows up more in the world than superintelligence does. I was under the impression that one could start talking about the latter using Kraft’s inequality, but upon closer inspection I’m not sure; what jumped out at me was simply: “More specifically, Kraft’s inequality limits the lengths of codewords in a prefix code: if one takes an exponential function of each length, the resulting values must look like a probability mass function. Kraft’s inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive.” Do you see what I’m trying to get at now with my loose analogy? If so, might you help me reason through or debug the reference?
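To gesture at the direction I mean a little more concretely: reading the inequality backwards, giving a concept a short “length” behaves like implicitly assigning it a high probability, since efficient code lengths go like log_r(1/p). A rough sketch of my own (the “semantic lengths” below are entirely made-up numbers, just to illustrate the shape of the claim):

```python
import math

def implied_probability(length, r=2):
    # Probability a code implicitly assigns to a concept it gives this "length".
    return r ** -length

def ideal_length(p, r=2):
    # Shannon-style length you'd want for something of probability p.
    return -math.log(p, r)

# Hypothetical "semantic lengths" for two concepts in some community's language:
semantic_lengths = {"God": 3, "superintelligence": 12}

for concept, l in semantic_lengths.items():
    print(concept, implied_probability(l))
# God 0.125                     -> treated as if common (or important)
# superintelligence 0.000244... -> treated as if rare (or unimportant)

print(ideal_length(0.125))  # ~3.0 -- lengths and probabilities are inverses
```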
The full idea I think I failed to correctly reference is that giving certain concepts short “description lengths” [...] in your language is equivalent to saying that the concepts signified by those words represent things-in-the-world that show up more often. [...] “More specifically, [...] Kraft’s inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive.”
Sure. Short words are more expensive because there are fewer of them; because short words are scarce, we want to use them to refer to frequently-used concepts. Is that what you meant? I still don’t see how this is relevant to the preceding discussion (see the grandparent).
Also, for clearer communication, you might consider directly saying things like “Short words are more expensive because there are fewer of them” rather than making opaque references to things like Kraft’s inequality. Technical jargon is useful insofar as it helps communicate ideas; references that may be appropriate in the context of a technical discussion about information theory may not be appropriate in other contexts.
That’s not quite what I mean, no. It’s not the length of the words that I actually care about, really, and thus upon reflection it is clear that the analogy is too opaque. What I care about is the choice of which concepts to have set aside as concepts-that-need-little-explanation—“ultimate convergent algorithm for arbitrary superintelligences” here, “God” at some theological hangout—and how that reflects which things-in-the-world one has implicitly claimed are more or less common (but really it’d be too hard to disentangle from things-in-the-world one has implicitly claimed are more or less important). It’s the differential “length” of the concepts that I’m trying to talk about. The syntactic length, i.e. the number of letters, doesn’t interest me.
Referencing Kraft’s inequality was my way of saying “this is the general type of reasoning that I have cached as perhaps relevant to the kind of inquiry it would be useful to do”. But I think you’re right that it’s too opaque to be useful.
Edit: To try to explain the intuition a little more, it’s like applying the “scarce short strings” theme to the concepts directly, where the words are just paintbrush handles. That is how I think one might try to argue that language choices can be objectively “irrational” anyway.
I don’t think the analogy holds. The reason Kraft’s inequality works is that the number of possible strings of length n over a b-symbol alphabet is exactly b^n. This places a bound on the number of short words you can have. Whereas if we’re going to talk about the “amount of mental content” we pack into a single “concept-needing-little-explanation,” I don’t see any analogous bound: I don’t see any reason in principle why a mind of arbitrary size couldn’t have an arbitrary number of complicated “short” concepts.
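To spell out where that bound comes from, a quick toy calculation (binary alphabet; the numbers are only for illustration):

```python
# Over a b-symbol alphabet there are exactly b**n strings of length n, and in a
# prefix code each codeword of length n uses up b**(-n) of a total budget of 1.

b = 2  # binary alphabet, chosen arbitrarily for illustration
for n in range(1, 5):
    print(f"length {n}: {b**n} possible strings, budget cost {b**(-n)} each")

# e.g. a binary prefix code can never contain more than two length-1 codewords,
# because 2 * 2**(-1) already exhausts the entire budget.
```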
For concreteness, consider that in technical disciplines, we often speak and think in terms of “short” concepts that would take a lot of time to explain to outsiders. For example, eigenvalues. The idea of an eigenvalue is “short” in the sense that we treat it as a basic conceptual unit, but “complicated” in the sense that it’s built out of a lot of prerequisite knowledge about linear transformations. Why couldn’t a mind create an arbitrary number of such conceptual “chunks”? Or if my model of what it means for a concept to be “short” is wrong, then what do you mean?
I note that my thinking here feels confused; this topic may be too advanced for me to discuss sanely.
On top of that, there’s this whole thing where people are constantly using social game theory to reason about what choice of words does or doesn’t count as defecting against local norms, what the consequences would be of failing to punish non-punishers of people who use words in ways that differ from the ways privileged by social norms, et cetera, all of which makes a straight-up information-theoretic approach somewhat off-base for even more reasons than just the straightforward ambiguities introduced by considering implicit utilities as well as probabilities. And that doesn’t even mention the heuristics and biases literature or neuroscience, which take the theoretical considerations and laugh at them.
Ah, I’m needlessly reinventing some aspects of the wheel.