Re goals, I feel that comparing advanced AGI to humans is like comparing humans to chimps: regardless of how much we want to explain human ethics and goals to a chimp, and how much effort we put in, its mind just isn’t equipped to comprehend them. Similarly, even the most benevolent and conscientious AGI would be unable to explain its goal system or its ethical system to even a very smart human. Like chimps, humans have their own limits of comprehension, even though we do not know what they are from the inside.
Can you say more about what you’re expecting a successful explanation to comprise, here?
E.g., suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them. I expect to be significantly better at predicting the AGI’s rankings than I was before the explanation.
I don’t expect to be able to do anything equivalent with a chimp.
E.g., suppose an AGI attempts to explain its ethics and goals to me
“Suppose an AGI attempts to explain its and to me” is what I expect it to sound like to humans if we were to replace human abstractions with those an advanced AGI would use. It would not even call these abstractions “ethics” or “goals”, any more than we call ethics “groom” and goals “sex” when talking to a chimp.
suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them.
I do not expect it to be able to generate such descriptions at all, due to the limitations of the human mind and human language. So, yes, our expectations differ here. I do not think that human intelligence reached some magical threshold where everything can be explained to it, given enough effort, even though it was not possible with “less advanced” animals. For all I know, I am not even using the right terms. Maybe an AGI improvement on the term “explain” is incomprehensible to us. Like if we were to translate “explain” into chimp or cat it would come out as “show”, or something.
(shrug) Translating the terms is rather beside my point here.
If the AGI is using these things to choose among possible future worlds, then I expect it to be able to teach me to choose among possible future worlds more like it does than I would without that explanation.
I’m happy to call those things goals, ethics, morality, etc., even if those words don’t capture what the AGI means by them. (I don’t know that they really capture what I mean by them either, come to that.) Perhaps I would do better to call them “groom” or “fleem” or “untranslatable1” or refer to them by means of a specific shade of orange. I don’t know; but as I say, I don’t really care; terminology is largely independent of explanation.
But, sure, if you expect that it’s incapable of doing that, then our expectations differ.
I’ll note that my expectations don’t depend on my having reached a magical threshold, or on everything being explainable to me given enough effort.
What are your reasons for thinking this? I find myself disagreeing: one big disanalogy is that while we have language and chimps do not, we and the AGI both have language. I find it implausible that the AGI could not in principle communicate its goals to us: given an arbitrarily large amount of time and resources for the AGI and ourselves to talk, do you really think we’d never come to a common understanding? Because even if we don’t have such resources, the AGI effectively does, by which it might, I don’t know, choose its words with care.
I’m also not sure why we should think it would even be particularly challenging to understand the goals of an AGI. It’s not easy even with other humans, but why would it be much harder with AGI? Do we have some reason to expect its goals to be more complex than ours? It’s been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be. My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.
I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
Certainly language is important, and human language is much more evolved than that of other animals. There are parts of human language, like writing, which are probably inaccessible to chimps, no matter how much effort we put into teaching them and how patient we are. I can easily imagine that AGI would use some kind of “meta-language”, because human language would simply be inadequate for expressing its goals, just as chimp language is inadequate for expressing human metaethics.
I do not know what this next step would be, any more than an intelligent chimp could predict that humans would invent writing. My mind as-is is too limited, and I understand as much. An AGI would have to make me smarter first, before being able to explain what it means to me. Call it “human uplifting”.
Do we have some reason to expect its goals to be more complex than ours?
Yes, if you look through the tower of goals, more intelligent species have more complex goals.
It’s been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be.
It has not been mine. When someone smarter than I am behaves a certain way, they have to patiently explain to me why they do what they do. And I still only see the path they have taken, not the million paths they briefly considered and rejected along the way.
My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.
My prejudice tells me that when someone a few levels above mine tries to explain their goals and motivations to me in English, I may understand each word, but not the complete sentences. If you cannot relate to this experience, go to a professional talk on a subject you know nothing about. For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her. Certainly some of this gap can be patched to a degree, after a decade or so of dedicated work by both sides, fraught with frustration and doubt, but if the gap is wide enough I don’t think it can be bridged completely.
I find the line of thinking “we are humans, we are smart, we can understand the goals of even an incredibly smart AGI” to be naive, unimaginative and closed-minded, given that our experience is rife with counterexamples.
I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
OK, but why not look at this tower another way. A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see if chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.
So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much “we are humans, we are smart, we can understand the goals of even an incredibly smart AGI”; it’s “an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires.”
So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I’m pretty skeptical of naive extrapolation in this domain anyway, given Eliezer’s point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn’t expect trends to be maintained across those shifts.
Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are.
You are right that we are certainly able to convey a small simple subset of our goals, desires and motivations to some complex enough animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but to put up with.
“an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires.”
This is quite possible. It might give us a dumbed-down version of its 10 commandments, which would look to us like an incredible feat of science and philosophy.
Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me.
Right. An optimistic view is that we can understand the explanations, a pessimistic view is that we would only be able to follow instructions (this is not the most pessimistic view by far).
I’m pretty skeptical of naive extrapolation in this domain anyway, given Eliezer’s point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn’t expect trends to be maintained across those shifts.
Indeed, we shouldn’t. I probably phrased my point poorly. What I tried to convey is that because “major advances in optimization power are meta-level qualitative shifts”, confidently proclaiming that an advanced AGI will be able to convey what it thinks to humans is based on the just-world fallacy, not on any solid scientific footing.
For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her.
That’s because you weren’t really speaking English; you were speaking the English words for math terms related to physics. The people who spoke the relevant math you were alluding to could follow; those who didn’t could not, because they didn’t have concrete mathematical ideas to tie the words to. It’s not just a matter of jargon, it’s an actual language barrier. I think you’d find that, with a jargon cheat sheet, you could follow many non-mathematical PhD defenses just fine.
The same thing happens in music, which is its own language (after years of playing, I find I can “listen” to a song by reading sheet music).
Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?
Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?
“Mathematics” may be the wrong word for it. I totally think that a transhuman can create concepts and ideas which a mere human cannot understand even when patiently explained. I am quite surprised that other people here don’t find it an obvious default.
My impression was that the question was not whether it’d have those concepts, since as you say that’s obvious, but whether they’d necessarily be referenced by the utility function.
Sure, but I find “can’t understand” sort of fuzzy as a concept. I.e., I wouldn’t say I ‘understand’ compactification and Calabi-Yau manifolds the same way I understand sheet music (or the same way I understand the word green), but I do understand them all in some way.
It seems unlikely to me that there exist concepts that can’t be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.
For example, it seems unlikely there exists a law of physics that cannot be expressed as an equation. It seems equally unlikely there exists an equation I would be totally incapable of working with. Even if I’ll never have the insight that led someone to write it down, if you give it to me, I can use it to do things.
Human languages can encode anything, but a human can’t understand most things valid in human languages; most notably, extremely long things, and numbers specified with a lot of digits that actually matter. Just because you can count in binary on your hands does not mean you can comprehend the code of an operating system expressed in that format.
Humans seem “concept-complete” in much the same way your desktop PC seems Turing complete. Except it’s much more easily broken, because the human brain has absurdly shitty memory.
numbers specified with a lot of digits that actually matter
That’s why we have paper; I can write it down. “Understanding” and “remembering” seem somewhat orthogonal here. I can’t recite Moby Dick from memory, but I understood the book. If you give me a 20-digit number 123… and I can’t hold it but retain “a number slightly larger than 1.23 * 10^19”, that doesn’t mean I can’t understand you.
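As a quick check of the arithmetic: a 20-digit number beginning 123… sits between 10^19 and 10^20, so its short summary is about 1.23 * 10^19. A throwaway sketch (the specific number is made up for illustration):

```python
n = 12345678901234567890  # an arbitrary 20-digit number

# "Understanding" it as an order of magnitude rather than memorizing digits:
exponent = len(str(n)) - 1        # 19, since a 20-digit number is d * 10^19
mantissa = n / 10 ** exponent     # ~1.2345...
summary = f"about {mantissa:.2f} * 10^{exponent}"
print(summary)  # about 1.23 * 10^19
```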
Just because you can count in binary on your hands does not mean you can comprehend the code of an operating system expressed in that format.
Print it out for me, and give me enough time, and I will be able to understand it, especially if you give me some context.
Yes, you can encode things in a way that makes them harder for humans to understand; no one would argue with that. The question is: are there concepts that are simply impossible to explain to a human? I point out that while I can’t remember a 20-digit number, I can derive pretty much all of classical physics, so certainly humans can hold quite complex ideas in their head, even if they aren’t optimized for storage of long numbers.
You can construct a system consisting of a planet’s worth of paper and pencils and an immortal version of yourself (or a vast dynasty of successors) that can understand it, if nothing else because it’s Turing complete and can simulate the AGI. This is not the same as you understanding it while still remaining fully human. Even if you did somehow integrate the paper-system sufficiently, that’d be just as big a change as uploading and intelligence-augmenting the normal way.
The approximation thing is why I specified digits mattering. It won’t help one bit when talking about something like Gödel numbering.
The approximation thing is why I specified digits mattering.
I understand, my point was simply that “understanding” and “holding in your head at one time” are not at all the same thing. “There are numbers you can’t remember if I tell them to you” is not at all the same claim that “there are ideas I can’t explain to you.”
Neither of your cases is unexplainable: give me the source code in a high-level language instead of binary, and I can understand it. If you give me the binary code and the instruction set, I can convert it to assembly and then to a higher-level language, via disassembly.
Of course, I can deliberately obfuscate an idea and make it harder to understand, either by encryption or by presenting it in the most obtuse possible form, but that is not the same as an idea that fundamentally cannot be explained.
“There are numbers you can’t remember if I tell them to you” is not at all the same claim that “there are ideas I can’t explain to you.”
But they might be related. Perhaps there are interesting and useful concepts that would take, say, 100,000 pages of English text to write down, such that each page cannot be understood without holding most of the rest of the text in working memory, and such that no useful, shorter, higher-level version of the concept exists.
Humans can only think about things that can be taken one small piece at a time, because our working memories are pretty small. It’s plausible to me that there are atomic ideas that are simply too big to fit in a human’s working memory, and which do need to be held in your head at one time in order to be understood.
It seems unlikely to me that there exist concepts that can’t be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.
My intuition is the exact opposite.
it seems unlikely there exists a law of physics that cannot be expressed as an equation
I can totally imagine that some models are not reducible to equations, but that’s not the point, really.
Even if I’ll never have the insight that led someone to write it down, if you give it to me, I can use it to do things.
Unless this “use” requires more brainpower than you have… You might still be able to work with some simplified version, but you’d have to have transhuman intelligence to “do things” with the full equation.
To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
Yes, if you look through the tower of goals, more intelligent species have more complex goals.
This seems like a bogus use of the outside view. AGI is qualitatively different to evolved intelligence, in that it is not evolved, but built by a lesser intelligence. Moreover, there’s a simple explanation for the observation that more intelligent animals have more complex goals, which is that more intelligence permits more subgoals, and natural selection generally alters a species’ goals by adding, rather than simplifying. This is pretty much totally inapplicable to a constructed AGI.
I will try to refute you by understanding what you say. So could you explain to me this idea of a ‘meta-language’? I guess that by ‘meta-’ you intend to say that at least some sentences in the meta-language couldn’t in principle be translated into a non-meta ‘human’ language. Is that right?
given that our experience is rife with counterexamples.
This is not a given. I’ve been to plenty of dissertation defenses on topics I know little to nothing about, and you’re right that I’m often at a loss. But this, I find, is because the understanding of a newly minted doctor is too narrow and too newborn to be easily understood. PhD defenses are not the place to go to find people who really get something, they’re the place to go to find someone who’s just now gotten a foothold. My experience is still that the more intelligent and experienced PhDs tend to be more intelligible. But this is a little beside the point: PhDs tend to be hard to understand, when they are, because they’re discussing something quite complex.
What reason do you have for thinking an AGI’s goals would be complex at all? If your reasoning is that human beings that are more intelligent tend to have more complex goals (I don’t agree, but say I grant this) why do you think an AGI will be so much like an intelligent human being?
I will try to refute you by understanding what you say.
I am not sure what you mean by “refute” here. Prove my conjecture wrong by giving a counterexample? Show that my arguments are wrong? Show that the examples I used to make my point clearer are bad examples? If it’s the last one, then I would not call it a refutation.
I guess that by ‘meta-’ you intend to say that at least some sentences in the meta-language couldn’t in principle be translated into a non-meta ‘human’ language. Is that right?
Indeed, at least not without some extra layer of meaning not originally expressed in the language. To give another example (not a proof, just an illustration of my point), you can sort-of teach a parrot or an ape to recognize words, to count and maybe even to add, but I don’t expect it to be possible to teach one to construct mathematical proofs or to understand what one even is, even if a proof can be expressed as a finite string of symbols (a sentence in a language) that a chimp is capable of distinguishing from another string. There is just too much meta there, with symbols standing for other symbols or numbers or concepts.
I agree that my PhD defense example is not a proof, but an illustration meant to show that humans quite often experience a disconnect between a language and an underlying concept, which may well be out of reach despite being expressed with familiar symbols, just like a chimp would in the above example.
What reason do you have for thinking an AGI’s goals would be complex at all?
I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.
I can in fact imagine what else a super-intelligence would use instead of a goal system. A bunch of different ones, even. For example, a lump of incomprehensible super-Solomonoff-compressed code that approximates a hypercomputer simulating a multiverse, with the utility function as an epiphenomenal physical law feeding backwards in time to the AI’s actions. Or a carefully tuned decentralized process (think natural selection, or the invisible hand) found to match the AI’s previous goals exactly by searching through an infinite platonic space.
(Yes, half of those are not real words; the goal was to imagine something that by definition could not be understood, so it’s hard to do better than vaguely pointing in the direction of a feeling.)
Edit: I forgot: “goal system replaced by a completely arbitrary thing that resembles it even less, because it was traded away counterfactually to another part of Tegmark-5”.
It was just a joke: I meant that I would prove you wrong by showing that I can understand you, despite the difference in our intellectual faculties. I don’t really know if we have very different intellectual faculties; it was just a slightly ironic riposte to being called “naive, unimaginative and closed-minded” earlier. You may be right! But then my understanding you is at least a counterexample.
you can sort-of teach a parrot or an ape to recognize words
Can we taboo the ‘animals can’t be made to understand us’ analogy? I don’t think it’s a good analogy, and I assume you can express your point without it. It certainly can’t be the substance of your argument.
Anyway, would you be willing to agree to this: “There are at least some sentences in the meta-language (i.e. the kind of language an AGI might be capable of) such that those sentences cannot be translated into even an arbitrarily complex expression in human language.” For example, there will be sentences in the meta-language that cannot be expressed in human language, even if we allow the users of human language (and the AGI) an arbitrarily large amount of time, an arbitrarily large number of attempts at conversation, question and answer, etc., and an arbitrarily large capacity for producing metaphor, illustration, etc. Is that your view? Or is that far too extreme? Do you just mean to say that the average human being today couldn’t get their heads around an AGI’s goals given 40 minutes, pencil, and paper? Or something in between these two claims?
I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.
Why do you think this is a strong argument? It strikes me as very indirect and intuitionistic. I mean, I see what you’re saying, but I’m not at all confident that the relations between a protozoa and a fish, a dog and a chimp, an 8th century dock worker and a 21st century physicist, and the smartest of (non-uplifted) people and an AGI all fall onto a single continuum of intelligence/complexity of goals. I don’t even know what kind of empirical evidence (I mean the sort of thing one would find in a scientific journal) could be given in favor of such a conclusion. I just don’t really see why you’re so confident in this conclusion.
Using “even an arbitrarily complex expression in human language” seems unfair, given that human language is Turing complete, but fully describing even a simple program in it without external tools will far exceed the capability of any actual human, except for maybe a few savants who ended up highly specialized towards that narrow kind of task.
I agree, but I was taking the work of translation to be entirely on the side of an AGI: it would take whatever sentences it thinks in a meta-language and translate them into human language. Figuring out how to express such thoughts in our language would be a challenging practical problem, but that’s exactly where AGI shines. I’m assuming, obviously, that it wants to be understood. I am very ready to agree that an AGI attempting to be obscure to us will probably succeed.
it was just a slightly ironic riposte to being called “naive, unimaginative and closed-minded” earlier. You may be right! But then my understanding you is at least a counterexample.
Sorry, didn’t mean to call you personally any of those adjectives :)
Anyway, would you be willing to agree to this [...]
Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn’t so.
but I’m not at all confident that the relations between [...] fall onto a single continuum of intelligence/complexity of goals.
If you agree with Eliezer’s definition of intelligence as optimization power, then shouldn’t we be able to express this power as a number? If so, the difference between different intelligences is only one of scale.
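For what it’s worth, Eliezer’s proposed measure does yield a single number: roughly, minus the binary log of the fraction of possible outcomes ranked at least as high as the one actually achieved. A toy sketch (the function name and setup are mine, not from the thread):

```python
import math

def optimization_power(outcomes, achieved, utility):
    """Bits of optimization: -log2 of the fraction of possible
    outcomes at least as good as the one actually achieved."""
    at_least_as_good = sum(1 for o in outcomes if utility(o) >= utility(achieved))
    return -math.log2(at_least_as_good / len(outcomes))

# A weak optimizer hitting the 50th percentile exerts 1 bit of optimization;
# hitting the single best outcome out of 1024 exerts 10 bits.
outcomes = list(range(1024))
print(optimization_power(outcomes, 512, lambda o: o))   # 1.0
print(optimization_power(outcomes, 1023, lambda o: o))  # 10.0
```

On this measure, different intelligences do land on one numeric scale, which is what the comment above is gesturing at.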
Sorry, didn’t mean to call you personally any of those adjectives :)
None taken then.
Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn’t so.
Well, tell me what you think of this argument:
Let’s divide the meta-language into two sets: P (the sentences that cannot be rendered in English) and Q (the sentences that can). If you expect Q to be empty, then let me know and we can talk about that case. But let’s assume for now that Q is not empty, since I assume we both think that an AGI will be able to handle human language quite easily. Q is, for all intents and purposes, a ‘human’ language itself.
Premise one is that translation is transitive: if I can translate language a into language b, and language b into language c, then I can translate language a into language c (though maybe I need to use language b as an intermediate step).
Premise two: If I cannot translate a sentence in language a into an expression in language b, then there is no expression in language b that expresses the same thought as that sentence in language a.
Premise three: Any AGI would have to learn language originally from us, and thereafter either from us or from previous versions of itself.
So by stipulation, every sentence in Q can be rendered in English, and Q is non-empty. If any sentence in P cannot be rendered in English, then it follows from premise one that sentences in P cannot be rendered in sentences in Q (since then they could thereby be rendered into English). It also follows, if you accept premise two, that Q cannot express any sentence in P. So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).
Hence, no AGI, beginning from a language like English could go on to learn how to express any sentence in P. Therefore no AGI will ever know P.
I’m not super confident this argument is sound, but it seems to me to be at least plausible.
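The argument above can be compressed into a short schematic derivation; the arrow notation is my own shorthand, not anything from the thread, with “a ~> b” abbreviating “a is translatable into b”:

```latex
% Schematic form of the translatability argument (notation mine).
\begin{align*}
\textbf{P1 (transitivity):}\quad & a \rightsquigarrow b \;\wedge\; b \rightsquigarrow c \;\Rightarrow\; a \rightsquigarrow c \\
\textbf{P2:}\quad & s \not\rightsquigarrow b \;\Rightarrow\; \text{nothing in } b \text{ expresses } s \\
\textbf{P3:}\quad & \text{an AGI's language is built up from ours, i.e. from } Q \\
\textbf{Setup:}\quad & Q \rightsquigarrow \mathrm{English}; \qquad s \in P \;\Rightarrow\; s \not\rightsquigarrow \mathrm{English} \\
\textbf{Step 1:}\quad & s \rightsquigarrow Q \text{ would give } s \rightsquigarrow \mathrm{English} \text{ by P1, a contradiction} \\
\textbf{Step 2:}\quad & \text{so } s \not\rightsquigarrow Q \text{, and by P2, } Q \text{ cannot express } s \\
\textbf{Conclusion:}\quad & \text{an AGI starting from } Q \text{ (P3) never comes to express any } s \in P
\end{align*}
```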
If you agree with Eliezer’s definition of intelligence as optimization power
Well, that’s a fine definition, but it’s tricky in this case. Because if intelligence is optimization power, and optimizing presupposes something to optimize, then intelligence (on that definition) isn’t strictly a factor in (ultimate) goal formation. If that’s right, then something’s being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.
Well, maybe it’s not necessarily true assuming finite memory. Do you have reason to expect it to be false in the case we’re talking about?
Many new words come from pointing out a pattern in the environment, not from being defined in terms of previous words.
I’m of course happy to grant that part of using a language involves developing neologisms. We do this all the time, of course, and generally we don’t think of it as departing from English. Do you think it’s possible to coin a neologism in a language like Q, such that the new term is in P (and inexpressible in any part of Q)? A user of this neologism would be unable to, say, taboo or explain what they mean by a term (even to themselves). How would the user distinguish their P-neologism from nonsense?
I expect the taboo/explanation to look like a list of 10^20 thousand-hour-long clips of incomprehensible n-dimensional multimedia, each with a real number attached representing the amount of [untranslatable 92] it has, with a Jupiter brain being required to actually find any pattern.
I’m talking about the simplest possible in principle expression in the human language being that long and complex.
Ah, I see. Even if that were a possibility, I’m not sure that would be such a problem. I’m happy to allow the AGI to spend a few centuries manipulating our culture, our literature, our public discourse etc. in the name of making its goals clear to us. Our understanding something doesn’t depend on us being able to understand a single complex expression of it, or to be able to produce such. It’s not like we all understood our own goals from day one either, and I’m not sure we totally understand them now. Terminal goals are basically pretty hard to understand, but I don’t see why we should expect the (terminal) goals of a super-intelligence to be harder.
I expect it to be false in at least some of the cases we’re talking about, because it’s not 3 but 100 levels, and each one makes it 1000 times longer, because complex explanations and examples are needed for almost every “word”.
It may be that there’s a lot of inferential and semantic ground to cover. But again: practical problem. My point has been to show that we shouldn’t expect there to be a problem of in principle untranslatability. I’m happy to admit there might be serious practical problems in translation. The question is now whether we should default to thinking ‘An AGI is going to solve those problems handily, given the resources it has for doing so’, or ‘An AGI’s thought is going to be so much more complex and sophisticated, that it will be unable to solve the practical problem of communication’. I admit, I don’t have good ideas about how to come down on the issue. I was just trying to respond to Shim’s point about untranslatable meta-languages.
For my part, I don’t see any reason to expect the AGI’s terminal goals to be any more complex than ours, or any harder to communicate, so I see the practical problem as relatively trivial. Instrumental goals, forget about it. But terminal goals aren’t the sorts of things that seem to admit of very much complexity.
For my part, I don’t see any reason to expect the AGI’s terminal goals to be any more complex than ours, or any harder to communicate, so I see the practical problem as relatively trivial. Instrumental goals, forget about it. But terminal goals aren’t the sorts of things that seem to admit of very much complexity.
That the AI can have a simple goal is obvious; I never argued against that. The AI’s goal might be “maximize the amount of paperclips”, which is explained in that many words. I don’t expect the AI as a whole to have anything directly analogous to instrumental goals at the highest level either, so that’s a non-issue. I thought we were talking about the AI’s decision theory.
On manipulating culture for centuries and solving this as a practical problem: or it could just install an implant, or guide evolution to increase intelligence until we were smart enough. The implicit constraint of “translate” is that it’s to an already existing specific human, and they have to still be human at the end of the process. Not “could something that was once human come to understand it”.
I thought we were talking about the AI’s decision theory.
No, Shiminux and I were talking about (I think) terminal goals: that is, we were talking about whether or not we could come to understand what an AGI was after, assuming it wanted us to know. We started talking about a specific part of this problem, namely translating concepts novel to the AGI’s outlook into our own language.
I suppose my intuition, like yours, is that the AGI decision theory would be a much more serious problem, and not one subject to my linguistic argument. Since I expect we also agree that it’s the decision theory that’s really the core of the safety issue, my claim about terminal goals is not meant to undercut the concern for AGI safety. I agree that we could be radically ignorant about how safe an AGI is, even given a fairly clear understanding of its terminal goals.
The implicit constraint of “translate” is that it’s to an already existing specific human, and they have to still be human at the end of the process.
I’d actually like to remain indifferent to the question of how intelligent the end-user of the translation has to be. My concern was really just whether or not there are in principle any languages that are mutually untranslatable. I tried to argue that there may be, but they wouldn’t be mutually recognizable as languages anyway, and that if they are so recognizable, then they are at least partly inter-translatable, and that any two languages that are partly inter-translatable are in fact wholly inter-translatable. But this is a point about the nature of languages, not degrees of intelligence.
So one of the questions we actually agreed on the whole time, and the other was just about the semantics of “language” and “translate”. Oh well, discussion over.
Ha! Well, I did argue that all languages (recognizable as such) were in principle inter-translatable for what could only be described as metaphysical reasons. I’d be surprised if you couldn’t find holes in an argument that ambitious and that unempirical. But it may be that some of the motivation is lost.
I expect it to be false in at least some cases talked about because it’s not 3 but 100 levels, and each one makes it 1000 times longer because complex explanations and examples are needed for almost every “word”.
So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).
Honestly, I expected you to do a bit more steelmanning with the examples I gave. Or maybe you have and just didn’t post it here. Anyway, does the quote mean that any English sentence can be expressed in Chimp, since we evolved from a common ancestor? If you don’t claim that (I hope you don’t), then where did your logic stop applying to humans and chimps vs. AGI and humans? Presumably it’s Premise 3 that gets us the wrong conclusion in the English/Chimp example, since it is required to construct an unbroken chain of languages. What happened to humans over their evolution that made them create Q out of P, where Q is not reducible to P? And if this is possible in the mindless evolutionary process, then would it not be even more likely during an intelligence explosion?
If that’s right, then something’s being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.
I don’t understand this point. I would expect the terminal goals evolve as the evolving intelligence understands more and more about the world. For example, for many people here the original terminal goal was, ostensibly, “serve God”. Then they stopped believing and now their terminal goal is more like “do good”. Similarly, I would expect an evolving AGI to adjust its terminal goals as the ones it had before are obsoleted, not because they have been reached, but because they become meaningless.
Anyway, does the quote mean that any English sentence can be expressed in Chimp, since we evolved from a common ancestor?
No, I said nothing about evolving from a common ancestor. The process of biological variation, selection, and retention of genes seems to me to be entirely irrelevant to this issue, since we don’t know languages in virtue of having specific sets of genes. We know languages by learning them from language-users. You might be referring to homo ancestors that developed language at some time in the past, and the history of linguistic development that led to modern languages. I think my argument does show (if it’s sound) that anything in our linguistic history that qualifies as a language is inter-translatable with a modern language (given arbitrary resources of time, interrogation, metaphor, neologism, etc.).
It’s hard to say what qualifies as a language, but then it’s also hard to say when a child goes from being a non-language user to being a language user. It’s certainly after they learn their first word, but it’s not easy to say exactly when. But remember I’m arguing that we can always inter-translate two languages, not that we can somehow make the thoughts of a language user intelligible to a non-language user (without making them a language user). This is, incidentally, where I think your AGI:us::us:chimps analogy breaks down. I still see no reason to think it plausible. At any rate, I don’t need to draw a line between those homo that spoke languages and those that did not. I grant that the former could not be understood by the latter. I just don’t think the same goes for languages and ‘meta-languages’.
I would expect the terminal goals evolve as the evolving intelligence understands more and more about the world.
Me too, but that would have nothing to do with intelligence on EY’s definition. If intelligence is optimizing power, then it can’t be used to reevaluate terminal goals. What would it optimize for? It can only be used to reevaluate instrumental goals so as to optimize for satisfying terminal goals. I don’t know how the hell we do reevaluate terminal goals anyway, but we do, so there you go.
For example, for many people here the original terminal goal was, ostensibly, “serve God”. Then they stopped believing and now their terminal goal is more like “do good”.
You might think they just mistook an instrumental goal (‘serve God’) for a terminal goal, when actually they wanted to ‘do good’ all along.
At any rate, I don’t need to draw a line between those homo that spoke languages and those that did not. I grant that the former could not be understood by the latter. I just don’t think the same goes for languages and ‘meta-languages’.
Ah. To me language is just a meta-grunt. That’s why I don’t think it’s different from the next level up. But I guess I don’t have any better arguments than those I have already made and they are clearly not convincing. So I will stop here.
You might think they just mistook an instrumental goal (‘serve God’) for a terminal goal, when actually they wanted to ‘do good’ all along.
Right, you might. Except they may not even have had the vocabulary to explain that underlying terminal goal. In this example my interpretation would be that their terminal goal evolved rather than was clarified. Again, I don’t have any better argument, so I will leave it at that.
By this reasoning no AGI beginning from English could ever know French either, for similar reasons. (Note that every language has sentences that cannot be rendered in another language, in the sense that no one who knows the truth value of the rendered sentence can thereby know the truth value of the unrendered sentence; consider variations on Gödel-undecidable sentences.)
By this reasoning no AGI beginning from English could ever know French either, for similar reasons.
This is true only if this...
Note that every language has sentences that cannot be rendered in another language
is true. But I don’t think it is. English and French, for instance, seem to me to be entirely inter-translatable. I don’t mean that we can assign, for every word in French, a word of equivalent meaning in English. But maybe it would be helpful if I made it more clear what I mean by ‘inter-translatable’. I think language L is inter-translatable with language M if for every sentence in language L, I can express the same thought using an arbitrarily complex expression in language M.
By ‘arbitrarily complex’ I mean this: Say I have a sentence in L. In order to translate it into M, I am allowed to write in M an arbitrarily large number of sentences qualifying and triangulating the meaning of the sentence in L. I am allowed to write an arbitrarily large number of poems, novels, interpretive dances, etymological and linguistic papers, and encyclopedias discussing the meaning and spirit of that sentence in L. In other words, two languages are by my standard inter-translatable if for any expression in L of n bits, I can translate it into M in n’ bits, where n’ is allowed to be any positive number.
I think, by this standard, French and English count as inter-translatable, as are any languages I can think of. I’m arguing, effectively, that for any language, either none of that language is inter-translatable with any language we know (in which case, I doubt we could recognize it as a language at all), or all of it is.
Now, even if I have shown that we and an AGI will necessarily be able to understand each other entirely in principle, I certainly haven’t shown that it can be done in practice. However, I want to push the argument in the direction of a practical problem, just because in general, I think I can argue that AGI will be able to overcome practical problems of any reasonable difficulty.
My hangup is that it seems like a truly benevolent AI would share our goals. And in a sense your argument “only” applies to instrumental goals, or to those developed through self-modification. (Amoebas don’t design fish.) I’ll grant it might take a conversation forever to reach the level we’d understand.
My hangup is that it seems like a truly benevolent AI would share our goals.
In the way that a “truly benevolent” human would leave an unpolluted lake for fish to live in, instead of using it for its own purposes. The fish might think that humans share its goals, but the human goals would be infinitely more complex than fish could understand.
...It sounds like you’re hinting at the fact that humans are not benevolent towards fish. If we are, then we do share its goals when it comes to outcomes for the fish—we just have other goals, which do not conflict. (I’m assuming the fish actually has clear preferences.) And a well-designed AI should not even have additional goals. The lack of understanding “only” might come in with the means, or with our poor understanding of our own preferences.
Do we have some reason to expect [an AGI’s] goals to be more complex than ours?
I find myself agreeing with you—human goals are a complex mess, which we seldom understand ourselves. We don’t come with clear inherent goals, and what goals we do have we abuse by using things like sugar and condoms instead of eating healthy and reproducing like we were “supposed” to. People have been asking about the meaning of life for thousands of years, and we still have no answer.
An AI on the other hand, could have very simple goals—make paperclips, for example. An AI’s goals might be completely specified in two words. It’s the AI’s sub-goals and plans to reach its goals that I doubt I could comprehend. It’s the very single-mindedness of an AI’s goals and our inability to comprehend our own goals, plus the prospect of an AI being both smarter and better at goal-hacking than us, that has many of us fearing that we will accidentally kill ourselves via non-friendly AI. Not everyone will think to clarify “make paperclips” with, “don’t exterminate humanity”, “don’t enslave humanity”, “don’t destroy the environment”, “don’t reprogram humans to desire only to make paperclips”, and various other disclaimers that wouldn’t be necessary if you were addressing a human (and we don’t know the full disclaimer list either).
It might not be possible to “truly comprehend” the AI’s advanced meta-meta-ethics and whatever compact algorithm replaces the goal-subgoal tree, but the AI most certainly can provide a code of behavior and prove that following it is a really good idea, much like humans might train pets to perform a variety of useful tasks whose true purpose they can’t comprehend. And it doesn’t seem unreasonable that this code of behavior would have the look and feel of an in-depth philosophy of ethics, with some very deep and general compression/procedural mechanisms that seem very much like what you’d expect from a true and meaningful set of metaethics to humans, even if it did not correspond much to what’s going on inside the AI. It also probably wouldn’t accidentally trigger hypocrisy-revulsion in the humans, although the AI seeming to also be following it is just one of many solutions to that, and probably not a very likely one.
Friendliness is pretty much an entirely tangential issue and the equivalent depth of explaining it would require the solution to several open questions unless I’m forgetting something right now. (I probably am)
There, question dissolved.
Edit: I ended up commenting in a bunch of places in this comment tree, so I feel the need to clarify: I consider both sides here to be making errors, and I ended up seeming to favor the shminux side because that’s where I was able to make interesting contributions, and because it made some true tangential claims that were argued against and not defended well. I do not agree with the implications for friendliness, however; you don’t need to understand something to be able to construct true statements about it, or even to direct its expression powerfully so that it has properties you can reference but don’t understand, especially if you have access to external tools.
Re goals, I feel that comparing advanced AGI to humans is like comparing humans to chimps: regardless how much we want to explain human ethics and goals to a chimp, and how much effort we put in, its mind just isn’t equipped to comprehend them. Similarly, even the most benevolent and conscientious AGI would be unable to explain its goal system or its ethical system to even a very smart human. Like chimps, humans have their own limits of comprehension, even though we do not know what they are from the inside.
Can you say more about what you’re expecting a successful explanation to comprise, here?
E.g., suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them. I expect to be significantly better at predicting the AGI’s rankings than I was before the explanation.
I don’t expect to be able to do anything equivalent with a chimp.
Do our expectations differ here?
“Suppose an AGI attempts to explain its and to me” is what I expect it to sound like to humans if we were to replace human abstractions with those an advanced AGI would use. It would not even call these abstractions “ethics” or “goals”, no more than we call ethics “groom” and goals “sex” when talking to a chimp.
I do not expect it to be able to generate such descriptions at all, due to the limitations of the human mind and human language. So, yes, our expectations differ here. I do not think that human intelligence reached some magical threshold where everything can be explained to it, given enough effort, even though it was not possible with “less advanced” animals. For all I know, I am not even using the right terms. Maybe an AGI improvement on the term “explain” is incomprehensible to us. Like if we were to translate “explain” into chimp or cat it would come out as “show”, or something.
(shrug) Translating the terms is rather beside my point here.
If the AGI is using these things to choose among possible future worlds, then I expect it to be able to teach me to choose among possible future worlds more like it does than I would without that explanation.
I’m happy to call those things goals, ethics, morality, etc., even if those words don’t capture what the AGI means by them. (I don’t know that they really capture what I mean by them either, come to that.) Perhaps I would do better to call them “groom” or “fleem” or “untranslatable1” or refer to them by means of a specific shade of orange. I don’t know; but as I say, I don’t really care; terminology is largely independent of explanation.
But, sure, if you expect that it’s incapable of doing that, then our expectations differ.
I’ll note that my expectations don’t depend on my having reached a magical threshold, or on everything being explainable to me given enough effort.
What are your reasons for thinking this? I find myself disagreeing: one big disanalogy is that while we have language and chimps do not, we and the AGI both have language. I find it implausible that the AGI could not in principle communicate its goals to us: give the AGI and ourselves an arbitrarily large amount of time and resources to talk, and do you really think we’d never come to a common understanding? Because even if we don’t, the AGI effectively does have such resources by which it might, I dunno, choose its words with care.
I’m also not sure why we should think it would even be particularly challenging to understand the goals of an AGI. It’s not easy even with other humans, but why would it be much harder with AGI? Do we have some reason to expect its goals to be more complex than ours? It’s been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be. My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.
I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
Certainly language is important, and human language is much more evolved than that of other animals. There are parts of human language, like writing, which are probably inaccessible to chimps, no matter how much effort we put into teaching them and how patient we are. I can easily imagine that AGI would use some kind of “meta-language”, because human language would simply be inadequate for expressing its goals, like the chimp language is inadequate for expressing human metaethics.
I do not know what this next step would be, any more than an intelligent chimp would be able to predict that humans would invent writing. My mind as-is is too limited, and I understand as much. An AGI would have to make me smarter first, before being able to explain what it means to me. Call it “human uplifting”.
Yes, if you look through the tower of goals, more intelligent species have more complex goals.
It has not been mine. When someone smarter than I am behaves a certain way, they have to patiently explain to me why they do what they do. And I still only see the path they have taken, not the million paths they briefly considered and rejected along the way.
My prejudice tells me that when someone a few levels above mine tries to explain their goals and motivations to me in English, I may understand each word, but not the complete sentences. If you cannot relate to this experience, go to a professional talk on a subject you know nothing about. For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her. Certainly some of this gap can be patched to a degree, after a decade or so of dedicated work by both sides, fraught with frustration and doubt, but I don’t think the gap can be bridged completely if it is wide enough.
I find the line of thinking “we are humans, we are smart, we can understand the goals of even an incredibly smart AGI” to be naive, unimaginative and closed-minded, given that our experience is rife with counterexamples.
OK, but why not look at this tower another way. A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see if chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.
So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much “we are humans, we are smart, we can understand the goals of even an incredibly smart AGI”; it’s “an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires.”
So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I’m pretty skeptical of naive extrapolation in this domain anyway, given Eliezer’s point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn’t expect trends to be maintained across those shifts.
You are right that we are certainly able to convey a small, simple subset of our goals, desires and motivations to some complex enough animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but to put up with.
This is quite possible. It might give us its dumbed-down version of its 10 commandments which would look to us like an incredible feat of science and philosophy.
Right. An optimistic view is that we can understand the explanations, a pessimistic view is that we would only be able to follow instructions (this is not the most pessimistic view by far).
Indeed, we shouldn’t. I probably phrased my point poorly. What I tried to convey is that because “major advances in optimization power are meta-level qualitative shifts”, confidently proclaiming that an advanced AGI will be able to convey what it thinks to humans is based on the just-world fallacy, not on any solid scientific footing.
That’s because you weren’t really speaking English; you were speaking the English words for math terms related to physics. The people who spoke the relevant math you were alluding to could follow; those who didn’t could not, because they didn’t have concrete mathematical ideas to tie the words to. It’s not just a matter of jargon, it’s an actual language barrier. I think you’d find that, with a jargon cheat sheet, you could follow many non-mathematical PhD defenses just fine.
The same thing happens in music, which is its own language (after years of playing, I find I can “listen” to a song by reading sheet music).
Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?
“mathematics” may be a wrong word for it. I totally think that a transhuman can create concepts and ideas which a mere human cannot understand even when patiently explained. I am quite surprised that other people here don’t find it an obvious default.
My impression was that the question was not whether it’d have those concepts, since as you say that’s obvious, but whether they’d necessarily be referenced by the utility function.
Sure, but I find “can’t understand” sort of fuzzy as a concept. I.e., I wouldn’t say I ‘understand’ compactification and Calabi-Yau manifolds the same way I understand sheet music (or the same way I understand the word green), but I do understand them all in some way.
It seems unlikely to me that there exist concepts that can’t be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.
For example: it seems unlikely there exists a law of physics that cannot be expressed as an equation. It seems equally unlikely there exists an equation I would be totally incapable of working with. Even if I’ll never have the insight that led someone to write it down, if you give it to me, I can use it to do things.
Human languages can encode anything, but a human can’t understand most things valid in human languages; most notably, extremely long things, and numbers specified with a lot of digits that actually matter. Just because you can count in binary on your hands does not mean you can comprehend the code of an operating system expressed in that format.
Humans seem “concept-complete” in much the same way your desktop PC seems Turing-complete. Except it’s much more easily broken, because the human brain has absurdly shitty memory.
That’s why we have paper: I can write it down. “Understanding” and “remembering” seem somewhat orthogonal here. I can’t recite Moby Dick from memory, but I understood the book. If you give me a 20-digit number 123… and I can’t hold it but retain “a number slightly larger than 1.23 * 10^19,” that doesn’t mean I can’t understand you.
Print it out for me, and give me enough time, and I will be able to understand it, especially if you give me some context.
Yes, you can encode things in a way that makes them harder for humans to understand; no one would dispute that. The question is: are there concepts that are simply impossible to explain to a human? I point out that while I can’t remember a 20-digit number, I can derive pretty much all of classical physics, so certainly humans can hold quite complex ideas in their heads, even if they aren’t optimized for storage of long numbers.
You can construct a system consisting of a planet’s worth of paper and pencils and an immortal version of yourself (or a vast dynasty of successors) that can understand it, if nothing else because it’s Turing-complete and can simulate the AGI. This is not the same as you understanding it while still remaining fully human. Even if you did somehow integrate the paper system sufficiently, that’d be just as big a change as uploading and intelligence-augmenting the normal way.
The approximation thing is why I specified digits mattering. It won’t help one bit when talking about something like Gödel numbering.
I understand, my point was simply that “understanding” and “holding in your head at one time” are not at all the same thing. “There are numbers you can’t remember if I tell them to you” is not at all the same claim that “there are ideas I can’t explain to you.”
Neither of your cases is unexplainable: give me the source code in a high-level language instead of binary, and I can understand it. If you give me the binary code and the instruction set, I can convert it to assembly and then to a higher-level language, via disassembly.
Of course, I can deliberately obfuscate an idea and make it harder to understand, either by encryption or by presenting it in the most obtuse possible form; that is not the same as an idea that fundamentally cannot be explained.
But they might be related. Perhaps there are interesting and useful concepts that would take, say, 100,000 pages of English text to write down, such that each page cannot be understood without holding most of the rest of the text in working memory, and such that no useful, shorter, higher-level version of the concept exists.
Humans can only think about things that can be taken one small piece at a time, because our working memories are pretty small. It’s plausible to me that there are atomic ideas that are simply too big to fit in a human’s working memory, and which do need to be held in your head at one time in order to be understood.
My intuition is the exact opposite.
I can totally imagine that some models are not reducible to equations, but that’s not the point, really.
Unless this “use” requires more brainpower than you have… You might still be able to work with some simplified version, but you’d have to have transhuman intelligence to “do things” with the full equation.
But that seems incredibly nebulous. What is the exact failure mode?
This seems like a bogus use of the outside view. AGI is qualitatively different from evolved intelligence, in that it is not evolved, but built by a lesser intelligence. Moreover, there’s a simple explanation for the observation that more intelligent animals have more complex goals, which is that more intelligence permits more subgoals, and natural selection generally alters a species’ goals by adding rather than simplifying. This is pretty much totally inapplicable to a constructed AGI.
I’d love to hear what actual AGI experts think about it, not just us idle forum dwellers.
I will try to refute you by understanding what you say. So could you explain to me this idea of a ‘meta-language’? I guess that by ‘meta-’ you intend to say that at least some sentences in the meta-language couldn’t in principle be translated into a non-meta ‘human’ language. Is that right?
This is not a given. I’ve been to plenty of dissertation defenses on topics I know little to nothing about, and you’re right that I’m often at a loss. But this, I find, is because the understanding of a newly minted doctor is too narrow and too newborn to be easily understood. PhD defenses are not the place to go to find people who really get something, they’re the place to go to find someone who’s just now gotten a foothold. My experience is still that the more intelligent and experienced PhDs tend to be more intelligible. But this is a little beside the point: PhDs tend to be hard to understand, when they are, because they’re discussing something quite complex.
What reason do you have for thinking an AGI’s goals would be complex at all? If your reasoning is that human beings that are more intelligent tend to have more complex goals (I don’t agree, but say I grant this) why do you think an AGI will be so much like an intelligent human being?
I am not sure what you mean by “refute” here. Prove my conjecture wrong by giving a counterexample? Show that my arguments are wrong? Show that the examples I used to make my point clearer are bad examples? If it’s the last one, then I would not call it a refutation.
Indeed, at least not without some extra layer of meaning not originally expressed in the language. To give another example (not a proof, just an illustration of my point), you can sort of teach a parrot or an ape to recognize words, to count and maybe even to add, but I don’t expect it to be possible to teach one to construct mathematical proofs or to understand what one even is. Even if a proof can be expressed as a finite string of symbols (a sentence in a language) that a chimp is capable of distinguishing from another string, there is just too much meta there, with symbols standing for other symbols, numbers or concepts.
I agree that my PhD defense example is not a proof, but an illustration meant to show that humans quite often experience a disconnect between a language and an underlying concept, which may well be out of reach despite being expressed with familiar symbols, just like a chimp would in the above example.
I simply follow the chain of goal complexity as it grows with the complexity of intelligence, from protozoa to primates and beyond, and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for, or instead of, a goal system.
I can in fact imagine what else a super-intelligence would use instead of a goal system. A bunch of different ones, even. For example, a lump of incomprehensible super-Solomonoff-compressed code that approximates a hypercomputer simulating a multiverse, with the utility function as an epiphenomenal physical law feeding backwards in time to the AI’s actions. Or a carefully tuned decentralized process (think natural selection, or the invisible hand) found to match the AI’s previous goals exactly by searching through an infinite platonic space.
(Yes, half of those are not real words; the goal was to imagine something that by definition could not be understood, so it’s hard to do better than vaguely pointing in the direction of a feeling.)
Edit: I forgot: “goal system replaced by a completely arbitrary thing that resembles it even less, because it was traded away counterfactually to another part of Tegmark-5”
It was just a joke: I meant that I would prove you wrong by showing that I can understand you, despite the difference in our intellectual faculties. I don’t really know if we have very different intellectual faculties; it was just a slightly ironic riposte to being called “naive, unimaginative and closed-minded” earlier. You may be right! But then my understanding you is at least a counterexample.
Can we taboo the ‘animals can’t be made to understand us’ analogy? I don’t think it’s a good analogy, and I assume you can express your point without it. It certainly can’t be the substance of your argument.
Anyway, would you be willing to agree to this: “There are at least some sentences in the meta-language (i.e. the kind of language an AGI might be capable of) such that those sentences cannot be translated into even arbitrarily complex expressions in human language.” For example, there will be sentences in the meta-language that cannot be expressed in human language, even if we allow the users of human language (and the AGI) an arbitrarily large amount of time, an arbitrarily large number of attempts at conversation, question and answer, etc., and an arbitrarily large capacity for producing metaphor, illustration, etc. Is that your view? Or is that far too extreme? Do you just mean to say that the average human being today couldn’t get their heads around an AGI’s goals given 40 minutes, pencil, and paper? Or something in between these two claims?
Why do you think this is a strong argument? It strikes me as very indirect and intuitionistic. I mean, I see what you’re saying, but I’m not at all confident that the relations between a protozoan and a fish, a dog and a chimp, an 8th-century dock worker and a 21st-century physicist, and the smartest of (non-uplifted) people and an AGI all fall onto a single continuum of intelligence/complexity of goals. I don’t even know what kind of empirical evidence (I mean the sort of thing one would find in a scientific journal) could be given in favor of such a conclusion. I just don’t really see why you’re so confident in this conclusion.
Using “even arbitrarily complex expressions in human language” seems unfair: human language is Turing complete, but fully describing even a simple program in it without external tools would far exceed the capability of any actual human, except maybe a few savants who ended up highly specialized towards that narrow kind of task.
I agree, but I was taking the work of translation to be entirely on the side of an AGI: it would take whatever sentences it thinks in a meta-language and translate them into human language. Figuring out how to express such thoughts in our language would be a challenging practical problem, but that’s exactly where AGI shines. I’m assuming, obviously, that it wants to be understood. I am very ready to agree that an AGI attempting to be obscure to us will probably succeed.
That’s obvious and not what I meant. I’m talking about the simplest possible in-principle expression in the human language being that long and complex.
Sorry, didn’t mean to call you personally any of those adjectives :)
Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn’t so.
If you agree with Eliezer’s definition of intelligence as optimization power, then shouldn’t we be able to express this power as a number? If so, the difference between different intelligences is only one of scale.
None taken then.
Well, tell me what you think of this argument:
Let’s divide the meta-language into two sets: P (the sentences that cannot be rendered in English) and Q (the sentences that can). If you expect Q to be empty, then let me know and we can talk about that case. But let’s assume for now that Q is not empty, since I assume we both think that an AGI will be able to handle human language quite easily. Q is, for all intents and purposes, a ‘human’ language itself.
Premise one is that translation is transitive: if I can translate language a into language b, and language b into language c, then I can translate language a into language c (though maybe I need to use language b as an intermediate step).
Premise two: If I cannot translate a sentence in language a into an expression in language b, then there is no expression in language b that expresses the same thought as that sentence in language a.
Premise three: Any AGI would have to learn language originally from us, and thereafter either from us or from previous versions of itself.
So by stipulation, every sentence in Q can be rendered in English, and Q is non-empty. If any sentence in P cannot be rendered in English, then it follows from premise one that sentences in P cannot be rendered in sentences in Q (since then they could thereby be rendered into English). It also follows, if you accept premise two, that Q cannot express any sentence in P. So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).
Hence, no AGI, beginning from a language like English could go on to learn how to express any sentence in P. Therefore no AGI will ever know P.
I’m not super confident this argument is sound, but it seems to me to be at least plausible.
Well, that’s a fine definition, but it’s tricky in this case. Because if intelligence is optimization power, and optimizing presupposes something to optimize, then intelligence (on that definition) isn’t strictly a factor in (ultimate) goal formation. If that’s right, then something’s being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.
Premise one is false assuming finite memory.
Premise 3 does not hold well either: many new words come from pointing out a pattern in the environment, not from defining them in terms of previous words.
Well, maybe it’s not necessarily true assuming finite memory. Do you have reason to expect it to be false in the case we’re talking about?
I’m of course happy to grant that part of using a language involves developing neologisms. We do this all the time, of course, and generally we don’t think of it as departing from English. Do you think it’s possible to coin a neologism in a language like Q, such that the new term is in P (and inexpressible in any part of Q)? A user of this neologism would be unable to, say, taboo or explain what they mean by a term (even to themselves). How would the user distinguish their P-neologism from nonsense?
I expect the taboo/explanation to look like a list of 10^20 clips of incomprehensible n-dimensional multimedia, each 1000 hours long, each with a real number attached representing the amount of [untranslatable 92] it has, with a Jupiter brain required to actually find any pattern.
Ah, I see. Even if that were a possibility, I’m not sure that would be such a problem. I’m happy to allow the AGI to spend a few centuries manipulating our culture, our literature, our public discourse etc. in the name of making its goals clear to us. Our understanding something doesn’t depend on us being able to understand a single complex expression of it, or to be able to produce such. It’s not like we all understood our own goals from day one either, and I’m not sure we totally understand them now. Terminal goals are basically pretty hard to understand, but I don’t see why we should expect the (terminal) goals of a super-intelligence to be harder.
It may be that there’s a lot of inferential and semantic ground to cover. But again: practical problem. My point has been to show that we shouldn’t expect there to be a problem of in principle untranslatability. I’m happy to admit there might be serious practical problems in translation. The question is now whether we should default to thinking ‘An AGI is going to solve those problems handily, given the resources it has for doing so’, or ‘An AGI’s thought is going to be so much more complex and sophisticated, that it will be unable to solve the practical problem of communication’. I admit, I don’t have good ideas about how to come down on the issue. I was just trying to respond to Shim’s point about untranslatable meta-languages.
For my part, I don’t see any reason to expect the AGI’s terminal goals to be any more complex than ours, or any harder to communicate, so I see the practical problem as relatively trivial. Instrumental goals, forget about it. But terminal goals aren’t the sorts of things that seem to admit of very much complexity.
That the AI can have a simple goal is obvious; I never argued against that. The AI’s goal might be “maximize the amount of paperclips”, which is explained in that many words. I don’t expect the AI as a whole to have anything directly analogous to instrumental goals on the highest level either, so that’s a non-issue. I thought we were talking about the AI’s decision theory.
On manipulating culture for centuries and solving it as a practical problem: or it could just install an implant, or guide evolution to increase intelligence until we were smart enough. The implicit constraint of “translate” is that it’s to an already existing specific human, and they have to still be human at the end of the process. Not “could something that was once human come to understand it”.
No, Shiminux and I were talking about (I think) terminal goals: that is, we were talking about whether or not we could come to understand what an AGI was after, assuming it wanted us to know. We started talking about a specific part of this problem, namely translating concepts novel to the AGI’s outlook into our own language.
I suppose my intuition, like yours, is that the AGI decision theory would be a much more serious problem, and not one subject to my linguistic argument. Since I expect we also agree that it’s the decision theory that’s really the core of the safety issue, my claim about terminal goals is not meant to undercut the concern for AGI safety. I agree that we could be radically ignorant about how safe an AGI is, even given a fairly clear understanding of its terminal goals.
I’d actually like to remain indifferent to the question of how intelligent the end-user of the translation has to be. My concern was really just whether or not there are in principle any languages that are mutually untranslatable. I tried to argue that there may be, but they wouldn’t be mutually recognizable as languages anyway, and that if they are so recognizable, then they are at least partly inter-translatable, and that any two languages that are partly inter-translatable are in fact wholly inter-translatable. But this is a point about the nature of languages, not degrees of intelligence.
Human languages? Alien languages? Machine languages?
I don’t think those distinctions really mean very much. Languages don’t come in types in any significant sense.
Yes they do. E.g. the Chomsky hierarchy, the agglutinative/synthetic/analytic distinction, etc.
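The Chomsky hierarchy point can be made concrete: it classifies languages by the machinery needed to recognize them. The classic example is { aⁿbⁿ }, which is context-free but not regular — no finite-state matcher (i.e. no true regular expression) can check that the counts match, while a single counter, a degenerate pushdown stack, suffices. A minimal sketch in Python (the function name is mine):

```python
def is_anbn(s: str) -> bool:
    """Recognize { a^n b^n : n >= 0 }, a context-free but non-regular
    language. A finite-state machine cannot count unboundedly, but one
    counter (a degenerate pushdown stack) is enough."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:           # an 'a' after any 'b' is malformed
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:        # more b's than a's seen so far
                return False
        else:
            return False
    return count == 0

print(is_anbn("aaabbb"))  # True: three a's, then three b's
print(is_anbn("aabbb"))   # False: the counts do not match
```

So "language" is not one undifferentiated kind of thing; recognizing some languages genuinely requires more computational machinery than recognizing others.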
Also, we recognise maths as a language, but we have no idea how to translate, as opposed to recode, English into it.
So one of the questions we actually agreed on the whole time, and the other was just the semantics of “language” and “translate”. Oh well, discussion over.
Ha! Well, I did argue that all languages (recognizable as such) were in principle inter-translatable for what could only be described as metaphysical reasons. I’d be surprised if you couldn’t find holes in an argument that ambitious and that unempirical. But it may be that some of the motivation is lost.
I expect it to be false in at least some cases talked about because it’s not 3 but 100 levels, and each one makes it 1000 times longer because complex explanations and examples are needed for almost every “word”.
Honestly, I expected you to do a bit more steelmanning with the examples I gave. Or maybe you have, and just didn’t post it here. Anyway, does the quote mean that any English sentence can be expressed in Chimp, since we evolved from a common ancestor? If you don’t claim that (I hope you don’t), then where did your logic stop applying to humans and chimps vs. AGI and humans? Presumably it’s Premise 3 that gets us the wrong conclusion in the English/Chimp example, since it is required to construct an unbroken chain of languages. What happened to humans over their evolution which made them create Q out of P, where Q is not reducible to P? And if this is possible in the mindless evolutionary process, then would it not be even more likely during an intelligence explosion?
I don’t understand this point. I would expect the terminal goals to evolve as the evolving intelligence understands more and more about the world. For example, for many people here the original terminal goal was, ostensibly, “serve God”. Then they stopped believing and now their terminal goal is more like “do good”. Similarly, I would expect an evolving AGI to adjust its terminal goals as the ones it had before become obsolete, not because they have been reached, but because they become meaningless.
No, I said nothing about evolving from a common ancestor. The process of biological variation, selection, and retention of genes seems to me to be entirely irrelevant to this issue, since we don’t know languages in virtue of having specific sets of genes. We know languages by learning them from language-users. You might be referring to homo ancestors that developed language at some time in the past, and the history of linguistic development that led to modern languages. I think my argument does show (if it’s sound) that anything in our linguistic history that qualifies as a language is inter-translatable with a modern language (given arbitrary resources of time, interrogation, metaphor, neologism, etc.).
It’s hard to say what qualifies as a language, but then it’s also hard to say when a child goes from being a non-language user to being a language user. It’s certainly after they learn their first word, but it’s not easy to say exactly when. But remember I’m arguing that we can always inter-translate two languages, not that we can some how make the thoughts of a language user intelligible to a non-language user (without making them a language user). This is, incidentally, where I think your AGI:us::us:chimps analogy breaks down. I still see no reason to think it plausible. At any rate, I don’t need to draw a line between those homo that spoke languages and those that did not. I grant that the former could not be understood by the latter. I just don’t think the same goes for languages and ‘meta-languages’.
Me too, but that would have nothing to do with intelligence on EY’s definition. If intelligence is optimizing power, then it can’t be used to reevaluate terminal goals. What would it optimize for? It can only be used to reevaluate instrumental goals so as to optimize for satisfying terminal goals. I don’t know how the hell we do reevaluate terminal goals anyway, but we do, so there you go.
You might think they just mistook an instrumental goal (‘serve God’) for a terminal goal, when actually they wanted to ‘do good’ all along.
Ah. To me language is just a meta-grunt. That’s why I don’t think it’s different from the next level up. But I guess I don’t have any better arguments than those I have already made and they are clearly not convincing. So I will stop here.
Right, you might. Except they may not even have had the vocabulary to explain that underlying terminal goal. In this example my interpretation would be that their terminal goal evolved rather than was clarified. Again, I don’t have any better argument, so I will leave it at that.
I see. If that is true, then I can’t dispute your point (for more than one reason).
By this reasoning no AGI beginning from English could ever know French either, for similar reasons. (Note that every language has sentences that cannot be rendered in another language, in the sense that someone who knows the truth value of the unrendered sentence can know the truth value of the rendered sentence; consider variations on Godel-undecideable sentences.)
This is true only if this...
is true. But I don’t think it is. English and French, for instance, seem to me to be entirely inter-translatable. I don’t mean that we can assign, for every word in French, a word of equivalent meaning in English. But maybe it would be helpful if I made it more clear what I mean by ‘inter-translatable’. I think language L is inter-translatable with language M if, for every sentence in language L, I can express the same thought using an arbitrarily complex expression in language M.
By ‘arbitrarily complex’ I mean this: Say I have a sentence in L. In order to translate it into M, I am allowed to write in M an arbitrarily large number of sentences qualifying and triangulating the meaning of the sentence in L. I am allowed to write an arbitrarily large number of poems, novels, interpretive dances, etymological and linguistic papers, and encyclopedias discussing the meaning and spirit of that sentence in L. In other words, two languages are by my standard inter-translatable if for any expression in L of n bits, I can translate it into M in n’ bits, where n’ is allowed to be any positive number.
I think, by this standard, French and English count as inter-translatable, as are any languages I can think of. I’m arguing, effectively, that for any language, either none of that language is inter-translatable with any language we know (in which case, I doubt we could recognize it as a language at all), or all of it is.
Now, even if I have shown that we and an AGI will necessarily be able to understand each other entirely in principle, I certainly haven’t shown that it can be done in practice. However, I want to push the argument in the direction of a practical problem, just because in general, I think I can argue that AGI will be able to overcome practical problems of any reasonable difficulty.
My hangup is that it seems like a truly benevolent AI would share our goals. And in a sense your argument “only” applies to instrumental goals, or to those developed through self-modification. (Amoebas don’t design fish.) I’ll grant it might take a conversation forever to reach the level we’d understand.
In the way that a “truly benevolent” human would leave an unpolluted lake for fish to live in, instead of using it for its own purposes. The fish might think that humans share its goals, but the human goals would be infinitely more complex than fish could understand.
...It sounds like you’re hinting at the fact that humans are not benevolent towards fish. If we are, then we do share its goals when it comes to outcomes for the fish—we just have other goals, which do not conflict. (I’m assuming the fish actually has clear preferences.) And a well-designed AI should not even have additional goals. The lack of understanding “only” might come in with the means, or with our poor understanding of our own preferences.
I find myself agreeing with you—human goals are a complex mess, which we seldom understand ourselves. We don’t come with clear inherent goals, and what goals we do have we abuse by using things like sugar and condoms instead of eating healthy and reproducing like we were “supposed” to. People have been asking about the meaning of life for thousands of years, and we still have no answer.
An AI on the other hand, could have very simple goals—make paperclips, for example. An AI’s goals might be completely specified in two words. It’s the AI’s sub-goals and plans to reach its goals that I doubt I could comprehend. It’s the very single-mindedness of an AI’s goals and our inability to comprehend our own goals, plus the prospect of an AI being both smarter and better at goal-hacking than us, that has many of us fearing that we will accidentally kill ourselves via non-friendly AI. Not everyone will think to clarify “make paperclips” with, “don’t exterminate humanity”, “don’t enslave humanity”, “don’t destroy the environment”, “don’t reprogram humans to desire only to make paperclips”, and various other disclaimers that wouldn’t be necessary if you were addressing a human (and we don’t know the full disclaimer list either).
It might not be possible to “truly comprehend” the AI’s advanced meta-meta-ethics and whatever compact algorithm replaces the goal–subgoal tree, but the AI most certainly can provide a code of behavior and prove that following it is a really good idea, much like humans might train pets to perform a variety of useful tasks whose true purpose they can’t comprehend. And it doesn’t seem unreasonable that this code of behavior would have the look and feel of an in-depth philosophy of ethics, with some very, very deep and general compression/procedural mechanisms that seem very much like what you’d expect from a true and meaningful set of metaethics to humans, even if it did not correspond much to what’s going on inside the AI. It also probably wouldn’t accidentally trigger hypocrisy-revulsion in the humans, although the AI seeming to also be following it is just one of many solutions to that, and probably not a very likely one.
Friendliness is pretty much an entirely tangential issue and the equivalent depth of explaining it would require the solution to several open questions unless I’m forgetting something right now. (I probably am)
There, question dissolved.
Edit: I ended up commenting in a bunch of places in this comment tree, so I feel the need to clarify: I consider both sides here to be making errors, and I ended up seeming to favor the shminux side because that’s where I was able to make interesting contributions, and it made some true tangential claims that were argued against and not defended well. I do not agree with the implications for friendliness, however; you don’t need to understand something to be able to construct true statements about it, or even to direct its expression powerfully to have properties you can reference but don’t understand, especially if you have access to external tools.
Is the problem supposed to be that the human doesn’t have enough intelligence, or that we have some kind of highly parochial rationality?
Not enough intelligence, yes. And rationality is a part of intelligence. Also, see my reply to hen.
But that’s not really analogous to the human–chimp gap, which is qualitative: chimps don’t have language.