I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top: a Copernican principle, if you like. To use some drastically different pairings: if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, and that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
OK, but why not look at this tower another way? A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see if chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.
So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much “we are humans, we are smart, we can understand the goals of even an incredibly smart AGI”; it’s “an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires.”
So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I’m pretty skeptical of naive extrapolation in this domain anyway, given Eliezer’s point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn’t expect trends to be maintained across those shifts.
Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are.
You are right that we are certainly able to convey a small, simple subset of our goals, desires and motivations to some sufficiently complex animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but to put up with.
“an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires.”
This is quite possible. It might give us its dumbed-down version of its 10 commandments which would look to us like an incredible feat of science and philosophy.
Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me.
Right. An optimistic view is that we can understand the explanations; a pessimistic view is that we would only be able to follow instructions (and this is not the most pessimistic view by far).
I’m pretty skeptical of naive extrapolation in this domain anyway, given Eliezer’s point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn’t expect trends to be maintained across those shifts.
Indeed, we shouldn’t. I probably phrased my point poorly. What I tried to convey is that because “major advances in optimization power are meta-level qualitative shifts”, confidently proclaiming that an advanced AGI will be able to convey what it thinks to humans is based on the just-world fallacy, not on any solid scientific footing.