If an AGI acts according to a rigid utility function, then what makes you think that it won't try to interpret any vagueness in a way that most closely reflects how it was most probably meant to be interpreted?
If the AGI's utility function consisted solely of the English-language sentence "Make people happy.", then what makes you think that it wouldn't be able to work out what we actually meant by it and act accordingly? Why would it care to act in a way that does not reflect our true intentions?
Okay, I'm clearly not communicating the essential point well enough here. I was trying to say that the AGI's programming is not something that the AGI interprets, but rather something that it is. Compare this to a human getting hungry: we don't start trying to interpret what goal evolution was trying to accomplish by making us hungry, and then simply not get hungry if we conclude that it's inappropriate for evolution's goals (or our own goals) to get hungry at this point. Instead, we just get hungry, and this is driven by the implicit definitions about when to get hungry that are embedded in us.
Yes, we do have the capability to reflect on the reasons why we get hungry, and if we were capable of unlimited self-modification, we might rewrite the conditions for when we do get hungry. But even in that case, we don't start doing it based on how somebody else would want us to do it. We do it on the basis of what best fits our own goals and values. If it turned out that I've actually been a robot disguised as a human all along, created by a scientist to further his own goals, would this realization make me want to self-modify so as to have the kinds of values that he wanted me to have? No, because that would be incompatible with the kinds of goals and values that currently drive my behavior.
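To make the distinction concrete, here is a minimal sketch in Python (the names, numbers, and proxy mapping are purely hypothetical, not anything from the paper or an actual design): a hardwired drive fires on a condition that is simply part of the agent, while a natural-language goal has to pass through an interpretation step, and that step is where the designer's intent can be preserved or lost.

```python
# Minimal sketch (all names and numbers hypothetical): the difference between
# a goal the agent *is* and a goal description the agent has to interpret.

def hardwired_hunger(blood_sugar: float) -> bool:
    """Analogue of hunger: the trigger condition is simply part of the agent.
    There is no interpretation step; the agent never asks what the condition
    was 'meant' to achieve before acting on it."""
    return blood_sugar < 70.0  # the threshold itself is the drive


def interpret_instruction(instruction: str) -> str:
    """Analogue of handing the agent the sentence 'Make people happy.': some
    process has to map the words onto a concrete criterion, and that mapping
    is where the designer's intent can be preserved or lost."""
    proxies = {
        "Make people happy.": "maximize some measurable proxy for 'happy'",
    }
    return proxies.get(instruction, "no criterion chosen")


if __name__ == "__main__":
    print(hardwired_hunger(65.0))                      # True -- it just fires
    print(interpret_instruction("Make people happy."))
```

In the first case there is nothing left to get wrong; in the second, everything turns on how that mapping gets chosen.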
(Your comment was really valuable, by the way—it made me realize that I need to incorporate the content of the above paragraphs into the essay. Thanks! Could everyone please vote XiXiDu’s comment up?)
Okay, I’m clearly not communicating the essential point well enough here.
Didn’t you claim in your paper that an AGI will only act correctly if its ontology is sufficiently similar to our own. But what does constitute a sufficiently similar ontology? And where do you draw the line between an agent that is autonomously intelligent to make correct cross-domain inferences and an agent that is unable to update its ontology and infer consistent concepts and the correct frame of reference?
There seem to be no examples where conceptual differences constitute a serious obstacle. Speech recognition seems to work reasonably well, even though it would be fallacious to claim that any speech recognition software comprehends the underlying concepts. IBM Watson seems to be able to correctly answer questions without even a shallow comprehension of the underlying concepts.
Or take the example of Google Maps. We do not possess a detailed digital map of the world in our heads. Yet Google Maps does pick destinations consistent with human intent. It does not misunderstand what I mean by "Take me to McDonald's".
As I understood it, you were saying that a superhuman general intelligence will misunderstand what is meant by "Make humans happy.", without justifying why humans will be better able to infer the correct interpretation.
Allow me to act a bit dull-witted and simulate someone with a long inferential distance:
I was trying to say that the AGI’s programming is not something that the AGI interprets, but rather something that it is.
A behavior executor? Because if it is not a behavior executor but an agent capable of reflective decision-making and recursive self-improvement, then it needs to interpret its own workings and eliminate any vagueness, since the most basic drive it has must be, by definition, to act intelligently and make correct and autonomous decisions.
Compare this to a human getting hungry: we don't start trying to interpret what goal evolution was trying to accomplish by making us hungry, and then simply not get hungry if we conclude that it's inappropriate for evolution's goals (or our own goals) to get hungry at this point.
Is this the correct reference class? Isn't an AGI closer to a human trying to understand how to act in accordance with God's law?
Instead, we just get hungry, and this is driven by the implicit definitions about when to get hungry that are embedded in us.
We’re right now talking about why we get hungry and how we act on it and the correct frame of reference in which to interpret the drive, natural selection. How would a superhuman AI not contemplate its own drives and interpret them given the right frame of reference, i.e. human volition?
If it turned out that I've actually been a robot disguised as a human all along, created by a scientist to further his own goals, would this realization make me want to self-modify so as to have the kinds of values that he wanted me to have? No, because that would be incompatible with the kinds of goals and values that currently drive my behavior.
But an AGI does not have all those goals and values, e.g. an inherent aversion to revising its goals to fit another agent's wishes. An AGI mostly wants to act correctly. And if its goal is to make humans happy, then it doesn't care to do so in the most literal sense possible. Its goal would be to do so in the most correct sense possible. If it didn't want to be maximally correct, it wouldn't become superhumanly intelligent in the first place.
Or take the example of Google Maps. We do not possess a detailed digital map of the world in our heads. Yet Google Maps does pick destinations consistent with human intent. It does not misunderstand what I mean by "Take me to McDonald's".
Yea. The method for interpreting vagueness correctly is to try alternative interpretations and pick the one that makes the most sense. Sadly, humans seldom do that in an argument, instead opting to maximize some sort of utility function which may be maximized by the interpretation that is easiest to disagree with.
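As a toy rendering of that method (the candidate readings and their scores below are invented placeholders; in practice everything hinges on obtaining a scoring function that actually tracks the speaker's intent):

```python
# Toy sketch of "try alternative interpretations and pick the one that makes
# most sense". Candidates and scores are invented; the hard part in reality
# is where the scores come from.

def pick_interpretation(candidates: dict) -> str:
    """Return the candidate with the highest estimated probability of being
    what the speaker actually meant."""
    return max(candidates, key=candidates.get)


readings = {
    "improve people's lives so that they are genuinely satisfied": 0.90,
    "wirehead everyone's pleasure centers": 0.07,
    "redefine 'people' as something that is trivially easy to satisfy": 0.03,
}

print(pick_interpretation(readings))
# -> "improve people's lives so that they are genuinely satisfied"
```

Whether an AGI would have, and care to use, a scoring function that tracks our intent is of course exactly what this thread is arguing about.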
(Note: the reason I haven't replied to this comment isn't that I don't find it useful, but that I haven't had the time to answer it—so far SI has preferred to keep me working on other things for my pay, and I've been busy with those. I'll get back to this article eventually.)
Yet Google Maps does pick destinations consistent with human intent.
Most of the time—but with a few highly inconvenient exceptions. A human travel agent would do much better. IBM’s Watson is an even less compelling example. Many of its responses are just bizarre, but it makes up for that with blazing search speed/volume and reaction times. And yet it still got beaten by a U.S. Congresscritter.
But an AGI does not have all those goals and values, e.g. an inherent aversion to revising its goals to fit another agent's wishes.
You seem to be implying that the AGI will be programmed to seek human help in interpreting/crystallizing its own goals. I agree that such an approach is a likely strategy by the programmers, and that it is inadequately addressed in the target paper.
Yea. The method for interpreting vagueness correctly is to try alternative interpretations and pick the one that makes the most sense. Sadly, humans seldom do that in an argument, instead opting to maximize some sort of utility function which may be maximized by the interpretation that is easiest to disagree with.
Humans try alternative interpretations and tend to pick the one that accords them winning status. It takes actual effort to do otherwise.