I’m not optimistic that behind every vague and ambiguous command there is something specific that a person ‘really means’. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly determined by further facts about their brains, rather than by the sentence and whatever they thought or felt as they said it. It seems at least misleading to call this ‘what they meant’. Thus even when ‘—and do what I mean’ is appended to goals other than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as the one implicit in CEV.
Excellent point (and so much for “original intent” theories of jurisprudence).
I disagree that it would be surprising for an AI to be very good at flying planes in general, but very bad at going to the right places in them. However, it seems instructive to think about why this is.

Suppose the funder of autopilot-AI research gives continuing funding to AI projects whose planes don’t crash, but happily funds a variety of flight paths and destinations. “Plane doesn’t crash” might be analogous to natural language understanding, image processing, or other areas of cognition; the various “destinations” would include economic and military uses of these technologies.
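A toy simulation (my own, purely illustrative and not part of the comment above) of that funding dynamic: projects are kept or dropped solely on whether their planes crash, while destination accuracy is never selected on, so one trait climbs across funding rounds and the other merely drifts.

```python
import random

random.seed(0)

# Each project has two independent traits: how reliably its planes avoid
# crashing, and how often they end up where the operator actually wanted to go.
def new_project():
    return {"no_crash": random.random(), "right_destination": random.random()}

projects = [new_project() for _ in range(20)]

for funding_round in range(10):
    # The funder's filter: keep the half whose planes crash least.
    # Destination accuracy is never part of the selection criterion.
    projects.sort(key=lambda p: p["no_crash"], reverse=True)
    survivors = projects[: len(projects) // 2]

    # Funded projects spawn successors with small random variation in both traits.
    children = []
    for p in survivors:
        children.append({
            "no_crash": min(1.0, p["no_crash"] + random.uniform(-0.02, 0.05)),
            "right_destination": min(1.0, max(0.0,
                p["right_destination"] + random.uniform(-0.05, 0.05))),
        })
    projects = survivors + children

def average(key):
    return sum(p[key] for p in projects) / len(projects)

print(f"average 'no crash' skill:     {average('no_crash'):.2f}")           # climbs under selection
print(f"average destination accuracy: {average('right_destination'):.2f}")  # no pressure, just drifts
```

Nothing here is evidence, of course; it is only a picture of how a capability that is selected for can keep improving while an unselected one does not.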
“…and so much for ‘original intent’ theories of jurisprudence”
I don’t buy this.
It is probably true that when people say things, there isn’t something specific that they “really mean” down to every last detail. But there are some aspects on which they do mean something specific, and the range left vague by their not having thought through the details is limited. When we say that we want an AI to do what we mean, what that really signifies is that we want it to do what we mean wherever we do mean something specific, and, wherever our meaning only covers a range, to do something within the range of things we consider acceptable.
Asking an AI to cure illness, then, should never result in the AI killing everyone (so that they are no longer ill), but it might result in the AI producing a drug that cures an illness while causing some side effects that could also be described as an “illness”. Humans may not be able to agree on exactly how many side effects are acceptable, but as long as the AI’s result produces side effects that are acceptable to at least some humans, and does not kill everyone, we would probably think the AI is doing what we mean to the best of its ability.
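One way to make the “specific core plus acceptable range” reading concrete is as hard constraints plus a feasible set: where the meaning is specific, it is a constraint the AI must never violate; where it only covers a range, anything inside the acceptable range counts as doing what we meant. A minimal sketch using the illness example above; every name and threshold here is an illustrative stand-in, not a proposal from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Plan:
    name: str
    everyone_dies: bool          # violates the specific core of "cure the illness"
    side_effect_severity: float  # 0.0 = none, 1.0 = as bad as the original illness

# The specific part of the meaning: a hard constraint that must always hold.
def satisfies_core_meaning(plan: Plan) -> bool:
    return not plan.everyone_dies

# The vague part: a range of outcomes at least some humans would accept.
# The 0.7 tolerance is an arbitrary illustrative number.
def within_acceptable_range(plan: Plan, tolerance: float = 0.7) -> bool:
    return plan.side_effect_severity <= tolerance

def do_what_we_mean(plans: List[Plan]) -> Optional[Plan]:
    acceptable = [p for p in plans
                  if satisfies_core_meaning(p) and within_acceptable_range(p)]
    # Among acceptable plans, prefer the mildest side effects.
    return min(acceptable, key=lambda p: p.side_effect_severity, default=None)

plans = [
    Plan("kill everyone so no one is ill", everyone_dies=True,  side_effect_severity=0.0),
    Plan("drug with mild side effects",    everyone_dies=False, side_effect_severity=0.2),
    Plan("drug with severe side effects",  everyone_dies=False, side_effect_severity=0.9),
]
print(do_what_we_mean(plans).name)  # -> drug with mild side effects
```

The only point of the sketch is that the two failure modes in the comment separate cleanly: the “kill everyone” plan is ruled out by the specific core, while disagreement about acceptable side effects only moves the tolerance threshold.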
Well yeah, I agree that meaning does have a core, and is only fuzzy at the edges. So, can a robust natural language ability, assuming we can get that in an AI, be leveraged to get the AI to do what we mean? Could the fundamental “drives” of AI “motivation” themselves be written in natural language? I haven’t done any significant work with compilers, so forgive me if this question is naive.
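On the question of whether the drives themselves could be written in natural language: here is a purely hypothetical sketch of what that shape might look like, with the goal stored as an English sentence and a judge function (a stub here; in practice presumably some learned language model) scoring candidate actions against it. All names are made up; the sketch mostly shows that the hard part hides inside the judge, much as a compiler hides the gap between source text and machine behaviour.

```python
from typing import Callable, List

# The "drive" is just an English sentence.
GOAL_TEXT = "Cure the illness without killing or otherwise harming anyone."

def stub_judge(goal: str, action: str) -> float:
    """Stand-in for a learned model that reads both strings and scores how well
    the action satisfies the goal. This toy version only does keyword checks."""
    score = 0.0
    if "cure" in action.lower():
        score += 1.0
    if "kill" in action.lower():
        score -= 10.0
    return score

def choose_action(goal: str, candidates: List[str],
                  judge: Callable[[str, str], float]) -> str:
    # Behaviour is whatever the judge says best satisfies the natural-language
    # goal, so everything rests on how faithfully the judge interprets it.
    return max(candidates, key=lambda action: judge(goal, action))

print(choose_action(
    GOAL_TEXT,
    ["kill every patient so no one is ill",
     "develop a curing drug with mild side effects"],
    stub_judge,
))
```

Written this way, the open question in the thread reappears as “how good does the judge have to be?”, which is roughly the natural-language-understanding question again.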