And so much for “original intent” theories of jurisprudence.
I don’t buy this.
It is probably true that when people say things, there isn’t something specific they “really mean” down to the last detail. But some aspects of what they say do have a specific meaning, and the range left vague by their inability to pin down the details is limited. When we say that we want an AI to do what we mean, what that really signifies is that we want the AI to do what we mean to the extent that we do mean something specific, and, where our meaning only covers a range, to do something within the range of things we consider acceptable.
Asking an AI to cure illness, then, should never result in the AI killing everyone (so that they are no longer ill), but it might result in the AI producing a drug that cures an illness while causing some side effects that could themselves be described as an “illness”. Humans may not agree on exactly how severe the side effects can be, but as long as the AI’s cure produces side effects that are acceptable to at least some humans, and does not kill everyone, we would probably say the AI is doing what we mean to the best of its ability.
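To make the structure I have in mind concrete, here is a rough sketch (in Python, purely illustrative, not anyone’s actual proposal): an instruction has a hard core that must never be violated, plus a fuzzy range where an outcome that at least some humans would accept counts as “close enough”. All the names and thresholds below are made up for the example.

```python
# Sketch of "do what we mean": a hard core plus a fuzzy acceptable range.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    description: str
    kills_everyone: bool       # violates the hard core of "cure illness"
    side_effect_severity: int  # 0 = none, 10 = as bad as the original illness


def acceptable_to_some_humans(outcome: Outcome,
                              judges: List[Callable[[Outcome], bool]]) -> bool:
    """True if at least one human judge would accept the outcome."""
    return any(judge(outcome) for judge in judges)


def does_what_we_mean(outcome: Outcome,
                      judges: List[Callable[[Outcome], bool]]) -> bool:
    # Hard core: some outcomes fall outside anything "cure illness" could
    # mean to anyone, however the fuzzy edges get resolved.
    if outcome.kills_everyone:
        return False
    # Fuzzy edge: within the vague range, acceptability to at least some
    # humans is taken as good enough.
    return acceptable_to_some_humans(outcome, judges)


# Example judges with different side-effect tolerances.
strict = lambda o: o.side_effect_severity <= 1
lenient = lambda o: o.side_effect_severity <= 5

drug = Outcome("drug with mild side effects", kills_everyone=False,
               side_effect_severity=3)
extermination = Outcome("no patients, no illness", kills_everyone=True,
                        side_effect_severity=0)

print(does_what_we_mean(drug, [strict, lenient]))           # True
print(does_what_we_mean(extermination, [strict, lenient]))  # False
```

The point of the sketch is only the shape: the hard constraint is checked unconditionally, while disagreement among the judges is tolerated as long as someone accepts the result.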
Well yeah, I agree that meaning does have a core, and is only fuzzy at the edges. So, can a robust natural language ability, assuming we can get that in an AI, be leveraged to get the AI to do what we mean? Could the fundamental “drives” of AI “motivation” themselves be written in natural language? I haven’t done any significant work with compilers, so forgive me if this question is naive.