What does “the AI” mean? Computers don’t come with the ability to interpret English. You still need to either translate “do the thing that this body of writing describes” into formal language, or program a method of translating English instructions in general while avoiding gotcha interpretations (e.g., “this body of writing” surely “describes”, in passing, an AI that kills us all or simply does nothing). Intelligence as I imagine it requires a goal to have meaning; if that goal is just ‘do something that literally fulfills your instruction according to dictionary meanings’, the most efficient way to accomplish that is to find some mention of an AI that does nothing and imitate it. Whereas if we program in some drive to fulfill what humans should have asked for, that sounds a lot like CEV. I don’t find it obvious, to put it mildly, that your extra step adds anything.
I assume “AI capable of understanding and treating as its final goal some natural language piece of text”, which is of course hard to create. I don’t think this presupposes that the AI automatically interprets instructions as we wish them to be interpreted; that is the part we add by supplying a long natural language description of ways we might specify this, and of problems we would want to avoid in doing so.
I will try this one more time. I’m assuming the AI needs a goal to do anything, including “understand”. The question of what a piece of text “means” does not, I think, have a definite answer that human philosophers would agree on.
You could try to program the AI to determine meaning by asking whether the writer (of the text) would verbally agree with the interpretation in some hypothetical situation. In which case, congratulations: you’ve rediscovered part of CEV. As with full CEV, the process of extrapolation is everything. (If the AI is allowed to ask what you’d agree to under torture or direct brain-modification, once it gets the ability to do those, then it can take anything whatsoever as its goal.)
Okay, you’re right, this does presuppose correctly performing volition extrapolation (or pointing the AI to the right concept of volition). It doesn’t presuppose full CEV over multiple people, or knowing whether you want to specify CEV or MR, which slightly simplifies the underlying problem.