I assume an “AI capable of understanding and treating as its final goal some natural language piece of text”, which is of course hard to create. I don’t think this presupposes that the AI automatically interprets instructions as we wish them to be interpreted; that is the part we add by supplying a long natural-language description of the ways we might specify this, and the problems we would want to avoid in doing so.
I will try this one more time. I’m assuming the AI needs a goal to do anything, including “understand”. The question of what a piece of text “means” does not, I think, have a definite answer that human philosophers would agree on.
You could try to program the AI to determine meaning by asking whether the writer (of the text) would verbally agree with the interpretation in some hypothetical situation. In which case, congratulations: you’ve rediscovered part of coherent extrapolated volition (CEV). As with full CEV, the process of extrapolation is everything. (If the AI is allowed to ask what you’d agree to under torture or direct brain-modification, once it gains the ability to do those things, then it can take anything whatsoever as its goal.)
Okay, you’re right, this does presuppose correctly performing volition extrapolation (or pointing the AI to the right concept of volition). It doesn’t presuppose full CEV over multiple people, or knowing whether you want to specify CEV or moral rightness (MR), which slightly simplifies the underlying problem.