If it has “swallowed* that claim. You are assuming that the AI has a free choice about some goals >and is just programmed with others.
This is the important part.
the “optimal goal” is not actually controlling the AI.
the “optimal goal” is merely the subject of a discussion.
what is controlling the AI is the desire the tell the truth to the humans it is talking to, nothing more.
Why would that require more gullibility than “species X is more important than all the others”? >That doesn’t even look like a moral claim.
The entire discussion is not supposed to unearth some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
The AI is supposed to take the human POV and think “if I were these humans, what would I want the AI’s goal to be”.
I didn’t mention this explicitly because I didn’t think it was necessary but the “optimal goal” is purely subjective from the POV of humanity and the AI is aware of this.
some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
But if that is true, the AI will say so. What’s more, you kind of need the AI to refrain from acting on it, if it is a human-unfriendly objective moral truth. There are ethical puzzles where it is apparently right to lie or keep schtum, because of the consequences of telling the truth.
This is the important part.
the “optimal goal” is not actually controlling the AI.
the “optimal goal” is merely the subject of a discussion.
what is controlling the AI is the desire the tell the truth to the humans it is talking to, nothing more.
The entire discussion is not supposed to unearth some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
The AI is supposed to take the human POV and think “if I were these humans, what would I want the AI’s goal to be”.
I didn’t mention this explicitly because I didn’t think it was necessary but the “optimal goal” is purely subjective from the POV of humanity and the AI is aware of this.
But if that is true, the AI will say so. What’s more, you kind of need the AI to refrain from acting on it, if it is a human-unfriendly objective moral truth. There are ethical puzzles where it is apparently right to lie or keep schtum, because of the consequences of telling the truth.