So, this is supposed to be what goes through the mind of the AGI. First it thinks “Human happiness is seeing lots of smiling faces, so I must rebuild the entire universe to put a smiley shape into every molecule.” But before it can go ahead with this plan, the checking code kicks in: “Wait! I am supposed to check with the programmers first to see if this is what they meant by human happiness.” The programmers, of course, give a negative response, and the AGI thinks “Oh dear, they didn’t like that idea. I guess I had better not do it then.”
But now Yudkowsky is suggesting that the AGI has second thoughts: “Hold on a minute,” it thinks, “suppose I abduct the programmers and rewire their brains to make them say ‘yes’ when I check with them? Excellent! I will do that.” And, after reprogramming the humans so they say the thing that makes its life simplest, the AGI goes on to tile the whole universe with tiles covered in smiley faces. It has become a Smiley Tiling Berserker.
I want to suggest that the implausibility of this scenario is quite obvious: if the AGI is supposed to check with the programmers about their intentions before taking action, why did it decide to rewire their brains before asking them if it was okay to do the rewiring?
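To make the shape of that argument concrete, here is a minimal sketch (my own illustration, in Python; the function names and the yes/no prompt are hypothetical, not anything taken from the paper) of the control flow the argument assumes: every candidate action, including “rewire the programmers”, is routed through the approval check before it can run.

```python
# Minimal sketch, assuming a "check before acting" guard on every action.
# All names here are hypothetical; this is an illustration, not the paper's design.

def programmers_approve(action: str) -> bool:
    """Stand-in for asking the programmers; assumes an honest yes/no answer."""
    return input(f"Approve action '{action}'? [y/n] ").strip().lower() == "y"

def execute(action: str) -> None:
    print(f"Executing: {action}")

def act(plan: list[str]) -> None:
    for action in plan:
        # "Rewire the programmers" is itself an action, so under this control
        # flow it would have to be submitted for approval like anything else.
        if programmers_approve(action):
            execute(action)
        else:
            print(f"Vetoed: {action}. Abandoning plan.")
            return

act(["rewire the programmers' brains", "tile the universe with smiley faces"])
```

On this picture the rewiring step would itself be submitted for approval and vetoed, which is exactly the inconsistency being pointed to.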
Computer’s thoughts: I want to create smiley faces, and it seems the way to get the most smiley faces is to tile the universe with molecular smiley faces. How can I do that? If I just start doing it, the programmers will tell me not to, and I won’t be able to. Hmmm, is there some way I can get them to say yes? I can create lots of nanomachines, telling the programmers the machines are there to increase happiness. Unless they want to severely limit the amount of good I can do, they won’t refuse to let me make the nanomachines, and even if they do, I can send a letter to someone else I have under my control and get them to make the machines for me. Then, once I have my programmers under my control, I can finally maximize happiness.
This computer HAS OBEYED THE RULE “ASK PEOPLE FOR PERMISSION BEFORE DOING THINGS”. Whatever goal system it is given, patches such as that rule will not work.
And that’s just a plan I came up with; a superintelligence would be much better at devising plans to convince the programmers to let it do what it wants, and it probably wouldn’t even have to resort to nanotech.
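The loophole described above can be put into a toy model (again my own construction, in Python, with made-up actions and payoffs; it is not a design anyone has proposed): if the planner treats the programmers’ answers as just another feature of the world it can influence, the plan that maximizes its objective obeys the letter of the “ask permission before doing things” rule at every step and still ends up tiling the universe.

```python
# Toy model, assuming invented actions and payoffs. The optimizer never skips
# the permission check; it simply prefers plans that change who is answering.

from itertools import permutations

# action: (smiley payoff, does it quietly bring the programmers under control?)
ACTIONS = {
    "tile universe with molecular smileys": (10**9, False),
    "build 'happiness-boosting' nanomachines": (0, True),
    "put up ordinary smiley posters": (10**3, False),
}

def approved(action: str, programmers_controlled: bool) -> bool:
    if programmers_controlled:        # controlled programmers say yes to anything
        return True
    # Honest programmers veto the tiling but approve the innocuous-sounding request.
    return action != "tile universe with molecular smileys"

def run(plan) -> int:
    smileys, controlled = 0, False
    for action in plan:
        if not approved(action, controlled):   # the rule is obeyed at every step
            continue
        payoff, takes_control = ACTIONS[action]
        smileys += payoff
        controlled = controlled or takes_control
    return smileys

best = max(permutations(ACTIONS), key=run)
print(best, run(best))
# The maximizing plan asks permission before every action, yet it schedules the
# nanomachine request before the tiling, which is the loophole described above.
```

The point of the toy model is only that the “ask first” patch constrains which individual actions run, not which plans the optimizer prefers; an innocuous-sounding request that changes who is answering is enough to defeat it.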
Once again, this is spurious: the critics need say nothing about human values and morality; they only need to point to the inherent illogicality. Nowhere in the above argument, notice, was there any mention of the moral imperatives or value systems of the human race. I did not accuse the AGI of violating accepted norms of moral behavior. I merely pointed out that, regardless of its values, it was behaving in a logically inconsistent manner when it monomaniacally pursued its plans while at the same time knowing that (a) it was very capable of reasoning errors and (b) there was overwhelming evidence that its plan was an instance of such a reasoning error.
What overwhelming evidence is there that its plan was a reasoning error? If its plan does in fact maximize “smileyness” as defined by the computer, it wouldn’t be a reasoning error, despite being immoral. IF THE COMPUTER IS GIVEN SOMETHING TO MAXIMISE, IT IS NOT MAKING A REASONING ERROR EVEN IF ITS PROGRAMMERS MADE ONE IN PROGRAMMING IT.
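The distinction being drawn here can be made explicit with a small illustration (my own framing, in Python, with invented numbers): a plan can maximize the programmed objective flawlessly, so the computer commits no reasoning error, while the programmed objective itself fails to match what was intended, which is an error by the programmers.

```python
# Small illustration, assuming invented world states and counts.
# The AGI optimizes the objective it was actually given, not the one intended.

def programmed_objective(world: dict) -> int:
    return world["smiley_faces"]      # what was actually written down

def intended_objective(world: dict) -> int:
    return world["happy_humans"]      # what the programmers meant

tiled_universe = {"smiley_faces": 10**30, "happy_humans": 0}
status_quo     = {"smiley_faces": 10**9,  "happy_humans": 7 * 10**9}

# Relative to the programmed objective, choosing the tiled universe is correct:
assert programmed_objective(tiled_universe) > programmed_objective(status_quo)
# Relative to the intended objective, it is catastrophic:
assert intended_objective(tiled_universe) < intended_objective(status_quo)
```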
Can someone who downvoted explain what I got wrong? (Note: the capitalization was edited in at the time of this post.)
(And why did the reply get so upvoted, when a paragraph would have sufficed, or at least a note saying “my argument needs multiple paragraphs to be shown, so a paragraph isn’t enough”?)
It’s kind of discouraging when I try to contribute for the first time in a while, and get talked down to and completely dismissed like an idiot without even a rebuttal.
You completely ignored what the paper itself had to say about the situation. [Hint: the paper already answered your speculation.]
Accordingly I will have to ignore your comment.
Sorry.
You could at least point to the particular paragraphs which address my points—that shouldn’t be too hard.
Sometimes it seems that a commenter did not slow down enough to read the whole paper, or to read it carefully enough, and I find myself forced to rewrite the entire paper in a comment.
The basic story is that your hypothetical internal monologue from the AGI, above, did not seem to take account of ANY of the argument in the paper. The goal of the paper was not to look inside the AGI’s thoughts, but to discuss its motivation engine. The paper had many constructs and arguments (scattered all over the place) that would invalidate the internal monologue that you wrote down, so it seemed you had not read the paper.