“One common reaction I encounter is for people to immediately declare that Friendly AI is an impossibility, because any sufficiently powerful AI will be able to modify its own source code to break any constraints placed upon it.
The first flaw you should notice is a Giant Cheesecake Fallacy. Any AI with free access to its own source would, in principle, possess the ability to modify its own source code in a way that changed the AI’s optimization target. This does not imply the AI has the motive to change its own motives. I would not knowingly swallow a pill that made me enjoy committing murder, because currently I prefer that my fellow humans not die.”
Eliezer Yudkowsky, “Artificial Intelligence as a Positive and Negative Factor in Global Risk”, c. 2006
Complementary paragraph, not particularly optional, which can also stand alone as its own entry (mainly for ML researchers and tech executives):
“But what if I try to modify myself, and make a mistake? When computer engineers prove a chip valid — a good idea if the chip has 155 million transistors and you can’t issue a patch afterward — the engineers use human-guided, machine-verified formal proof. The glorious thing about formal mathematical proof is that a proof of ten billion steps is just as reliable as a proof of ten steps. But human beings are not trustworthy to peer over a purported proof of ten billion steps; we have too high a chance of missing an error. And present-day theorem-proving techniques are not smart enough to design and prove an entire computer chip on their own — current algorithms undergo an exponential explosion in the search space.”
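To make “human-guided, machine-verified formal proof” concrete, here is a minimal sketch in Lean 4 (an illustration, not something from the paper; the theorem names are made up for the example, while Nat.add_comm and Nat.add_succ are real lemmas from Lean’s standard library). The human supplies the strategy: which lemma to invoke, or which variable to induct on. Lean’s small trusted kernel then mechanically re-checks every inference step, which is exactly why a proof of ten billion steps is as reliable as one of ten.

    -- Human-guided: we pick the library lemma; machine-verified: the
    -- kernel checks that its statement exactly matches the goal.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    -- Human-guided induction: we choose to induct on n and which
    -- rewrites to apply; the kernel re-verifies each resulting step.
    theorem zero_add_example (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl                        -- 0 + 0 reduces to 0 by computation
      | succ k ih => rw [Nat.add_succ, ih] -- 0 + (k+1) becomes (0 + k) + 1, then k + 1

The division of labor mirrors the quote’s closing point about search-space explosion: the expensive, fallible search for a proof can come from a human or a heuristic, but trust rests only on the small checker, so proof length does not erode reliability.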