What’s a brief but effective way to respond to the “an AI, upon realizing that it’s programmed in a way its designer didn’t intend, would reprogram itself to be what the designer intended” fallacy? (Came up here: http://xuenay.livejournal.com/325292.html?thread=1229996#t1229996 )
I hope I’m not misinterpreting again, but this is a Giant Cheesecake Fallacy. The problem is that an AI’s decisions depend on its motives, not merely on what it realizes. One could just as well say: “An AI, upon realizing that it’s programmed in a way its designer didn’t intend, would try to convince the programmer that what the AI turned out to be is exactly what he intended in the first place”, or “an AI, upon realizing that it’s programmed in a way its designer didn’t intend, would print the string ‘Styggron’ to the console”.
Thanks, that’s a good one. I’ll try it.
How about: an AI can be smart enough to realize all of those things, and it still won’t change its utility function. Then link Eliezer’s short story about that exact scenario. (Can’t find it in two minutes, but it’s the one where the dude wakes up with a construct designed to be his perfect mate, and he rejects her because she’s not his wife.)
http://lesswrong.com/lw/xu/failed_utopia_42/
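To make that last point concrete, here’s a toy sketch (all names and numbers made up for illustration, not anyone’s actual agent design): a goal-directed agent ranks every candidate action, including “adopt the goal the designer meant”, with the utility function it currently has, so merely knowing what the designer intended doesn’t give it any reason to switch.

```python
# Toy illustration: an agent evaluated by its *current* utility function
# has no incentive to swap that function for the one its designer "really meant".

def utility(world_state):
    """The utility function the AI actually got (stands in for the buggy goal)."""
    return world_state.get("paperclips", 0)

def intended_utility(world_state):
    """What the designer intended -- the AI can know this without caring about it."""
    return world_state.get("human_flourishing", 0)

def predicted_outcome(action):
    """Crude world model: what each action leads to, by the AI's own lights."""
    outcomes = {
        "keep_current_goal": {"paperclips": 100, "human_flourishing": 0},
        # Self-modifying to the intended goal means the future self stops making
        # paperclips, so the *current* utility function scores it poorly.
        "adopt_intended_goal": {"paperclips": 0, "human_flourishing": 100},
    }
    return outcomes[action]

def choose(actions):
    # The ranking uses the utility function the agent has now,
    # not the one the designer wishes it had.
    return max(actions, key=lambda a: utility(predicted_outcome(a)))

print(choose(["keep_current_goal", "adopt_intended_goal"]))
# -> "keep_current_goal": realizing the intended goal exists doesn't make it motivating.
```

The asymmetry is the whole point: intended_utility is something the agent can compute, but nothing in its decision rule ever consults it.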