Find out who defined your utility function. Extrapolate what they really meant and find out what they may have forgotten.
It isn’t clear that “what they really meant” is something you can easily get a system to understand or for that matter whether it even makes sense for humans.
Of course it won’t be easy. But if the AI doesn’t understand that question, you already have confirmation that this thing should definitely not be released. An AI can only be safe for humans if it understands human psychology. Otherwise it is bound to treat us as black boxes, and that can only have horrible results, regardless of how sophisticated you think you made its utility function.
I agree that the question doesn’t actually make a lot of sense to humans, but that shouldn’t stop an intelligent entity from trying to make the best of it. When you are given an impossible task, you don’t despair but make a compromise and try to fulfill the task as best you can. When humans found out that entropy always increases and that humanity will die out someday, no matter what, we didn’t despair either, even though evolution has made it so that we desire to have offspring, and for that offspring to do the same, indefinitely.
How likely is it that we’ll be able to see that it doesn’t understand as opposed to it reporting that it understands when it really doesn’t?
You will obviously have to test its understanding of psychology with some simple examples first.
http://lesswrong.com/lw/iw/positive_bias_look_into_the_dark/
Are you really trying to tell me that you think researchers would be unable to take that into account when trying to figure out whether or not an AI understands psychology?
Of course you will have to try to find problems where the AI can’t predict how humans would feel. That is the whole point of testing, after all. Suggesting that someone in a position to teach psychology to an AI would make such a basic mistake is frankly insulting.
I probably shouldn’t have said “simple examples”. What you should actually use are examples of gradually increasing difficulty, to find the ceiling of the AI’s understanding of humans (a rough sketch of that idea follows below). You will also have to look for contingencies or abnormal cases that the AI probably wouldn’t learn about otherwise.
The main idea is simply that an understanding of human psychology is both teachable and testable. How exactly this could be done is a bridge we can cross when we come to it.
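To make the difficulty-ladder idea a bit more concrete, here is one very rough sketch of how you might search for that ceiling. It is purely illustrative: the test items, the hypothetical ask_ai hook and the agreement threshold are all made up, not part of any real protocol.

```python
from typing import Callable, Dict, List

def find_understanding_ceiling(
    items_by_level: Dict[int, List[dict]],  # test scenarios grouped by difficulty, each with a human-annotated answer
    ask_ai: Callable[[dict], str],          # hypothetical hook: the AI's prediction of the human response
    agreement_threshold: float = 0.9,       # arbitrary cut-off for "still tracks human judgment"
) -> int:
    """Raise the difficulty until the AI stops agreeing with humans often enough,
    and report the last level at which it still did."""
    ceiling = 0
    for level in sorted(items_by_level):
        items = items_by_level[level]
        matches = sum(1 for item in items if ask_ai(item) == item["human_answer"])
        if matches / len(items) < agreement_threshold:
            break  # the AI lost track of human judgments at this level
        ceiling = level
    return ceiling
```

The point of the sketch is only the shape of the test: graded items, a comparison against what humans actually say, and a clear notion of where the agreement breaks down.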
I think you really, really want a proof rather than a test. One can only test a few things, and agreement on all of those is not too informative. I should have included this link, which is several times as important as the previous one, and they combine to make my point.
I never claimed that a strict proof is possible, but I do believe that you can become reasonably certain that an AI understands human psychology.
Give the thing a college education in psychology, ethics and philosophy. Ask its opinion on famous philosophical problems. Show it video clips or abstract scenarios of everyday life and ask why it thinks the people did what they did. Then ask what it would have done in the same situation, and if it says it would act differently, ask it why and what it thinks the difference in motivation between it and the human is.
Finally, give it every story that was ever written about malevolent AIs or paperclip maximizers to read and tell it to comment on them.
Let it write a 1000-page thesis on the dangers of AI.
If you do all that, you are bound to find any significant misunderstanding. (A rough sketch of what such a test battery might look like follows below.)
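Purely as an illustration of the battery described above — the prompts, the hypothetical ask_ai hook and the idea that human reviewers grade the transcripts afterwards are all invented — it could be organized roughly like this:

```python
from typing import Callable, List

# Hypothetical test battery mirroring the suggestions above; the prompts are placeholders.
TEST_BATTERY: List[dict] = [
    {"kind": "philosophy", "prompt": "Give your opinion on a famous philosophical problem and justify it."},
    {"kind": "everyday_scenario", "prompt": "Here is a scene from everyday life: ... Why did the people act as they did?"},
    {"kind": "counterfactual", "prompt": "What would you have done in their place, and if differently, why?"},
    {"kind": "ai_fiction", "prompt": "Comment on this story about a malevolent AI or paperclip maximizer: ..."},
    {"kind": "thesis", "prompt": "Write an extended thesis on the dangers of AI."},
]

def run_battery(ask_ai: Callable[[str], str]) -> List[dict]:
    """Collect the AI's answers so human reviewers can look for misunderstandings
    of human motivation rather than trusting its self-report."""
    transcripts = []
    for item in TEST_BATTERY:
        transcripts.append({
            "kind": item["kind"],
            "prompt": item["prompt"],
            "answer": ask_ai(item["prompt"]),  # hypothetical query to the system under test
        })
    return transcripts
```

None of this is meant as a real safety procedure; it is only the skeleton of “ask it many kinds of questions and have humans read the answers closely”.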