The “I obeyed the explicit content of the contract but didn’t give you what you want, sucks to be you” attitude exists in some humans (who are intelligent enough to know the implied meaning of the contract), so why wouldn’t it also exist in AIs?
Sure, but why would anyone be likely to build such an AI? That is at the core of what Ben Goertzel argues: we do not pull minds from design space at random.
A tool does what it is supposed to do. If you add a lot of intelligence, why would it suddenly do something completely nuts like taking over the universe, something that was obviously not the intended purpose?
I think a better analogy for an AI would be a sociopathic decorator that doesn’t care about being a good decorator, but does care about fulfilling contracts, and cares about nothing not stated in the contract.
I don’t think it would make sense to create an AGI that does not care about the implications and context of its goals but only follows their definitions verbatim. That doesn’t seem to be very intelligent behavior. And that’s exactly the quality an AGI capable of self-improvement needs: a sense of context and implications.
Many of our tools are supposed to be web browsers, email clients, etc., but they have a history of suddenly doing something completely nuts like taking over the whole computer, which was obviously not the intended purpose. Programming is hard that way: the result will only follow your program, verbatim. Attempts to give programs a greater sense of context and implications aren’t new; they’re called “higher-level languages”. They feel less like hand-holding a dumb machine and more like describing a thought process, and you can even design the language to make whole classes of lower-level bugs unwritable, but machines still end up doing what they’re instructed, verbatim (where “what they’re instructed” can now also include the output of compiler bugs).
The trouble is that you can’t rule out every class of bugs. It’s hard (impossible?) to distinguish a priori between what might be a bug and what might just be a different programmer’s intention, even though we’ve been wishing for the ability to do so for over a century. “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?”
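To make the “verbatim” point concrete, here is a minimal, hypothetical Python sketch (the function and names are invented for illustration): the intent is that every call starts with a fresh list, but the machine executes exactly what was written, so the default list is created once and shared across calls.

```python
# Hypothetical illustration: the machine follows the code verbatim, not the intent.
# Intended behaviour: each call to log_event starts with its own fresh event list.

def log_event(event, events=[]):  # the default list is built once, at definition time
    events.append(event)
    return events

print(log_event("boot"))      # ['boot']              -- matches the intent
print(log_event("shutdown"))  # ['boot', 'shutdown']  -- verbatim behaviour, not the intent
```

Nothing here is broken from the machine’s point of view; the gap is entirely between what was written and what was meant.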
Thank you. I’ve been trying to argue that “the computer does what you tell it to” is a much more chaotic situation than those who want to build FAI seem to believe, and you lay it out better than I have.
Sure, but why would anyone be likely to build such an AI?
Because computer programs do what they’re programmed to do, without taking into account the actual intention of the user.
Creating an AGI that does take into account what people really want (bearing in mind that the AGI is massively more intelligent than the people wanting the things) is, it seems to me, what the whole Friendly thing is about. If you know how to do that, you’ve solved Friendliness.
Edit: With added complications such as people not knowing what they want, people having conflicting goals, people wanting different things once there’s a powerful AI doing stuff, etc.
Yet, people around here seem to believe that the AI will develop an accurate model of the world even if its input isn’t all that accurate.
Who believes what, exactly?