Just as a simple example, an AI could maximally satisfy a goal by changing human preferences so as to make us desire for it to satisfy that goal. This would be entirely consistent with constraints on not disobeying humans or their desires, while not at all in accordance with our current preferences or desired path of development.
Yes, but why would it do that? You seem to think that such unbounded creativity arises naturally in any given artificial general intelligence. What makes you think that rather than being impassive it would go on learning enough neuroscience to tweak human goals? If the argument is that AI’s do all kinds of bad things because they do not care, why do they care to do a bad thing then rather than no-thing?
If you told the AI to make humans happy. It would first have to learn what humans are, what happiness means. Yet after learning all that you still expect it to not know that we don’t like to be turned into broccoli? I don’t think this is reasonable.
If you told the AI to make humans happy. It would first have to learn what humans are, what happiness means.
Yes, and humans would happily teach it that.
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
It’s not that the AI is evil, it’s that programmers are stupid. See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interrpret the bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough to not take you seriously when you tell it that:
its goal is to make humanity happy,
humanity consists of things that look like this [providing a picture], and
that being happy means you smile a lot
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
Most people, though, don’t grok this, because their brain filters off those possibilities. Of course, no human could be simultaneously so stupid as to make this mistake, while also being smart enough to actually do something dangerous. But that kind of simultaneous smartness/stupidity is how computers are by default.
(And if you say, “ah, but if we make an AI that’s like a human, it won’t have this problem”, then you have to bear in mind that this sort of smart/stupidness is endemic to human children as well. IOW, it’s a symptom of inadequate shared background, rather than being something specific to current-day computers or some particular programming paradigm.)
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
But you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interrpret the bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough to not take you seriously when you tell it that...
The difference between the hunter-gatherer and the AI is that the hunter-gatherer already posses a wide range of conceptual frameworks and incentives. An AI isn’t going to do something without someone to carefully and deliberately telling it do do so and what to do. It won’t just read the Bible and come to the conclusion that it should convert all humans to Christianity. Where would such an incentive come from?
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
The AI is certainly very creative and smart if it can influence the world dramatically. You allow it to be that smart, you allow it to care to do so, but you don’t allow it to comprehend what you actually mean? What I’m trying to pinpoint here is that you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
Because people make stupid mistakes, especially when programming. And telling your fully-programmed AI what you want it to do still counts as programming.
At this point, I am going to stop my reply, because the remainder of your comment consists of taking things I said out of context and turning them into irrelevancies:
I didn’t say an AI would try to convert people to Christianity—I said that humans without sufficient shared background will interpret things literally, and so would AIs.
I didn’t say the AI needed to be creative or smart, I said you wouldn’t need to be creative or smart to make a list of ways those three simple instructions could be given a disastrous literal interpretation.
you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
There are many paths to superhuman ability, as humans really aren’t that smart.
This also means that you can easily be superhuman in ability, and still really dumb—in terms of comprehending what humans mean… but don’t actually say.
Great comment. Allow me to emphasize that ‘smile’ here is just an extreme example. Most other descriptions humans give of happiness will end up with results just as bad. Ultimately any specification that we give it will be gamed ruthlessly.
Have you read Omohundro yet? Nick Tarleton repeatedly linked his papers for you in response to comments about this topic, they are quite on target and already written.
I’ve skimmed over it, see my response here. I found out that what I wrote is similar to what Ben Goertzel believes. I’m just trying to account for potential antipredictions, in this particular thread, that should be incorporated into any risk estimations.
Well my idea is not that creative, or even new, meaning that even if I hadn’t just posted it online an AI could still have conceivably read it somewhere else, and I do think creativity is a property of any sufficiently general intelligence that we might create, but those points are secondary.
No one here will argue that an unFriendly AI will do “bad things” because it doesn’t care (about what?). It will do bad things because it cares more about something else. Nor is “bad” an absolute: actions may be bad for some people and not for others, and there are moral systems under which actions can be firmly called “wrong”, but where all alternative actions are also “wrong”. Problems like that arise even for humans; in an AI the effects could be very ugly indeed.
And to clarify, I expect any AI that isn’t completely ignorant, let alone general, to know that we don’t like to be turned into broccoli. My example was of changing what humans want. Wireheading is the obvious candidate of a desire that an AI might want to implant.
What I meant is that the argument is that you have to make it care about humans so as not to harm them. Yet it is assumed that it does a lot without having to care about it, e.g. creating paperclips or self-improvement. My question is, why do people believe that you don’t have to make it care to do those things but you have to make it care to not harm humans. It is clear that if it only cares about one thing, doing that one thing could harm humans. Yet why would it do that one thing to an extent that is either not defined or which it is not deliberately made to care about. The assumptions seems to be that AI’s will do something, anything but being passive. Why isn’t limited behavior, failure and impassivity together not more likely than harming humans as a result of own goals or as a result to follow all goals but the one that limits its scope?
Just as a simple example, an AI could maximally satisfy a goal by changing human preferences so as to make us desire for it to satisfy that goal. This would be entirely consistent with constraints on not disobeying humans or their desires, while not at all in accordance with our current preferences or desired path of development.
Yes, but why would it do that? You seem to think that such unbounded creativity arises naturally in any given artificial general intelligence. What makes you think that rather than being impassive it would go on learning enough neuroscience to tweak human goals? If the argument is that AI’s do all kinds of bad things because they do not care, why do they care to do a bad thing then rather than no-thing?
If you told the AI to make humans happy. It would first have to learn what humans are, what happiness means. Yet after learning all that you still expect it to not know that we don’t like to be turned into broccoli? I don’t think this is reasonable.
Yes, and humans would happily teach it that.
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile… which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
It’s not that the AI is evil, it’s that programmers are stupid. See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interrpret the bible literally and do all sorts of things that “real” Christians don’t. An AI isn’t going to be smart enough to not take you seriously when you tell it that:
its goal is to make humanity happy,
humanity consists of things that look like this [providing a picture], and
that being happy means you smile a lot
You don’t need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
Most people, though, don’t grok this, because their brain filters off those possibilities. Of course, no human could be simultaneously so stupid as to make this mistake, while also being smart enough to actually do something dangerous. But that kind of simultaneous smartness/stupidity is how computers are by default.
(And if you say, “ah, but if we make an AI that’s like a human, it won’t have this problem”, then you have to bear in mind that this sort of smart/stupidness is endemic to human children as well. IOW, it’s a symptom of inadequate shared background, rather than being something specific to current-day computers or some particular programming paradigm.)
But you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
The difference between the hunter-gatherer and the AI is that the hunter-gatherer already posses a wide range of conceptual frameworks and incentives. An AI isn’t going to do something without someone to carefully and deliberately telling it do do so and what to do. It won’t just read the Bible and come to the conclusion that it should convert all humans to Christianity. Where would such an incentive come from?
The AI is certainly very creative and smart if it can influence the world dramatically. You allow it to be that smart, you allow it to care to do so, but you don’t allow it to comprehend what you actually mean? What I’m trying to pinpoint here is that you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
Because people make stupid mistakes, especially when programming. And telling your fully-programmed AI what you want it to do still counts as programming.
At this point, I am going to stop my reply, because the remainder of your comment consists of taking things I said out of context and turning them into irrelevancies:
I didn’t say an AI would try to convert people to Christianity—I said that humans without sufficient shared background will interpret things literally, and so would AIs.
I didn’t say the AI needed to be creative or smart, I said you wouldn’t need to be creative or smart to make a list of ways those three simple instructions could be given a disastrous literal interpretation.
There are many paths to superhuman ability, as humans really aren’t that smart.
This also means that you can easily be superhuman in ability, and still really dumb—in terms of comprehending what humans mean… but don’t actually say.
Great comment. Allow me to emphasize that ‘smile’ here is just an extreme example. Most other descriptions humans give of happiness will end up with results just as bad. Ultimately any specification that we give it will be gamed ruthlessly.
Have you read Omohundro yet? Nick Tarleton repeatedly linked his papers for you in response to comments about this topic, they are quite on target and already written.
I’ve skimmed over it, see my response here. I found out that what I wrote is similar to what Ben Goertzel believes. I’m just trying to account for potential antipredictions, in this particular thread, that should be incorporated into any risk estimations.
Thanks.
There is more here now. I learnt that I hold a fundamental different definition of what constitutes an AGI. I guess that solves all issues.
Well my idea is not that creative, or even new, meaning that even if I hadn’t just posted it online an AI could still have conceivably read it somewhere else, and I do think creativity is a property of any sufficiently general intelligence that we might create, but those points are secondary.
No one here will argue that an unFriendly AI will do “bad things” because it doesn’t care (about what?). It will do bad things because it cares more about something else. Nor is “bad” an absolute: actions may be bad for some people and not for others, and there are moral systems under which actions can be firmly called “wrong”, but where all alternative actions are also “wrong”. Problems like that arise even for humans; in an AI the effects could be very ugly indeed.
And to clarify, I expect any AI that isn’t completely ignorant, let alone general, to know that we don’t like to be turned into broccoli. My example was of changing what humans want. Wireheading is the obvious candidate of a desire that an AI might want to implant.
What I meant is that the argument is that you have to make it care about humans so as not to harm them. Yet it is assumed that it does a lot without having to care about it, e.g. creating paperclips or self-improvement. My question is, why do people believe that you don’t have to make it care to do those things but you have to make it care to not harm humans. It is clear that if it only cares about one thing, doing that one thing could harm humans. Yet why would it do that one thing to an extent that is either not defined or which it is not deliberately made to care about. The assumptions seems to be that AI’s will do something, anything but being passive. Why isn’t limited behavior, failure and impassivity together not more likely than harming humans as a result of own goals or as a result to follow all goals but the one that limits its scope?