I greatly dislike the term “friendly AI”. The mechanisms behind “friendly AI” have nothing to do with friendship or mutual benefit. It would be more accurate to call it “slave AI”.
I prefer the term “Safe AI”, as it is more self-explanatory to an outsider.
I think it’s more accurate, though the term “safe” has a much larger positive valence than is justified, and so it is accurate but misleading. Particularly since it smuggles in EY’s presumptions about whom it’s safe for, and so whom we’re supposed to be rooting for, humans or transhumans. Safer is not always better. I’d rather get the concept of stasis or homogeneity in there. Stasis and homogeneity are, if not the values at the core of EY’s scheme, at least the most salient products of it.
Safe AI sounds like it does what you say as long as it isn’t stupid. Friendly AIs are supposed to do whatever’s best.
For me, a Safe AI is one that is not an existential risk. “Friendly” reminds me of “user-friendly interface”, i.e. something superficial layered over the core function.
“Slave” makes it sound like we’re making it do something against its will. “Benevolent AI” would be better.
Your lawnmower isn’t your slave. “Slave” prejudicially loads the concept with anthropocentric morality that does not actually exist.
Useful AI.
Doesn’t exist? What do you mean by that, and what evidence do you have for believing it? Have you got some special revelation into the moral status of as-yet-hypothetical AIs? Some reason for thinking that it is more likely that beings of superhuman intelligence don’t have moral status than that they do?
The traditional argument is that there’s a vast space of possible optimization processes, and the vast majority of them don’t have humanlike consciousness or ego or emotions. Thus, we wouldn’t assign them human moral standing. AIXI isn’t a person and never will be.
A slightly stronger argument is that there’s no way in hell we’re going to build an AI that has emotions or ego or the ability to be offended by serving others wholeheartedly, because that would be super dangerous, and defeat the purpose of the whole project.
I like your second argument better. The first, I think, holds no water.
There are basically two explanations of morality: the pragmatic and the moral.
By pragmatic I mean the explanation that “moral” acts ultimately are a subset of the acts that increase our utility function. This includes evolutionary psychology, kin selection, and group selection explanations of morality. It also includes most pre-modern in-group/out-group moralities, like Athenian or Roman morality, and Nietzsche’s consequentialist “master morality”. A key problem with this approach is that if you say something like, “These African slaves seem to be humans rather like me, and we should treat them better,” that is a malfunctioning of your morality program that will decrease your genetic utility.
The moral explanation posits that there’s a “should” out there in the universe. This includes most modern religious morality, though many old (and contemporary) tribal religions were pragmatic and made practical claims (don’t do this or the gods will be angry), not moral ones.
Modern Western humanistic morality can be interpreted either way. You can say the rule not to hurt people is moral, or you can say it’s an evolved trait that gives higher genetic payoff.
The idea that we give moral standing to things for being like humans doesn’t work in either approach. If morality is in truth pragmatic, then you’ll assign them moral standing if they have enough power for it to be beneficial for you to do so, and otherwise not, regardless of whether they’re like humans or not. (Whether or not you know that’s what you’re doing.) The pragmatic explanation of morality easily explains the popularity of slavery.
“Moral” morality, from where I stand, seems incompatible with the idea that we assign moral standing to things for looking or thinking like us. I feel no “oughtness” to “we should treat agents different from us like objects.” For one thing, it implies racism is morally right, and probably an obligation. For another, it’s pretty much exactly what most “moral leaders” have been trying to overcome for the past 2000 years.
It feels to me like what you’re doing is starting out by positing that morality is pragmatic, and so we expect by default to assign moral status to things like us, because that’s always a pragmatic thing to do and we’ve never had to admit moral status to things not like us. Then you extrapolate that into this novel circumstance, in which it might be beneficial to mutually agree with AIs that each of us has moral status. You’ve already agreed that morals are pragmatic at root, but you are consciously following your own evolved pragmatic programming, which tells you to accept as moral agents things that look like you. So you say, “Okay, I’ll just apply my evolved morality program, which I know is just a set of heuristics for increasing my genetic fitness and has no compelling oughtness to it, in this new situation, regardless of the outcome.” So you’re self-consciously trying to act like an animal that doesn’t know its evolved moral program has no oughtness to it. That’s really strange.
If you mean that humans are stupid and they’ll just apply that evolved heuristic without thinking about it, then that makes sense. But then you’re being descriptive. I assumed you were being prescriptive, though that’s based on my priors rather than on what you said.
That’s… an odd way of thinking about morality.
I value other human beings, because I value the processes that go on inside my own head, and can recognize the same processes at work in others, thanks to my in-built empathy and theory of mind. As such, I prefer that good things happen to them rather than bad. There isn’t any universal ‘shouldness’ to it; it’s just the way that I’d rather things be. And, since most other humans have similar values, we can work together, arm in arm. Our values converge rather than diverge. That’s morality.
I extend that value to those of different races and cultures, because I can see that they embody the same conscious processes that I value. I do not extend that same value to brain dead people, fetuses, or chickens, because I don’t see that value present within them. The same goes for a machine that has a very alien cognitive architecture and doesn’t implement the cognitive algorithms that I value.
If you’re describing how you expect you’d act based on your feelings, then why do their algorithms matter? I would think your feelings would respond to their appearance and behavior.
There’s a very large space of possible algorithms, but the space of reasonable behaviors given the same circumstances is quite small. Humans, being irrational, often deviate bizarrely from the behavior I expect in a given circumstance—more so than any AI probably would.
Is “slave” a good word for something where if you screw up enslaving it you almost automatically become its slave (if it had even the least interest in you as anything but raw material)?
Too bad “Won’t kill us all horribly in an instant AI” isn’t very catchy. . .
A slave with no desire to rebel. And no ability whatsoever to develop such a desire, of course.
It’s doable.
I disagree. I have no problem saying that friendship is the successful resolution of the value alignment problem. It’s not even a metaphor, really.
So if I lock you up in my house, and you try to run away, so I give you a lobotomy so that now you don’t run away, we’ve thereby become friends?
Not with a lobotomy, no. But with a more sophisticated brain surgery/wipe that caused me to value spending time in your house and making you happy and so forth, then yes, after the operation I would probably consider you a friend, or something quite like it.
Obviously, as a Toggle who has not yet undergone such an operation, I consider it a hostile and unfriendly act. But that has no bearing on what our relationship is after the point in time where you get to arbitrarily decide what our relationship is.
There’s a difference between creating someone with certain values and altering someone’s values. For one thing, it’s possible to prohibit messing with someone’s values, but you can’t create someone without creating them with values. It’s not like you can create an ideal philosophy student of perfect emptiness.
Only if you prohibit interacting with him in any way.
I don’t mean you can feasibly program an AI to do that. I just mean that it’s something you can tell a human to do and they’d know what you mean. I’m talking about deontological ethics, not programming a safe AI.
How about if I get some DNA from Kate Upton, tweak it for high sex drive, low intelligence, low initiative, pliability, and a desperation to please, and then I grow a woman from it? Is she my friend?
If you design someone to serve your needs without asking that you serve theirs, the word “friend” is misleading. Friendship is mutually beneficial. I believe friendship signifies a relationship between two people that can be defined in operational terms, not a quale that one person has. You can’t make someone actually be your friend just by hypnotizing them to believe they’re your friend.
Belief and feeling are probably part of the definition. It’s hard to imagine saying two people are friends without their knowing it. But I think the pattern of mutually beneficial behavior is also part of it.
That too, but I would probably stress the free choice part. In particular, I don’t think friendship is possible across a large power gap.