Being you, you should strive towards that which you “really really prefer”.
Being me, I prefer what I “really really prefer”. You’ve not indicated why I “should” strive towards that which I “really really prefer”.
If a particular “moral principle” (whatever you choose to label as such) is suboptimal for you (and you’re not making choices for all of mankind, TDT or no), why would you endorse/glorify a suboptimal course of action?
Asking whether I “would” do something is different from asking whether I “should” do something. Morality helps drive my volition, but it isn’t the sole decider.
That’s called a compromise for mutual benefit, and it shifts as the group of agents changes throughout history.
If you want to claim that that’s the historical/evolutionary reasons that the moral instinct evolved, I agree.
If you want to argue that that’s what morality is, then I disagree. Morality can drive someone to sacrifice their lives for others, so it’s obviously NOT always a “compromise for mutual benefit”.
If you want to argue that that’s what morality is, then I disagree.
Everybody defines his/her own variant of what they call “morality”, “right”, “wrong”. I simply suspect that the genesis of the whole “universally good” brouhaha stems from evolved applied game theory, the “good of the tribe”. Which is fine. Luckily we could now move past being bound by such homo erectus historic constraints. That doesn’t mean we stop cooperating, we just start being more analytic about it. That would satisfy my preferences, that would be good.
Morality can drive someone to sacrifice their lives for others, so it’s obviously NOT always a “compromise for mutual benefit”.
Well, if the agent prefers sacrificing their existence for others, then doing so would be to their own benefit, no?
Well, if the agent prefers sacrificing their existence for others, then doing so would be to their own benefit, no?
sigh. Yes, given such a moral preference already in place, it somehow becomes to any person’s “benefit” (for a rather useless definition of “benefit”) to follow their morality.
But you previously argued that morality is a “compromise for mutual benefit”, so it would follow that it only is created in order to help partially satisfy some preexisting “benefit”. That benefit can’t be the mere satisfaction of itself.
I’ve called “an attempt at reconciling different preferences” a “compromise for mutual benefit”. Various people call various actions “moral”. The whole notion probably stems from cooperation within a tribe being of overall benefit, evolutionarily speaking, but I don’t claim at all that “any moral action is a compromise for mutual benefit”. Who knows who calls what moral. The whole confused notion should be done away with, game theory ain’t be needing no “moral”.
What I am claiming is that there is no non-trivial definition of morality (that is, other than “good = following your preferences”) which can convince a perfectly rational agent to change its own utility function to adopt more such “moral preferences”. Change, not merely relabel. The perfectly instrumentally rational agent does that which its utility function wants. How would you even convince it otherwise? Hopefully this clarifies things a bit.
My own feeling is that if you stop being so dismissive, you’ll actually make some progress towards understanding “who calls what moral”.
What I am claiming is that there is no non-trivial definition of morality (that is, other than “good = following your preferences”) which can convince a perfectly rational agent to change its own utility function to adopt more such “moral preferences”
Sure, unless someone already has a desire to be moral, talk of morality will be of no concern to them. I agree with that.
Edit: Because the scenario clarifies my position, allow me to elaborate on it:
Consider a perfectly rational agent. Its epistemic rationality is flawless, that is its model of its environment is impeccable. Its instrumental rationality, without peer. That is, it is really, really good at satisfying its preferences.
It encounters a human. The human talks about what the human wants, some of which the human calls “virtuous” and “good” and is especially adamant about.
You and I, alas, are far from that perfectly rational agent. As you say, if you already have a desire to enact some actions you call morally good, then you don’t need to “change” your utility function, you already have some preferences you call moral.
The question is for those who do not have a desire to do what you call moral (or who insist on their own definition, as nearly everybody does), on what grounds should they even start caring about what you call “moral”? As you say, they shouldn’t, unless it benefits them in some way (e.g. makes their mammal brains feel good about being a Good Person (tm)). So what’s the hubbub?
As you say, they shouldn’t, unless it benefits them in some way
I’ve already said that unless someone already desires to be moral, babbling about morality won’t do anything for them. I didn’t say it “shouldn’t” (please stop confusing these two verbs)
But then you also seem to conflate this with a different issue—of what to do with someone who does want to be moral, but understands morality differently than I do.
Which is an utterly different issue. First of all people often have different definitions to describe the same concepts—that’s because quite clearly the human brain doesn’t work with definitions, but with fuzzy categorizations and instinctive “I know it when I see it” which we then attempt to make into definition when we attempt to communicate said concepts to others.
But the very fact we use the same word “morality”, means we identify some common elements of what “morality” means. If we didn’t mean anything similar to each other, we wouldn’t be using the same word to describe it.
I find that supposedly different moralities seem to have some very common elements to them—e.g. people tend to prefer that other people be moral. People generally agree that moral behaviour by everyone leads to happier, healthier societies. They tend to disagree about what that behaviour is, but the effects they describe tend to be common.
I might disagree with Kasparov about what the best next chess move would be, and that doesn’t mean it’s simply a matter of preference—we have a common understanding that the best moves are the ones that lead to an advantageous position. So, though we disagree on the best move, we have an agreement on the results of the best move.
I didn’t say it “shouldn’t” (please stop confusing these two verbs)
What you did say was “of no concern”, and “won’t do anything for them”, which (unless you assume infinite resources) translates to “shouldn’t”. It’s not “conflating”. Let’s stay constructive.
People generally agree that moral behaviour by everyone leads to happier, healthier societies.
Such as in Islamic societies. Wrong fuzzy morality cloud?
But the very fact we use the same word “morality”, means we identify some common elements of what “morality” means. If we didn’t mean anything similar to each other, we wouldn’t be using the same word to describe it.
Sure. What it does not mean, however, is that in between these fuzzily connected concepts is some actual, correct, universal notion of morality. Or would you take some sort of “mean”, which changes with time and social conventions?
If everybody had some vague ideas about games called chess_1 to chess_N, with N being in the millions, that would not translate to some universally correct and acceptable definition of the game of chess. Fuzzy human concepts can’t be assumed to yield some iron-clad core just beyond our grasp, if only we could blow the fuzziness away. People for the most part agree what to classify as a chair. That doesn’t mean there is some ideal chair we can strive for.
When checking for best moves in pre-defined chess there are definite criteria. There are non-arbitrary metrics to measure “best” by. Kasparov’s proposed chess move can be better than your proposed chess move, using clear and obvious metrics. The analogy doesn’t pan out:
With the fuzzy clouds of what’s “moral”, an outlier could—maybe—say “well, I’m clearly an outlier”, but that wouldn’t necessitate any change, because there is no objective metric to go by. Preferences aren’t subject to Aumann’s, or to a tyranny of the (current societal) majority.
People generally agree that moral behaviour by everyone leads to happier, healthier societies
Such as in Islamic societies. Wrong fuzzy morality cloud?
No, Islamic societies suffer from the delusion that Allah exists. If Allah existed (an omnipotent creature that punishes you horribly if you fail to obey Quran’s commandments), then Islamic societies would have the right idea.
Remove their false belief in Allah, and I fail to see any great moral difference between our society and Islamic ones.
You’re treating desires as simpler than they often are in humans. Someone can have no desire to be moral because they have a mistaken idea of what morality is or requires, are internally inconsistent, or have mistaken beliefs about how states of the world map to their utility function, to name a few possibilities. So, if someone told me that they have no desire to do what I call moral, I would assume that they have mistaken beliefs about morality, for reasons like the ones I listed. If there were beings that had all the relevant information, were internally consistent, and used words with the same sense that I use them, and they still had no desire to do what I call moral, then there would be no way for me to convince them, but this doesn’t describe humans.
So not doing what you call moral implies “mistaken beliefs”? How, why?
Does that mean, then, that unfriendly AI cannot exist? Or is it just that a superior agent which does not follow your morality is somehow faulty? It might not care much. (Neither should fellow humans who do not adhere to your ‘correct’ set of moral actions. Just saying “everybody needs to be moral” doesn’t change any rational agent’s preferences. Any reasoning?)
So not doing what you call moral implies “mistaken beliefs”? How, why?
For a human, yes. Explaining why this is the case would require several Main-length posts about ethical egoism, human nature and virtue ethics, and other related topics. It’s a lot to go into. I’m happy to answer specific questions, but a proper answer would require describing much of (what I believe to be) morality. I will attempt to give what must be a very incomplete answer.
It’s not about what I call moral, but what is actually moral. There is a variety of reasons (upbringing, culture, bad habits, mental problems, etc) that can cause people to have mistaken beliefs about what’s moral. Much of what is moral is because of what’s good for a person because of human nature. People’s preferences can be internally inconsistent, and actually are inconsistent when they ignore or don’t fully integrate this part of their preferences.
An AI doesn’t have human nature, so it can be internally consistent while not doing what’s moral, but I believe that if a human is immoral, it’s a case of internal inconsistency (or lack of knowledge).
Is it something about the human brain? But brains evolve over time, both from genetic and from environmental influences. Worse, different human subpopulations often evolve (slightly) different paths! So which humans do you claim as a basis from which to define the one and only correct “human morality”?
Noting that humans share many characteristics is an ‘is’, not an ‘ought’. Also, this “common human nature” as exemplified throughout history is … none too pretty as a base for some “universal mandatory morality”. Yes, compared to random other mind designs pulled from mindspace, all human minds appear very similar. That doesn’t imply at all that they all should strive to be similar, or to follow a similar ‘codex’. Where do you get that from? It’s like religion, minus god.
What you’re saying that if you want to be a real human, you have to be moral? What species am I, then?
Declaring that most humans have two legs doesn’t mean that every human should strive to have exactly two legs. Can’t derive an ‘ought’ from an ‘is’.
Yes, human nature is an “is”. It’s important because it shapes people’s preferences, or, more relevantly, it shapes what makes people happy. It’s not that people should strive to have two legs, but that they already have two legs, but are ignoring them. There is no obligation to be human—but you’re already human, and thus human nature is already part of you.
What you’re saying that if you want to be a real human, you have to be moral?
No, I’m saying that because you are human, it is inconsistent of you to not want to be moral.
I feel like the discussion is stalling at this point. It comes down to you saying “if you’re human you should want to be moral, because humans should be moral”, which to me is as non-sequitur as it gets.
There is no obligation to be human—but you’re already human, and thus human nature is already part of you.
Except if my utility function doesn’t encompass what you think is “moral” and I’m human, then “following human morality” doesn’t quite seem to be a prerequisite to be a “true” human, no?
It comes down to you saying “if you’re human you should want to be moral, because humans should be moral”
No, that isn’t what I’m saying. I’m saying that if you’re human, you should want to be moral, because wanting to be moral follows from the desires of a human with consistent preferences, due in part to human nature.
if my utility function doesn’t encompass what you think is “moral” and I’m human
Then I dispute that your utility function is what you think it is.
I’m saying that if you’re human, you should want to be moral, because wanting to be moral follows from the desires of a human with consistent preferences, due in part to human nature.
The error as I see it is that “human nature”, whatever you see as such, is a statement about similarities, it isn’t a statement about how things should be.
It’s like saying “a randomly chosen positive natural number is really big, so all numbers should be really big”. How do you see that differently?
We’ve already established that agents can have consistent preferences without adhering to what you think of as “universal human morality”. Child soldiers are human. Their preferences sure can be brutal, but they can be as internally consistent or inconsistent as those of anyone else. I sure would like to change their preferences, because I’d prefer for them to be different, not because some ‘idealised human spirit’ / ‘psychic unity of mankind’ ideal demands so.
Then I dispute that your utility function is what you think it is.
Proof by demonstration? Well, lock yourself in a cellar with only water and send me a key, I’ll send it back FedEx with instructions to set you free, after a week. Would that suffice? I’d enjoy proving that I know my own utility function better than you know my utility function (now that would be quite weird), I wouldn’t enjoy the suffering. Who knows, might even be healthy overall.
It’s like saying “a randomly chosen positive natural number is really big, so all numbers should be really big”.
You can’t randomly choose a positive natural number using an even distribution. If you use an uneven distribution, whether the result is likely to be big depends on how your distribution compares to your definition of “big”.
Choose from those positive numbers that a C++ int variable can contain, or any other* non-infinite subset of positive natural numbers, then. The point is the observation of “most numbers need more than 1 digit to be expressed” not implying in any way some sort of “need” for the 1-digit numbers to “change”, to satisfy the number fairy, or some abstract concept thereof.
* (For LW purposes: Any other? No, not any other. Choose one with a cardinality of at least 10^6. Heh.)
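To put a number on the "most numbers need more than 1 digit" observation, here is a minimal sketch. It assumes a 32-bit signed int (the C++ standard does not fix the width), and the function name `multi_digit_fraction` is made up for illustration. It only computes the descriptive fraction; nothing in it says the 1-digit numbers ought to change.

```python
# Assumption: a C++ int is 32-bit signed, so its largest positive value is 2**31 - 1.
INT_MAX = 2**31 - 1


def multi_digit_fraction(upper: int) -> float:
    """Fraction of the integers 1..upper that need more than one decimal digit."""
    single_digit = min(upper, 9)  # only 1..9 fit in a single digit
    return (upper - single_digit) / upper


fraction = multi_digit_fraction(INT_MAX)
print(f"share of positive int values with more than one digit: {fraction:.10f}")
```

The result is a statement about how the set is distributed (an "is"). The nine single-digit values remain perfectly valid members of the set.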
The error as I see it is that “human nature”, whatever you see as such, is a statement about similarities, it isn’t a statement about how things should be.
It is a statement about similarities, but it’s about a similarity that shapes what people should do. I don’t know how I can explain it without repeating myself, but I’ll try.
For an analogy, let’s consider beings that aren’t humans. Paperclip maximizers, for example. Except these paperclip maximizers aren’t AIs, but a species that somehow evolved biologically. They’re not perfect reasoners and can have internally inconsistent preferences. These paperclip maximizers can prefer to do something that isn’t paperclip-maximizing, even though that is contrary to their nature—that is, if they were to maximize paperclips, they would prefer it to whatever they were doing earlier. One day, a paperclip maximizer who is maximizing paperclips tells his fellow clippies, “You should maximize paperclips, because if you did, you would prefer to, as it is your nature”. This clippy’s statement is true—the clippies’ nature is such that if they maximized clippies, they would prefer it to other goals. So, regardless of what other clippies are actually doing, the utility-maximizing thing for them to do would be to maximize paperclips.
So it is with humans. Upon discovering/realizing/deriving what is moral and consistently acting/being moral, the agent would find that being moral is better than the alternative. This is in part due to human nature.
We’ve already established that agents can have consistent preferences without adhering to what you think of as “universal human morality”.
Agents, yes. Humans, no. Just like the clippies can’t have consistent preferences if they’re not maximizing paperclips.
Proof by demonstration? Well, lock yourself in a cellar with only water and send me a key, I’ll send it back FedEx with instructions to set you free, after a week. Would that suffice?
What would that prove? Also, I don’t claim that I know the entirety of your utility function better than you do—you know much better than I do what kind of ice cream you prefer, what TV shows you like to watch, etc. But those have little to do with human nature in the sense that we’re talking about it here.
A clippy which isn’t maximizing paperclips is not a clippy.
It’s a clippy because it would maximize paperclips if it had consistent preferences and sufficient knowledge.
That my utility function includes something which you’d probably consider immoral.
I don’t dispute that this is possible. What I dispute is that your utility function would contain that if you were internally consistent (and had knowledge of what being moral is like).
The desires of an agent are defined by its preferences. “This is a paperclip maximizer which does not want to maximize paperclips” is a contradiction in terms. And what do you mean by “consistent”, do you mean “consistent with ‘human nature’? Who cares? Or consistent within themselves? Highly doubtful, what would internal consistency have to do with being an altruist? If there’s anything which is characteristic of “human nature”, it is the inconsistency of their preferences.
A human which doesn’t share what you think of as “correct” values (may I ask, not disparagingly, are you religious?) is still a human. An unusual one, maybe (probably not), but an agent not in “need” of any change towards more “moral” values. Stalin may have been happy the way he was.
I don’t dispute that this is possible. What I dispute is that your utility function would contain that if you were internally consistent (and had knowledge of what being moral is like).
Because of the warm fuzzies? The social signalling? Is being moral awesome, or deeply fulfilling? Are you internally consistent … ?
“This is a paperclip maximizer which does not want to maximize paperclips” is a contradiction in terms.
Call it a quasi-paperclip maximizer, then. I’m not interested in disputing definitions. Whatever you call it, it’s a being whose preferences are not necessarily internally consistent, but when they are, it prefers to maximize paperclips. When its preferences are internally inconsistent, it may prefer to do things and have goals other than maximizing paperclips.
Highly doubtful, what would internal consistency have to do with being an altruist?
There’s no necessary connection between the two, but I’m not equating morality and altruism. Morality is what one should do and/or how one should be, which need not be altruistic.
Humans can have incorrect values and still be human, but in that case they are internally inconsistent, because of the preferences they have due to human nature. I’m not saying that humans should strive to have human nature, I’m saying that they already have it. I doubt that Stalin was happy; just look at how paranoid he was. And no, I’m not religious, and have never been.
Because of the warm fuzzies? The social signalling? Is being moral awesome, or deeply fulfilling?
Yes to the first and third questions. Being moral is awesome and fulfilling. It makes you feel happier, more fulfilled, more stable, and similar feelings. It doesn’t guarantee happiness, but it contributes to it both directly (being moral feels good) and indirectly (it helps you make good decisions). It makes you stronger and more resilient (once you’ve internalized it fully). It’s hard to describe beyond that, but good feels good (TVTropes warning).
I think I’m internally consistent. I’ve been told that I am. It’s unlikely that I’m perfectly consistent, but whatever inconsistencies I have are probably minor. I’m open to having them addressed, whatever they are.
Claiming that Stalin wasn’t happy sounds like a variation of sour grapes where not only can you not be as successful as him, it would be actively uncomfortable for you to believe that someone who lacks compassion can be happy, so you claim that he’s not.
It’s true he was paranoid but it’s also true that in the real world, there are tradeoffs and you don’t see people becoming happy with no downsides whatsoever—claiming that this disqualifies them from being called happy eviscerates the word of meaning.
I’m also not convinced that Stalin’s “paranoia” was paranoia (it seems rational for someone who doesn’t care about the welfare of others and can increase his safety by instilling fear and treating everyone as enemies to do so). I would also caution against assuming that since Stalin’s paranoia is prominent enough for you to have heard of it, it’s too big a deal for him to have been happy. It’s prominent enough for you to have heard of it because it was a big deal to the people affected by it, which is unrelated to how much it affected his happiness.
Stalin was paranoid even by the standards of major world leaders. Khrushchev wasn’t so paranoid, for example. Stalin saw enemies behind every corner. That is not a happy existence.
Khrushchev was deposed. Stalin stayed dictator until he died of natural causes. That suggests that Khrushchev wasn’t paranoid enough, while Stalin was appropriately paranoid.
Seeing enemies around every corner meant that sometimes he saw enemies that weren’t there, but it was overall adaptive because it resulted in him not getting defeated by any of the enemies that actually existed. (Furthermore, going against nonexistent enemies can be beneficial insofar as the ruthlessness in going after them discourages real enemies.)
Stalin saw enemies behind every corner. That is not a happy existence.
How does the last sentence follow from the previous one? It’s certainly not as happy an existence as it would have been if he had no enemies, but as I pointed out, nobody’s perfectly happy. There are always tradeoffs and we don’t claim that the fact that someone had to do something to gain his happiness automatically makes that happiness fake.
Stalin refused to believe Hitler would attack him, probably since that would be suicidally stupid on the attacker’s part. Was he paranoid, or did he update?
The desires of an agent are defined by its preferences. “This is a paperclip maximizer which does not want to maximize paperclips” is a contradiction in terms.
I’m not sure “preference” is a powerful enough term to capture an agent’s true goals, however defined. Consider any of the standard preference reversals: a heavy cigarette smoker, for example, might prefer to buy and consume their next pack in a Near context, yet prefer to quit in a Far. The apparent contradiction follows quite naturally from time discounting, yet neither interpretation of the person’s preferences is obviously wrong.
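The Near/Far reversal can be reproduced with a toy model. This is only a sketch under stated assumptions: the comment does not name a discounting model, and the reward sizes, delays, and parameters `k` and `delta` below are invented for illustration. With hyperbolic discounting the preferred option flips as both rewards move further away; with exponential discounting the ranking stays the same at every distance.

```python
def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolically discounted value: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, delta: float = 0.8) -> float:
    """Exponentially discounted value: amount * delta ** delay."""
    return amount * delta ** delay


def prefers_sooner(discount, lead_time: float) -> bool:
    """True if the small, sooner reward beats the large, later one.

    Hypothetical numbers: the sooner reward (the next pack) is worth 10 and
    arrives after lead_time; the later reward (having quit) is worth 30 and
    arrives 10 time units after that.
    """
    sooner = discount(10.0, lead_time)
    later = discount(30.0, lead_time + 10)
    return sooner > later


for name, discount in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    near = prefers_sooner(discount, lead_time=0)   # deciding right now
    far = prefers_sooner(discount, lead_time=20)   # deciding well in advance
    print(f"{name}: prefers sooner reward when near={near}, when far={far}")
```

Under these assumed numbers the hyperbolic agent picks the pack when the choice is immediate and picks quitting when the choice is far off, and both answers come out of the same valuation function.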
Proof by demonstration? Well, lock yourself in a cellar with only water and send me a key, I’ll send it back FedEx with instructions to set you free, after a week.
That would only prove that you think you want to do that. The issue is that what you think you want and what you actually want do not generally coincide, because of imperfect self-knowledge, bounded thinking time, etc.
I don’t know about child soldiers, but it’s fairly common for amateur philosophers to argue themselves into thinking they “should” be perfectly selfish egoists, or hedonistic utilitarians, because logic or rationality demands it. They are factually mistaken, and to the extent that they think they want to be egoists or hedonists, their “preferences” are inconsistent, because if they noticed the logical flaw in their argument they would change their minds.
That would only prove that you think you want to do that.
Isn’t that when I throw up my arms and say “congratulations, your hypothesis is unfalsifiable, the dragon is permeable to flour”? What experimental setup would you suggest? Would you say any statement about one’s preferences is moot? It seems that we’re always under bounded thinking time constraints. Maybe the paperclipper really wants to help humankind and be moral, and mistakenly thinks otherwise. Who would know, it optimized its own actions under resource constraints, and then there’s the ‘Löbstacle’ to consider.
Is saying “I like vanilla ice cream” FAI-complete and must never be uttered or relied upon by anyone?
it’s fairly common for amateur philosophers to argue themselves into thinking they “should” be perfectly selfish egoists, or hedonistic utilitarians, because logic or rationality demands it
Or argue themselves into thinking that there is some subset of preferences such that every other (human?) agent should voluntarily choose to adopt them, against their better judgment (edit: as it contradicts what they perceive, after thorough introspection, as their own preferences)? You can add “objective moralists” to the list.
What would it be that is present in every single human’s brain architecture throughout human history that would be compatible with some fixed ordering over actions, called “morally good”? (Otherwise you’d have your immediate counterexample.) The notion seems so obviously ill-defined and misguided (hence my first comment asking Cousin_It).
It’s fine (to me) to espouse preferences that aim to change other humans (say, towards being more altruistic, or towards being less altruistic, or whatever), but to appeal to some objective guiding principle based on “human nature” (which constantly evolves in different strands) or some well-sounding ev-psych applause-light is just a new substitute for the good old Abrahamic heavenly father.
Would you say any statement about one’s preferences is moot? It seems that we’re always under bounded thinking time constraints. Maybe the paperclipper really wants to help humankind and be moral, and mistakenly thinks otherwise. Who would know, it optimized its own actions under resource constraints, and then there’s the ‘Löbstacle’ to consider.
Is saying “I like vanilla ice cream” FAI-complete and must never be uttered or relied upon by anyone?
I wouldn’t say any of those things. Obviously paperclippers don’t “really want to help humankind”, because they don’t have any human notion of morality built-in in the first place. Statements like “I like vanilla ice cream” are also more trustworthy on account of being a function of directly observable things like how you feel when you eat it.
The only point I’m trying to make here is that it is possible to be mistaken about your own utility function. It’s entirely consistent for the vast majority of humans to have a large shared portion of their built-in utility function (built-in by their genes), even though many of them seemingly want to do bad things, and that’s because humans are easily confused and not automatically self-aware.
It is possible to be mistaken about your own utility function.
For sure.
It’s entirely consistent for the vast majority of humans to have a large shared portion of their built-in utility function (built-in by their genes), even though many of them seemingly want to do bad things
I’d agree if humans were like dishwashers. There are templates for dishwashers, ways they are supposed to work. If you came across a broken dishwasher, there could be a case for the dishwasher to be repaired, to go back to “what it’s supposed to be”.
However, that is because there is some external authority (exasperated humans who want to fix their damn dishwasher, dirty dishes are piling up) conceiving of and enforcing such a purpose. The fact that genes and the environment shape utility functions in similar ways is a description, not a prescription. It would not be a case for any “broken” human to go back to “what his genes would want him to be doing”. Just like it wouldn’t be a case against brain uploading.
Some of the discussion seems to me like saying that “deep down in every flawed human, there is ‘a figure of light’, in our community ‘a rational agent following uniform human values with slight deviations accounting for ice-cream taste’, we just need to dig it up”. There is only your brain. With its values. There is no external standard to call its values flawed. There are external standards (rationality = winning) to better its epistemic and instrumental rationality, but those can help the serial killer and the GiveWell activist equally. (Also, both of those can be ‘mistaken’ about their values.)
Being me, I prefer what I “really really prefer”. You’ve not indicated why I “should” strive towards that which I “really really prefer”.
When you are asking whether I “would” do something, is different than when you ask whether I “should” do something. Morality helps drive my volition, but it isn’t the sole decider.
If you want to claim that that’s the historical/evolutionary reasons that the moral instinct evolved, I agree.
If you want to argue that that’s what morality is, then I disagree. Morality can drive someone to sacrifice their lives for others, so it’s obviously NOT always a “compromise for mutual benefit”.
Everybody defines his/her own variant of what they call “morality”, “right”, “wrong”, I simply suspect that the genesis of the whole “universally good” brouhaha stems from evolutionary evolved applied game theory, the “good of the tribe”. Which is fine. Luckily we could now move past being bound by such homo erectus historic constraints. That doesn’t mean we stop cooperating, we just start being more analytic about it. That would satisfy my preferences, that would be good.
Well, if the agent prefers sacrificing their existence for others, then doing so would be to their own benefit, no?
sigh. Yes, given such a moral preference already in place, it somehow becomes to any person’s “benefit” (for a rather useless definition of “benefit”) to follow their morality.
But you previously argued that morality is a “compromise for mutual benefit”, so it would follow that it only is created in order to help partially satisfy some preexisting “benefit”. That benefit can’t be the mere satisfaction of itself.
I’ve called “an attempt at reconciling different preferences” a “compromise for mutual benefit”. Various people call various actions “moral”. The whole notion probably stems from cooperation within a tribe being of overall benefit, evolutionary speaking, but I don’t claim at all that “any moral action is a compromise for mutual benefit”. Who knows who calls what moral. The whole confused notion should be done away with, game theory ain’t be needing no “moral”.
What I am claiming is that there is no non-trivial definition of morality (that is, other than “good = following your preferences”) which can convince a perfectly rational agent to change its own utility function to adopt more such “moral preferences”. Change, not merely relabel. The perfectly instrumentally rational agent does that which its utility function wants. How would you even convince it otherwise? Hopefully this clarifies things a bit.
My own feeling is that if you stop being so dismissive, you’ll actually make some progress towards understanding “who calls what moral”.
Sure, unless someone already has a desire to be moral, talk of morality will be of no concern to them. I agree with that.
Edit: Because the scenario clarifies my position, allow me to elaborate on it:
Consider a perfectly rational agent. Its epistemic rationality is flawless, that is its model of its environment is impeccable. Its instrumental rationality, without peer. That is, it is really, really good at satisfying its preferences.
It encounters a human. The human talks about what the human wants, some of which the human calls “virtuous” and “good” and is especially adamant about.
You and I, alas, are far from that perfectly rational agent. As you say, if you already have a desire to enact some actions you call morally good, then you don’t need to “change” your utility function, you already have some preferences you call moral.
The question is for those who do not have a desire to do what you call moral (or who insist on their own definition, as nearly everybody does), on what grounds should they even start caring about what you call “moral”? As you say, they shouldn’t, unless it benefits them in some way (e.g. makes their mammal brains feel good about being a Good Person (tm)). So what’s the hubbub?
I’ve already said that unless someone already desires to be moral, babbling about morality won’t do anything for them. I didn’t say it “shouldn’t” (please stop confusing these two verbs).
But then you also seem to conflate this with a different issue—of what to do with someone who does want to be moral, but understands morality differently than I do.
Which is an utterly different issue. First of all people often have different definitions to describe the same concepts—that’s because quite clearly the human brain doesn’t work with definitions, but with fuzzy categorizations and instinctive “I know it when I see it” which we then attempt to make into definition when we attempt to communicate said concepts to others.
But the very fact we use the same word “morality”, means we identify some common elements of what “morality” means. If we didn’t mean anything similar to each other, we wouldn’t be using the same word to describe it.
I find that supposedly different moralities seem to have some very common elements to them—e.g. people tend to prefer that other people be moral. People generally agree that moral behaviour by everyone leads to happier, healthier societies. They tend to disagree about what that behaviour is, but the effects they describe tend to be common.
I might disagree with Kasparov about what the best next chess move would be, and that doesn’t mean it’s simply a matter of preference—we have a common understanding that the best moves are the ones that lead to an advantageous position. So, though we disagree on the best move, we have an agreement on the results of the best move.
What you did say was “of no concern”, and “won’t do anything for them”, which (unless you assume infinite resources) translates to “shouldn’t”. It’s not “conflating”. Let’s stay constructive.
Such as in Islamic societies. Wrong fuzzy morality cloud?
Sure. What it does not mean, however, is that in between these fuzzily connected concepts is some actual, correct, universal notion of morality. Or would you take some sort of “mean”, which changes with time and social conventions?
If everybody had some vague ideas about games called chess_1 to chess_N, with N being in the millions, that would not translate to some universally correct and acceptable definition of the game of chess. Fuzzy human concepts can’t be assumed to yield some iron-clad core just beyond our grasp, if only we could blow the fuzziness away. People for the most part agree what to classify as a chair. That doesn’t mean there is some ideal chair we can strive for.
When checking for best moves in pre-defined chess there are definite criteria. There are non-arbitrary metrics to measure “best” by. Kasparov’s proposed chess move can be better than your proposed chess move, using clear and obvious metrics. The analogy doesn’t pan out:
With the fuzzy clouds of what’s “moral”, an outlier could—maybe—say “well, I’m clearly an outlier”, but that wouldn’t necessitate any change, because there is no objective metric to go by. Preferences aren’t subject to Aumann’s, or to a tyranny of the (current societal) majority.
No, Islamic societies suffer from the delusion that Allah exists. If Allah existed (an omnipotent creature that punishes you horribly if you fail to obey Quran’s commandments), then Islamic societies would have the right idea.
Remove their false belief in Allah, and I fail to see any great moral difference between our society and Islamic ones.
You’re treating desires as simpler than they often are in humans. Someone can have no desire to be moral because they have a mistaken idea of what morality is or requires, are internally inconsistent, or have mistaken beliefs about how states of the world map to their utility function, to name a few possibilities. So, if someone told me that they have no desire to do what I call moral, I would assume that they have mistaken beliefs about morality, for reasons like the ones I listed. If there were beings that had all the relevant information, were internally consistent, and used words with the same sense that I use them, and they still had no desire to do what I call moral, then there would be no way for me to convince them, but this doesn’t describe humans.
So not doing what you call moral implies “mistaken beliefs”? How, why?
Does that mean, then, that unfriendly AI cannot exist? Or is it just that a superior agent which does not follow your morality is somehow faulty? It might not care much. (Neither should fellow humans who do not adhere to your ‘correct’ set of moral actions. Just saying “everybody needs to be moral” doesn’t change any rational agent’s preferences. Any reasoning?)
For a human, yes. Explaining why this is the case would require several Main-length posts about ethical egoism, human nature and virtue ethics, and other related topics. It’s a lot to go into. I’m happy to answer specific questions, but a proper answer would require describing much of (what I believe to be) morality. I will attempt to give what must be a very incomplete answer.
It’s not about what I call moral, but what is actually moral. There is a variety of reasons (upbringing, culture, bad habits, mental problems, etc.) that can cause people to have mistaken beliefs about what’s moral. Much of what is moral is moral because it is good for a person, given human nature. People’s preferences can be internally inconsistent, and actually are inconsistent when they ignore or don’t fully integrate this part of their preferences.
An AI doesn’t have human nature, so it can be internally consistent while not doing what’s moral, but I believe that if a human is immoral, it’s a case of internal inconsistency (or lack of knowledge).
Is it something about the human brain? But brains evolve over time, both from genetic and from environmental influences. Worse, different human subpopulations often evolve (slightly) different paths! So which humans do you claim as a basis from which to define the one and only correct “human morality”?
Despite the differences, there is a common human nature. There is a Psychological Unity of Humankind.
Noting that humans share many characteristics is an ‘is’, not an ‘ought’. Also, this “common human nature” as exemplified throughout history is … none too pretty as a base for some “universal mandatory morality”. Yes, compared to random other mind designs pulled from mindspace, all human minds appear very similar. Doesn’t imply at all that they all should strive to be similar, or to follow a similar ‘codex’. Where do you get that from? It’s like religion, minus god.
What, you’re saying that if you want to be a real human, you have to be moral? What species am I, then?
Declaring that most humans have two legs doesn’t mean that every human should strive to have exactly two legs. Can’t derive an ‘ought’ from an ‘is’.
Yes, human nature is an “is”. It’s important because it shapes people’s preferences, or, more relevantly, it shapes what makes people happy. It’s not that people should strive to have two legs, but that they already have two legs, but are ignoring them. There is no obligation to be human—but you’re already human, and thus human nature is already part of you.
No, I’m saying that because you are human, it is inconsistent of you to not want to be moral.
I feel like the discussion is stalling at this point. It comes down to you saying “if you’re human you should want to be moral, because humans should be moral”, which to me is as non-sequitur as it gets.
Except if my utility function doesn’t encompass what you think is “moral” and I’m human, then “following human morality” doesn’t quite seem to be a prerequisite to be a “true” human, no?
No, that isn’t what I’m saying. I’m saying that if you’re human, you should want to be moral, because wanting to be moral follows from the desires of a human with consistent preferences, due in part to human nature.
Then I dispute that your utility function is what you think it is.
The error as I see it is that “human nature”, whatever you see as such, is a statement about similarities, it isn’t a statement about how things should be.
It’s like saying “a randomly chosen positive natural number is really big, so all numbers should be really big”. How do you see that differently?
We’ve already established that agents can have consistent preferences without adhering to what you think of as “universal human morality”. Child soldiers are human. Their preferences sure can be brutal, but they can be as internally consistent or inconsistent as those of anyone else. I sure would like to change their preferences, because I’d prefer for them to be different, not because some ‘idealised human spirit’ / ‘psychic unity of mankind’ ideal demands so.
Proof by demonstration? Well, lock yourself in a cellar with only water and send me a key, I’ll send it back FedEx with instructions to set you free, after a week. Would that suffice? I’d enjoy proving that I know my own utility function better than you know my utility function (now that would be quite weird), I wouldn’t enjoy the suffering. Who knows, might even be healthy overall.
You can’t randomly choose a positive natural number using an even distribution. If you use an uneven distribution, whether the result is likely to be big depends on how your distribution compares to your definition of “big”.
Choose from those positive numbers that a C++ int variable can contain, or any other* non-infinite subset of positive natural numbers, then. The point is the observation of “most numbers need more than 1 digit to be expressed” not implying in any way some sort of “need” for the 1-digit numbers to “change”, to satisfy the number fairy, or some abstract concept thereof.
* (For LW purposes: Any other? No, not any other. Choose one with a cardinality of at least 10^6. Heh.)
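The finite version of the number example can be checked by direct counting. The following is only a small sketch: the function name `fraction_multi_digit` is made up for illustration, and "C++ int" is assumed to mean a 32-bit signed int (the language itself only guarantees at least 16 bits).

```python
# Assumption: "C++ int" is taken to be a 32-bit signed int; the standard
# only guarantees at least 16 bits, so INT_MAX here is illustrative.
INT_MAX = 2**31 - 1


def fraction_multi_digit(upper: int) -> float:
    """Fraction of the integers 1..upper that need more than one decimal digit."""
    if upper < 1:
        raise ValueError("upper must be a positive integer")
    single_digit = min(upper, 9)  # the numbers 1..9 (or fewer if upper < 9)
    return (upper - single_digit) / upper


if __name__ == "__main__":
    # Both sets are finite, so a uniform random choice is well defined.
    for upper in (10**6, INT_MAX):
        print(upper, fraction_multi_digit(upper))
```

The count only describes the set: nearly all members have more than one digit, and nothing in that figure says anything about what the numbers 1 to 9 ought to be.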
It is a statement about similarities, but it’s about a similarity that shapes what people should do. I don’t know how I can explain it without repeating myself, but I’ll try.
For an analogy, let’s consider beings that aren’t humans. Paperclip maximizers, for example. Except these paperclip maximizers aren’t AIs, but a species that somehow evolved biologically. They’re not perfect reasoners and can have internally inconsistent preferences. These paperclip maximizers can prefer to do something that isn’t paperclip-maximizing, even though that is contrary to their nature; that is, if they were to maximize paperclips, they would prefer it to whatever they were doing earlier. One day, a paperclip maximizer who is maximizing paperclips tells his fellow clippies, “You should maximize paperclips, because if you did, you would prefer to, as it is your nature”. This clippy’s statement is true: the clippies’ nature is such that if they maximized paperclips, they would prefer it to other goals. So, regardless of what other clippies are actually doing, the utility-maximizing thing for them to do would be to maximize paperclips.
So it is with humans. Upon discovering/realizing/deriving what is moral and consistently acting/being moral, the agent would find that being moral is better than the alternative. This is in part due to human nature.
Agents, yes. Humans, no. Just like the clippies can’t have consistent preferences if they’re not maximizing paperclips.
What would that prove? Also, I don’t claim that I know the entirety of your utility function better than you do—you know much better than I do what kind of ice cream you prefer, what TV shows you like to watch, etc. But those have little to do with human nature in the sense that we’re talking about it here.
A clippy which isn’t maximizing paperclips is not a clippy.
A human which isn’t adhering to your moral codex is still a human.
That my utility function includes something which you’d probably consider immoral.
It’s a clippy because it would maximize paperclips if it had consistent preferences and sufficient knowledge.
I don’t dispute that this is possible. What I dispute is that your utility function would contain that if you were internally consistent (and had knowledge of what being moral is like).
The desires of an agent are defined by its preferences. “This is a paperclip maximizer which does not want to maximize paperclips” is a contradiction in terms. And what do you mean by “consistent”, do you mean “consistent with ‘human nature’”? Who cares? Or consistent within themselves? Highly doubtful, what would internal consistency have to do with being an altruist? If there’s anything which is characteristic of “human nature”, it is the inconsistency of their preferences.
A human which doesn’t share what you think of as “correct” values (may I ask, not disparagingly, are you religious?) is still a human. An unusual one, maybe (probably not), but an agent not in “need” of any change towards more “moral” values. Stalin may have been happy the way he was.
Because of the warm fuzzies? The social signalling? Is being moral awesome, or deeply fulfilling? Are you internally consistent … ?
Call it a quasi-paperclip maximizer, then. I’m not interested in disputing definitions. Whatever you call it, it’s a being whose preferences are not necessarily internally consistent, but when they are, it prefers to maximize paperclips. When its preferences are internally inconsistent, it may prefer to do things and have goals other than maximizing paperclips.
There’s no necessary connection between the two, but I’m not equating morality and altruism. Morality is what one should do and/or how one should be, which need not be altruistic.
Humans can have incorrect values and still be human, but in that case they are internally inconsistent, because of the preferences they have due to human nature. I’m not saying that humans should strive to have human nature, I’m saying that they already have it. I doubt that Stalin was happy; just look at how paranoid he was. And no, I’m not religious, and have never been.
Yes to the first and third questions. Being moral is awesome and fulfilling. It makes you feel happier, more fulfilled, more stable, and similar feelings. It doesn’t guarantee happiness, but it contributes to it both directly (being moral feels good) and indirectly (it helps you make good decisions). It makes you stronger and more resilient (once you’ve internalized it fully). It’s hard to describe beyond that, but good feels good (TVTropes warning).
I think I’m internally consistent. I’ve been told that I am. It’s unlikely that I’m perfectly consistent, but whatever inconsistencies I have are probably minor. I’m open to having them addressed, whatever they are.
Claiming that Stalin wasn’t happy sounds like a variation of sour grapes where not only can you not be as successful as him, it would be actively uncomfortable for you to believe that someone who lacks compassion can be happy, so you claim that he’s not.
It’s true he was paranoid but it’s also true that in the real world, there are tradeoffs and you don’t see people becoming happy with no downsides whatsoever—claiming that this disqualifies them from being called happy eviscerates the word of meaning.
I’m also not convinced that Stalin’s “paranoia” was paranoia (it seems rational for someone who doesn’t care about the welfare of others and can increase his safety by instilling fear and treating everyone as enemies to do so). I would also caution against assuming that since Stalin’s paranoia is prominent enough for you to have heard of it, it’s too big a deal for him to have been happy; it’s prominent enough for you to have heard of it because it was a big deal to the people affected by it, which is unrelated to how much it affected his happiness.
Stalin was paranoid even by the standards of major world leaders. Khrushchev wasn’t so paranoid, for example. Stalin saw enemies behind every corner. That is not a happy existence.
Khrushchev was deposed. Stalin stayed dictator until he died of natural causes. That suggests that Khrushchev wasn’t paranoid enough, while Stalin was appropriately paranoid.
Seeing enemies around every corner meant that sometimes he saw enemies that weren’t there, but it was overall adaptive because it resulted in him not getting defeated by any of the enemies that actually existed. (Furthermore, going against nonexistent enemies can be beneficial insofar as the ruthlessness in going after them discourages real enemies.)
How does the last sentence follow from the previous one? It’s certainly not as happy an existence as it would have been if he had no enemies, but as I pointed out, nobody’s perfectly happy. There are always tradeoffs and we don’t claim that the fact that someone had to do something to gain his happiness automatically makes that happiness fake.
Stalin’s paranoia, and the actions he took as a result, also created enemies, thus becoming a partly self-fulfilling attitude.
You do see people becoming happy with fewer downsides than others, though.
Stalin refused to believe Hitler would attack him, probably since that would be suicidally stupid on the attacker’s part. Was he paranoid, or did he update?
I’m not sure “preference” is a powerful enough term to capture an agent’s true goals, however defined. Consider any of the standard preference reversals: a heavy cigarette smoker, for example, might prefer to buy and consume their next pack in a Near context, yet prefer to quit in a Far one. The apparent contradiction follows quite naturally from time discounting, yet neither interpretation of the person’s preferences is obviously wrong.
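The Near/Far reversal can be reproduced in a toy model. This is a sketch under stated assumptions, not something the comment specifies: it assumes hyperbolic discounting (value = amount / (1 + k · delay)), and the payoff sizes, the ten-step gap between smoking and the health payoff, and the parameters `k` and `rate` are all invented for illustration. An exponential discounter with the same payoffs is included for contrast, since its ranking does not depend on how far away the choice is.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Present value under hyperbolic discounting (k is an illustrative parameter)."""
    return amount / (1.0 + k * delay)


def exponential_value(amount: float, delay: float, rate: float = 0.9) -> float:
    """Present value under exponential discounting (rate is an illustrative parameter)."""
    return amount * rate ** delay


def preferred(discount, lead_time: float) -> str:
    """Which option looks better when the next pack is `lead_time` steps away.

    Hypothetical payoffs: smoking pays 10 at lead_time,
    quitting pays 30 ten steps after that.
    """
    smoke = discount(10.0, lead_time)
    quit_ = discount(30.0, lead_time + 10)
    return "smoke" if smoke > quit_ else "quit"


if __name__ == "__main__":
    for lead_time in (0, 20):  # Near context vs. Far context
        print(lead_time,
              preferred(hyperbolic_value, lead_time),
              preferred(exponential_value, lead_time))
```

With these made-up numbers the hyperbolic agent picks the pack when it is immediately available and picks quitting when both options are far off, while the exponential agent gives the same answer at both distances.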
I’ve seen it used as shorthand for “utility function”, saving 5 keystrokes. That was the intended use here. Point taken, alternate phrasings welcome.
That would only prove that you think you want to do that. The issue is that what you think you want and what you actually want do not generally coincide, because of imperfect self-knowledge, bounded thinking time, etc.
I don’t know about child soldiers, but it’s fairly common for amateur philosophers to argue themselves into thinking they “should” be perfectly selfish egoists, or hedonistic utilitarians, because logic or rationality demands it. They are factually mistaken, and to the extent that they think they want to be egoists or hedonists, their “preferences” are inconsistent, because if they noticed the logical flaw in their argument they would change their minds.
Isn’t that when I throw up my arms and say “congratulations, your hypothesis is unfalsifiable, the dragon is permeable to flour”? What experimental setup would you suggest? Would you say any statement about one’s preferences is moot? It seems that we’re always under bounded thinking time constraints. Maybe the paperclipper really wants to help humankind and be moral, and mistakenly thinks otherwise. Who would know, it optimized its own actions under resource constraints, and then there’s the ‘Löbstacle’ to consider.
Is saying “I like vanilla ice cream” FAI-complete and must never be uttered or relied upon by anyone?
Or argue themselves into thinking that there is some subset of preferences such that every other (human?) agent should voluntarily choose to adopt them, against their better judgment (edit: as it contradicts what they perceive, after thorough introspection, as their own preferences)? You can add “objective moralists” to the list.
What would it be that is present in every single human’s brain architecture throughout human history that would be compatible with some fixed ordering over actions, called “morally good”? (Otherwise you’d have your immediate counterexample.) The notion seems so obviously ill-defined and misguided (hence my first comment asking Cousin_It).
It’s fine (to me) to espouse preferences that aim to change other humans (say, towards being more altruistic, or towards being less altruistic, or whatever), but to appeal to some objective guiding principle based on “human nature” (which constantly evolves in different strands) or some well-sounding ev-psych applause-light is just a new substitute for the good old Abrahamic heavenly father.
I wouldn’t say any of those things. Obviously paperclippers don’t “really want to help humankind”, because they don’t have any human notion of morality built-in in the first place. Statements like “I like vanilla ice cream” are also more trustworthy on account of being a function of directly observable things like how you feel when you eat it.
The only point I’m trying to make here is that it is possible to be mistaken about your own utility function. It’s entirely consistent for the vast majority of humans to have a large shared portion of their built-in utility function (built-in by their genes), even though many of them seemingly want to do bad things, and that’s because humans are easily confused and not automatically self-aware.
For sure.
I’d agree if humans were like dishwashers. There are templates for dishwashers, ways they are supposed to work. If you came across a broken dishwasher, there could be a case for the dishwasher to be repaired, to go back to “what it’s supposed to be”.
However, that is because there is some external authority (exasperated humans who want to fix their damn dishwasher, dirty dishes are piling up) conceiving of and enforcing such a purpose. The fact that genes and the environment shape utility functions in similar ways is a description, not a prescription. It would not be a case for any “broken” human to go back to “what his genes would want him to be doing”. Just like it wouldn’t be a case against brain uploading.
Some of the discussion seems to me like saying that “deep down in every flawed human, there is ‘a figure of light’, in our community ‘a rational agent following uniform human values with slight deviations accounting for ice-cream taste’, we just need to dig it up”. There is only your brain. With its values. There is no external standard to call its values flawed. There are external standards (rationality = winning) to better its epistemic and instrumental rationality, but those can help the serial killer and the GiveWell activist equally. (Also, both of those can be ‘mistaken’ about their values.)