Can anything besides Gary’s preferences provide a justification for saying that “Gary should_gary X”? (My own answer would be “No.”)
This strikes me as an ill-formed question for reasons I tried to get at in No License To Be Human. When Gary asks “What is right?” he is asking the question e.g. “What state of affairs will help people have more fun?” and not “What state of affairs will match up with the current preferences of Gary’s brain?” and the proof of this is that if you offer Gary a pill to change his preferences, Gary won’t take it because this won’t change what is right. Gary’s preferences are about things like fairness, not about Gary’s preferences. Asking what justifies should_Gary to Gary is either answered by having should_Gary wrap around and judge itself (“Why, yes, it does seem better to care about fairness than about one’s own desires”) or else is a malformed question implying that there is some floating detachable ontologically basic property of rightness, apart from particular right things, which could be ripped loose of happiness and applied to pain instead and make it good to do evil.
By saying “Gary should_gary X”, do you mean
Shouldness does incorporate a concept of reflective equilibrium (people recognize apparent changes in their own preferences as cases of being “mistaken”), but should_Gary makes no mention of Gary (except insofar as Gary’s welfare is one of Gary’s terminal values); instead it is about a large logical function which explicitly mentions things like fairness and beauty. This large function is rightness, which is why Gary knows that you can’t change what is right by messing with Gary’s brain structures or making Gary want to do something else.
Or, perhaps you are saying that one cannot give a concise definition of “should”
You can arrive at a concise metaethical understanding of what sort of thing shouldness is. It is not possible to concisely write out the large function that any particular human refers to by “should”, which is why all attempts at definition seem to fall short; and since for any particular definition it always seems like “should” is detachable from that definition, this reinforces the false impression that “should” is an undefinable extra supernatural property a la Moore’s Open Question.
By far the hardest part of naturalistic metaethics is getting people to realize that it changes absolutely nothing about morals or emotions, just like the fact of a deterministic physical universe never had any implications for the freeness of our will to begin with.
I also note that although morality is certainly not written down anywhere in the universe except human brains, what is written is not about human brains, it is about things like fairness; nor is it written that “being written in a human brain” grants any sort of normative status. So the more you talk about “fulfilling preferences”, the less the subject matter of what you are discussing resembles the subject matter that other people are talking about when they talk about morality, which is about how to achieve things like fairness. But if you built a Friendly AI, you’d build it to copy “morality” out of the brains where that morality is written down, not try to manually program in things like fairness (except insofar as you were offering a temporary approximation explicitly defined as temporary). It is likewise extremely hard to get people to realize that this level of indirection, what Bostrom terms “indirect normativity”, is as close as you can get to getting any piece of physical matter to compute what is right.
If you want to talk about the same thing other people are talking about when they talk about what’s right, I suggest consulting William Frankena’s wonderful list of some components of the large function:
“Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.”
(Just wanted to quote that so that I didn’t entirely fail to talk about morality in between all this stuff about preferences and metaethics.)
Damn. I still haven’t had my “Aha!” moment on this. I’m glad that ata, at least, appears to have it, but unfortunately I don’t understand ata’s explanation, either.
I’ll understand if you run out of patience with this exercise, but I’m hoping you won’t, because if I can come to understand your meta-ethical theory, then perhaps I will be able to explain it to all the other people on Less Wrong who don’t yet understand it, either.
Let me start by listing what I think I do understand about your views.
1. Human values are complex. As a result of evolution and memetic history, we humans value/desire/want many things, and our values cannot be compressed to any simple function. Certainly, we do not only value happiness or pleasure. I agree with this, and the neuroscience supporting your position is nicely summarized in Tim Schroeder’s Three Faces of Desire. We can value damn near anything. There is no need to design an artificial agent to value only one thing, either.
2. Changing one’s meta-ethics need not change one’s daily moral behavior. You write about this here, and I know it to be true from personal experience. When deconverting from Christianity, I went from divine command theory to error theory in the course of about 6 months. About a year after that, I transitioned from error theory to what was then called “desire utilitarianism” (now called “desirism”). My meta-ethical views have shifted in small ways since then, and I wouldn’t mind another radical transition if I can be persuaded. But I’m not sure yet that desirism and your own meta-ethical theory are in conflict.
3. Onlookers can agree that Jenny has 5 units of Fred::Sexiness, which can be specified in terms of curves, skin texture, etc. This specification need not mention Fred at all. As explained here.
5. Nothing is fundamentally moral. There is nothing that would have value if it existed in an isolated universe all by itself that contained no valuers.
5 is questionable. When you say “Nothing is fundamentally moral” can you explain what it would be like if something was fundamentally moral? If not, the term “fundamentally moral” is confused rather than untrue; it’s not that we looked in the closet of fundamental morality and found it empty, but that we were confused and looking in the wrong closet.
Indeed my utility function is generally indifferent to the exact state of universes that have no observers, but this is a contingent fact about me rather than a necessary truth of metaethics, for indifference is also a value. A paperclip maximizer would very much care that these uninhabited universes contained as many paperclips as possible—even if the paperclip maximizer were outside that universe and powerless to affect its state, in which case it might not bother to cognitively process the preference.
You seem to be angling for a theory of metaethics in which objects pick up a charge of value when some valuer values them, but this is not what I think, because I don’t think it makes any moral difference whether a paperclip maximizer likes paperclips. What makes moral differences are things like, y’know, life, consciousness, activity, blah blah.
And if you’ve been reading along this whole time, you know the answer isn’t going to be, “Look at this fundamentally moral stuff!”
I didn’t know what “fundamentally moral” meant, so I translated it to the nearest term with which I’m more familiar, what Mackie called “intrinsic prescriptivity.” Or, perhaps more clearly, “intrinsic goodness,” following Korsgaard:
Objects, activities, or whatever have an instrumental value if they are valued for the sake of something else—tools, money, and chores would be standard examples. A common explanation of the supposedly contrasting kind, intrinsic goodness, is to say that a thing is intrinsically good if it is valued for its own sake, that being the obvious alternative to a thing’s being valued for the sake of something else. This is not, however, what the words “intrinsic value” mean. To say that something is intrinsically good is not by definition to say that it is valued for its own sake: it is to say that it has goodness in itself. It refers, one might say, to the location or source of the goodness rather than the way we value the thing. The contrast between instrumental and intrinsic value is therefore misleading, a false contrast. The natural contrast to intrinsic goodness—the value a thing has “in itself”—is extrinsic goodness—the value a thing gets from some other source. The natural contrast to a thing that is valued instrumentally or as a means is a thing that is valued for its own sake or as an end.
So what I mean to say in (5) is that nothing is intrinsically good (in Korsgaard’s sense). That is, nothing has value in itself. Things only have value in relation to something else.
I’m not sure whether this notion of intrinsic value is genuinely confused or merely not-understood-by-Luke-Muehlhauser, but I’m betting it is either confused or false. (“Untrue” is the term usually used to capture a statement’s being either incoherent or meaningful-and-false: see for example Richard Joyce on error theory.)
But now, I’m not sure you agree with (5) as I intended it. Do you think life, consciousness, activity, and some other things have value-in-themselves? Do these things have intrinsic value?
Thanks again for your reply. I’m going to read Chappell’s comment on this thread, too.
Do you think a heap of five pebbles is intrinsically prime, or does it get its primeness from some extrinsic thing that attaches a tag with the five English letters “PRIME” and could in principle be made to attach the same tag to composite heaps instead? If you consider “beauty” as the logical function your brain’s beauty-detectors compute, then is a screensaver intrinsically beautiful?
Does the word “intrinsic” even help, considering that it invokes bad metaphysics all by itself? In the physical universe there are only quantum amplitudes. Moral facts are logical facts, but not all minds are compelled by that-subject-matter-which-we-name-“morality”; one could as easily build a mind to be compelled by the primality of a heap of pebbles.
So the short answer is that different functions can use the same labels to designate different relations, while we believe that the same labels designate the same functions?
I wonder if Max Tegmark would have written a similar comment. I’m not sure if there is a meaningful difference regarding Luke’s question to say that there are only quantum amplitudes versus there are only relations.
What I’m saying is that in the physical world there are only causes and effects, and the primeness of a heap of pebbles is not an ontologically basic fact operating as a separate and additional element of physical reality, but it is nonetheless about as “intrinsic” to the heap of pebbles as anything.
Once morality stops being mysterious and you start cashing it out as a logical function, the moral awfulness of a murder is exactly as intrinsic as the primeness of a heap of pebbles. Just as we don’t care whether pebble heaps are prime or experience any affect associated with their primeness, the Pebblesorters don’t care or compute whether a murder is morally awful; and this doesn’t mean that a heap of five pebbles isn’t really prime or that primeness is arbitrary, nor yet that on the “moral Twin Earth” murder could be a good thing. And there are no little physical primons associated with the pebble-heap that could be replaced by compositons to make it composite without changing the number of pebbles; and no physical stone tablet on which morality is written that could be rechiseled to make murder good without changing the circumstances of the murder; but if you’re looking for those you’re looking in the wrong closet.
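To make that analogy concrete, here is a minimal sketch (my own toy illustration; is_awful is an invented placeholder, not anyone's actual moral function). The point is only that both are fixed logical functions whose outputs do not depend on who, if anyone, computes them or cares about them.

```python
def is_prime(n: int) -> bool:
    """A logical fact about the heap size; no physical 'primons' required."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_awful(situation: dict) -> bool:
    """Absurdly simplified placeholder for the large function named 'morality'."""
    return bool(situation.get("murder"))

print(is_prime(5))                 # True, whether or not any Pebblesorter computes it
print(is_awful({"murder": True}))  # True, whether or not any mind is compelled by it
```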
Are you arguing that the world is basically a cellular automaton and that therefore beauty is logically implied to be a property of some instance of the universe? If some agent does perceive beauty, then that is a logically implied fact about the circumstances. Asking whether another agent would perceive the same beauty could then be rephrased as asking whether two expressions of an equation are equal?
I think a lot of people are arguing about the ambiguity of the string “beauty” as it is multiply realized.
But now, I’m not sure you agree with (5) as I intended it. Do you think life, consciousness, activity, and some other things have value-in-themselves? Do these things have intrinsic value?
It is rather difficult to ask that question in the way you intend it. Particularly if the semantics have “because I say so” embedded rather than supplemented.
When you say “Nothing is fundamentally moral” can you explain what it would be like if something was fundamentally moral? If not, the term “fundamentally moral” is confused rather than untrue; it’s not that we looked in the closet of fundamental morality and found it empty, but that we were confused and looking in the wrong closet.
“Innately” is being used in that post in the sense of being a fundamental personality trait or a strong predisposition (as in “Correspondence Bias”, to which that post is a followup). And fundamental personality traits and predispositions do exist — including some that actually do predispose people toward being evil (e.g. sociopathy) — so, although the phrase “innately evil” is a bit dramatic, I find its meaning clear enough in that post’s context that I don’t think it’s a mistake similar to “fundamentally moral”. It’s not arguing about whether there’s a ghostly detachable property called “evil” that’s independent of any normal facts about a person’s mind and history.
When you say “Nothing is fundamentally moral” can you explain what it would be like if something was fundamentally moral?
He did, by implication, in describing what it’s like if nothing is:
There is nothing that would have value if it existed in an isolated universe all by itself that contained no valuers.
Clearly, many of the items on EY’s list, such as fun, humor, and justice, require the existence of valuers. The question above then amounts to whether all items of moral goodness require the existence of valuers. I think the question merits an answer, even if (see below) it might not be the one lukeprog is most curious about.
Or, perhaps more clearly, “intrinsic goodness,” following Korsgaard [...]
Unfortunately, lukeprog changed the terms in the middle of the discussion. Not that there is anything wrong with the new question (and I like EY’s answer).
I don’t think it makes any moral difference whether a paperclip maximizer likes paperclips. What makes moral differences are things like, y’know, life, consciousness, activity, blah blah.
How would a universe shaped by CEV differ from one in which a Paperclip Maximizer equipped everyone with the desire to maximize paperclips? And how does a universe with as many discrete conscious entities as possible differ from one with a single universe-spanning consciousness?
If it doesn’t make any difference, then how can we be sure that the SIAI won’t just implement the first fooming AI with whatever terminal goal it desires?
I don’t see how you can argue that the question “What is right?” is about the state of affairs that will help people to have more fun and yet claim that you don’t think that “it makes any moral difference whether a paperclip maximizer likes paperclips”.
How would a universe shaped by CEV differ from one in which a Paperclip Maximizer equipped everyone with the desire to maximize paperclips? And how does a universe with as many discrete conscious entities as possible differ from one with a single universe-spanning consciousness?
If a paperclip maximizer modified everyone such that we really only valued paperclips and nothing else, and we then ran CEV, then CEV would produce a powerful paperclip maximizer. This is… I’m not going to say it’s a feature, but it’s not a bug, at least. You can’t expect CEV to generate accurate information about morality if you erase morality from the minds it’s looking at. (You could recover some information about morality by looking at history, or human DNA (if the paperclip maximizer didn’t modify that), etc., but then you’d need a strategy other than CEV.)
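As a toy illustration of that garbage-in, garbage-out point (a sketch under heavy simplification: the extrapolation and aggregation steps below are invented placeholders, not the actual CEV procedure):

```python
def extrapolate(values: set) -> set:
    # Real extrapolation would ask what these minds would want "if we knew more,
    # thought faster..."; this placeholder just returns the values unchanged.
    return values

def aggregate(value_sets: list) -> set:
    # Placeholder coherence step: keep only what every extrapolated mind shares.
    return set.intersection(*value_sets)

def toy_cev(minds: list) -> set:
    return aggregate([extrapolate(set(m)) for m in minds])

print(toy_cev([{"fun", "fairness", "beauty"}, {"fun", "fairness", "novelty"}]))
# -> {'fun', 'fairness'}
print(toy_cev([{"paperclips"}, {"paperclips"}]))
# -> {'paperclips'}: erase morality from the minds it reads and the output optimizes paperclips
```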
I don’t think I understand your second question.
I don’t see how you can argue that the question “What is right?” is about the state of affairs that will help people to have more fun and yet claim that you don’t think that “it makes any moral difference whether a paperclip maximizer likes paperclips”.
That depends on whether the paperclip maximizer is sentient, whether it just makes paperclips or it actually enjoys making paperclips, etc. If those are the case, then its preferences matter… a little. (So let’s not make one of those.)
That depends on whether the paperclip maximizer is sentient, whether it just makes paperclips or it actually enjoys making paperclips, etc.
All those concepts seem to be vague. To be sentient, to enjoy. Do you need to figure out how to define those concepts mathematically before you’ll be able to implement CEV? Or are you just going to let extrapolated human volition decide about that? If so, how can you possibly make claims about how valuable the preferences of a paperclip maximizer are, or how much they matter? Maybe it will all turn out to be wireheading in the end...
What is really weird is that Yudkowsky is using the word right in reference to actions affecting other agents, yet doesn’t think that it would be reasonable to assign moral weight to the preferences of a paperclip maximizer.
CEV will decide. In general, it seems unlikely that the preferences of nonsentient objects will have moral value.
Edit: Looking back, this comment doesn’t really address the parent. Extrapolated human volition will be used to determine which things are morally significant. I think it is relatively probable that wireheading might turn out to be morally necessary. Eliezer does think that the preferences of a paperclip maximizer would have moral value if one existed. (If a nonexistent paperclip maximizer had moral worth, so would a nonexistent paperclip minimizer. This isn’t completely certain, because paperclip maximizers might gain moral significance from a property other than existence that is not shared with paperclip minimizers, but at this point, this is just speculation and we can do little better without CEV.) A nonsentient paperclip maximizer probably has no more moral value than a rock with “make paperclips” written on the side.
The reason that CEV is based only on human preferences is that, as humans, we want to create an algorithm that does what is right, and humans are the only things we have that know what is right. If other species have moral value, then humans, if we knew more, would care about them. If there is nothing in human minds that could motivate us to care about some specific thing, then what reason could we possibly have for designing an AI to care about that thing?
Paperclips aren’t part of fun, on EY’s account as I understand it, and therefore not relevant to morality or right. If paperclip maximizers believe otherwise they are simply wrong (perhaps incorrigibly so, but wrong nonetheless)… right and wrong don’t depend on the beliefs of agents, on this account.
So those claims seem consistent to me.
Similarly, a universe in which a PM equipped everyone with the desire to maximize paperclips would therefore be a universe with less desire for fun in it. (This would presumably in turn cause it to be a universe with less fun in it, and therefore a less valuable universe.)
I should add that I don’t endorse this view, but it does seem to be pretty clearly articulated/presented. If I’m wrong about this, then I am deeply confused.
If paperclip maximizers believe otherwise they are simply wrong (perhaps incorrigibly so, but wrong nonetheless)… right and wrong don’t depend on the beliefs of agents, on this account.
I don’t understand how someone can arrive at “right and wrong don’t depend on the beliefs of agents”.
I conclude that you use “I don’t understand” here to indicate that you don’t find the reasoning compelling. I don’t find it compelling, either—hence, my not endorsing it—so I don’t have anything more to add on that front.
If those people propose that utility functions are timeless (e.g. the Mathematical Universe), or simply an intrinsic part of the quantum amplitudes that make up physical reality (is there a meaningful difference?), then under that assumption I agree. If beauty can be captured as a logical function, then women.beautiful is right independently of any agent that might endorse that function. The problem of differing tastes and differing aesthetic values, which leads to sentences like “beauty is in the eye of the beholder”, is the result of trying to derive functions from the labeling of relations. Different functions can assign the same label to different relations: “x is R-related to y” can be labeled “beautiful”, but so can xSy. So while some people talk about the ambiguity of the label “beauty” and conclude that what is beautiful is agent-dependent, other people talk about the set of functions that are labeled as beauty-functions, or that assign the label “beautiful” to certain relations, and conclude that their output is agent-independent.
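A minimal sketch of that label/function distinction (beautiful_A and beautiful_B are invented stand-ins for two different relations that happen to share the label “beautiful”):

```python
def beautiful_A(x: dict) -> bool:
    """One function that gets called 'beautiful' (here: cares about symmetry)."""
    return x.get("symmetry", 0.0) > 0.8

def beautiful_B(x: dict) -> bool:
    """A different function that also gets called 'beautiful' (here: cares about novelty)."""
    return x.get("novelty", 0.0) > 0.8

screensaver = {"symmetry": 0.9, "novelty": 0.2}
print(beautiful_A(screensaver))  # True  -- this function's output is fixed and agent-independent
print(beautiful_B(screensaver))  # False -- so is this one's; only the shared label is ambiguous
```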
(nods) Yes, I think EY believes that rightness can be computed as a property of physical reality, without explicit reference to other agents.
That said, I think he also believes that the specifics of that computation cannot be determined without reference to humans. I’m not 100% clear on whether he considers that a mere practical limitation or something more fundamental.
After trying to read No License To Be Human I officially give up reading the sequences for now and postpone them until I’ve learnt a lot more. I think it is wrong to suggest that anyone can read the sequences. Either you have to be a prodigy or a post-graduate. The second comment on that post expresses my own feelings: can people actually follow Yudkowsky’s posts? It’s over my head.
I agree with your sentiment, but I suggest not giving up so easily. I have the same feeling after many sequence posts, but some of the ones I grokked were real gems and seriously affected my thinking.
Also, borrowing some advice on reading hard papers, it’s re-reading that makes a difference.
Also, as my coach put it “the best stretching for doing sidekicks is actually doing sidekicks”.
When Gary asks “What is right?” he is asking the question e.g. “What state of affairs will help people have more fun?” and not “What state of affairs will match up with the current preferences of Gary’s brain?”
I do not necessarily disagree with this, but the following:
and the proof of this is that if you offer Gary a pill to change his preferences, Gary won’t take it because this won’t change what is right.
… does not prove the claim. Gary would still not take the pill if the question he was asking was “What state of affairs will match up with the current preferences of Gary’s brain?”. A reference to the current preferences of Gary’s brain is different from asking the question “What is a state of affairs in which the preferences in Gary’s brain (whatever they happen to be at the time) are highly satisfied?”.
Perhaps a better thought experiment, then, is to offer Gary the chance to travel back in time and feed his 2-year-old self the pill. Or, if you dislike time machines in your thought experiments, we can simply ask Gary whether or not he now would have wanted his parents to have given him the pill when he was a child. Presumably the answer will still be no.
If time travel is to be considered, then we must emphasize that when we say ‘current preferences’ we do not mean “preferences at time Time.now, whatever we can make those preferences be” but rather “I want things X, Y, Z to happen, regardless of the state of the atoms that make up me at this or any other time.” Changing yourself to not want X, Y or Z will make X, Y and Z less likely to happen, so you don’t want to do that.
It seems so utterly wrong to me that I concluded it must be me who simply doesn’t understand it. Why would it be right to help people to have more fun if helping people to have more fun does not match up with your current preferences? The main reason I was able to abandon religion was realizing that what I want implies what is right. That still feels intuitively right. I didn’t expect to see many people on LW argue that there exist preference/(agent/mind)-independent moral statements like ‘it is right to help people’ or ‘killing is generally wrong’. I got a similar reply from Alicorn. Fascinating. This makes me doubt my own intelligence more than anything I’ve so far come across. If I parse this right it would mean that a Paperclip Maximizer is morally bankrupt?
The main reason I was able to abandon religion was realizing that what I want implies what is right. That still feels intuitively right. I didn’t expect to see many people on LW argue that there exist preference/(agent/mind)-independent moral statements like ‘it is right to help people’ or ‘killing is generally wrong’.
Well, something I’ve been noticing is that in their ‘tell your rationalist origin’ stories, the reasons a lot of people give for why they left their religion aren’t actually valid arguments. Make of that what you will.
If I parse this right it would mean that a Paperclip Maximizer is morally bankrupt?
Yes. It is morally bankrupt. (or would you not mind turning into paperclips if that’s what the Paperclip Maximizer wanted?)
BTW, your current position is more-or-less what theists mean when they say atheists are amoral.
Yes. It is morally bankrupt. (or would you not mind turning into paperclips if that’s what the Paperclip Maximizer wanted?)
Yes, but that is a matter of taste.
BTW, your current position is more-or-less what theists mean when they say atheists are amoral.
Why would I ever change my current position? If Yudkowsky told me there was some moral laws written into the fabric of reality, what difference would that make? Either such laws are imperative, so that I am unable to escape them, or I simply ignore them if they are opposing my preferences.
Assume all I wanted to do is to kill puppies. Now Yudkowsky told me that this is prohibited and I will suffer disutility because of it. The crucial question would be, does the disutility outweigh the utility I assign to killing puppies? If it doesn’t, why should I care?
Perhaps you assign net utility to killing puppies. If you do, you do. What EY tells you, what I tell you, what is prohibited, etc., has nothing to do with it. Nothing forces you to care about any of that.
If I understand EY’s position, it’s that it cuts both ways: whether killing puppies is right or wrong doesn’t force you to care, but whether or not you care doesn’t change whether it’s right or wrong.
If I understand your position, it’s that what’s right and wrong depends on the agent’s preferences: if you prefer killing puppies, then killing puppies is right; if you don’t, it isn’t.
My own response to EY’s claim is “How do you know that? What would you expect to observe if it weren’t true?” I’m not clear what his answer to that is.
My response to your claim is “If that’s true, so what? Why is right and wrong worth caring about, on that model… why not just say you feel like killing puppies?”
My response to your claim is “If that’s true, so what? Why is right and wrong worth caring about, on that model… why not just say you feel like killing puppies?”
I don’t think those terms are useless, or that morality doesn’t exist. But you have to use those words with great care, because on their own they are meaningless. If I know what you want, I can approach the conditions that would be right for you. If I know how you define morality, I can act morally according to you. But I will do so only if I care about your preferences. If part of my preferences is to see other human beings happy, then I have to account for your preferences to some extent, which makes them a subset of my preferences. All those different values are then weighted accordingly. Do you disagree with that understanding?
I agree with you that your preferences account for your actions, and that my preferences account for my actions, and that your preferences can include a preference for my preferences being satisfied.
But I think it’s a mistake to use the labels “morality” and “preferences” as though they are interchangeable.
If you have only one referent—which it sounds like you do—then I would recommend picking one label and using it consistently, and not use the other at all. If you have two referents, I would recommend getting clear about the difference and using one label per referent.
Otherwise, you introduce way too many unnecessary vectors for confusion.
It seems relatively clear to me that EY has two referents—he thinks there are two things being talked about. If I’m right, then you and he disagree on something, and by treating the language of morality as though it referred to preferences you obscure that disagreement.
More precisely: consider a system S comprising two agents A and B, each of which has a set of preferences Pa and Pb, and each of which has knowledge of their own and the other’s preferences. Suppose I commit an act X in S.
If I’ve understood correctly, you and EY agree that knowing all of that, you know enough in principle to determine whether X is right or wrong. That is, there isn’t anything left over, there’s no mysterious essence of rightness or external privileged judge or anything like that.
In this, both of you disagree with many other people, such as theists (who would say that you need to consult God’s will to make that determination) and really really strict consequentialists (who would say that you need to consult the whole future history of the results of X to make that determination).
If I’ve understood correctly, you and EY disagree on symmetry. That is, if A endorses X and B rejects X, you would say that whether X is right or not is undetermined… it’s right by reference to A, and wrong by reference to B, and there’s nothing more to be said. EY, if I understand what he’s written, would disagree—he would say that there is, or at least could be, additional computation to be performed on S that will tell you whether X is right or not.
For example, if A = pebblesorters and X = sorting four pebbles into a pile, A rejects X, and EY (I think) would say that A is wrong to do so… not “wrong with reference to humans,” but simply wrong. You would (I think) say that such a distinction is meaningless, “wrong” is always with reference to something. You consider “wrong” a two-place predicate, EY considers “wrong” a one-place predicate—at least sometimes. I think.
For example, if A = SHFP and B = humans and X = allowing people to experience any pain at all, A rejects X and B endorses X. You would say that X is “right_human” and “wrong_SHFP” and that whether X is right or not is an insufficiently specified question. EY would say that X is right and the SHFP are mistaken.
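To make the arity distinction concrete, here is a rough sketch of the two readings as I understand them (the value tables are invented for illustration, not EY’s actual formalism):

```python
HUMAN_VALUES = {"allowing_any_pain": True}   # invented stand-in for B = humans
SHFP_VALUES  = {"allowing_any_pain": False}  # invented stand-in for A = the SHFP

def endorsed(act: str, values: dict) -> bool:
    return values.get(act, False)

def wrong_relative(act: str, values: dict) -> bool:
    """Two-place reading: wrongness is always relative to some agent's values."""
    return not endorsed(act, values)

def wrong(act: str) -> bool:
    """One-place reading: the reference to human values is fixed inside the predicate."""
    return not endorsed(act, HUMAN_VALUES)

act = "allowing_any_pain"
print(wrong_relative(act, SHFP_VALUES))   # True  -- "wrong_SHFP"
print(wrong_relative(act, HUMAN_VALUES))  # False -- "right_human"
print(wrong(act))                         # False -- on this reading the SHFP are simply mistaken
```

On the two-place reading the last question is underspecified without a second argument; on the one-place reading it has a definite answer, because the standard is baked into the predicate.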
So, I disagree with your understanding, or at least your labeling, insofar as it leads you to elide real disagreements. I endorse clarity about disagreement.
As for whether I agree with your position or EY’s, I certainly find yours easier to justify.
But maybe I misunderstand how he arrives at the belief that “wrong” can be a one-place predicate.
Yeah. While I’m reasonably confident that he holds the belief, I have no confidence in any theories how he arrives at that belief.
What I have gotten from his writing on the subject is a combination of “Well, it sure seems that way to me,” and “Well, if that isn’t true, then I don’t see any way to build a superintelligence that does the right thing, and there has to be a way to build a superintelligence that does the right thing.” Neither of which I find compelling.
But there’s a lot of the metaethics sequence that doesn’t make much sense to me at all, so I have little confidence that what I’ve gotten out of it is a good representation of what’s there.
It’s also possible that I’m completely mistaken and he simply insists on “right” as a one-place predicate as a rhetorical trick; a way of drawing the reader’s attention away from the speaker’s role in that computation.
If that is the case I don’t see how different agents could arrive at the same perception of right and wrong, if their preferences are fundamentally opposed, given additional computation
I am fairly sure EY would say (and I agree) that there’s no reason to expect them to. Different agents with different preferences will have different beliefs about right and wrong, possibly incorrigibly different.
Humans and Babykillers as defined will simply never agree about how the universe would best be ordered, even if they come to agree (as a political exercise) on how to order the universe, without the exercise of force (as the SHFP propose to do, for example).
(if right and wrong designate future world states).
Um.
Certainly, this model says that you can order world-states in terms of their rightness and wrongness, and there might therefore be a single possible world-state that’s most right within the set of possible world-states (though there might instead be several possible world-states that are equally right and better than all other possibilities).
If there’s only one such state, then I guess “right” could designate a future world state; if there are several, it could designate a set of world states.
But this depends on interpreting “right” to mean maximally right, in the same sense that “cold” could be understood to designate absolute zero. These aren’t the ways we actually use these words, though.
If you just argue that we don’t have free will because what is right is logically implied by cause and effect,
I don’t see what the concept of free will contributes to this discussion.
I’m fairly certain that EY would reject the idea that what’s right is logically implied by cause and effect, if by that you mean that an intelligence that started out without the right values could somehow infer, by analyzing causality in the world, what the right values were.
My own jury is to some degree still out on this one. I’m enough of a consequentialist to believe that an adequate understanding of cause and effect lets you express all judgments about right and wrong action in terms of more and less preferable world-states, but I cannot imagine how you could derive “preferable” from such an understanding. That said, my failure of imagination does not constitute a fact about the world.
Humans and Babykillers are not talking about the same subject matter when they debate what-to-do-next, and their doing different things does not constitute disagreement.
There’s a baby in front of me, and I say “Humans and Babykillers disagree about what to do next with this baby.”
The one replies: “No, they don’t. They aren’t talking about the same subject when they debate what to do next; this is not a disagreement.”
“Let me rephrase,” I say. “Babykillers prefer that this baby be killed. Humans prefer that this baby have fun. Fun and babykilling can’t both be implemented on the same baby: if it’s killed, it’s not having fun; if it’s having fun, it hasn’t been killed.”
Have I left out anything of value in my restatement? If so, what have I left out?
More generally: given all the above, why should I care whether or not what humans and Babykillers have with respect to this baby is a disagreement? What difference does that make?
More generally: given all the above, why should I care whether or not what humans and Babykillers have with respect to this baby is a disagreement? What difference does that make?
If you disagree with someone, and you’re both sufficiently rational, then you can expect to have a good shot at resolving your disagreement by arguing. That doesn’t work if you just have fundamentally different motivational frameworks.
I don’t know if I agree that a disagreement is necessarily resolvable by argument, but I certainly agree that many disagreements are so resolvable, whereas a complete difference of motivational framework is not.
If that’s what EY meant to convey by bringing up the question of whether Humans and Babykillers disagree, I agree completely.
As I said initially: “Humans and Babykillers as defined will simply never agree about how the universe would best be ordered.”
To understand the other side of the argument, I think it helps to look at this:
all disagreements are about facts. What else would you be talking about?
One side has redefined “disagreement” to mean “a difference of opinion over facts”!
I think that explains much of the sound and fury surrounding the issue.
A “difference of opinion over goals” is not a “difference of opinion over facts”.
However, note that different goals led to the cigarette companies denying the link between cigarettes and cancer—and also led to oil company AGW denialism—which caused many real-world disagreements.
All of which leaves me with the same question I started with. If I know what questions you and I give different answers to—be they questions about facts, values, goals, or whatever else—what is added to my understanding of the situation by asserting that we disagree, or don’t disagree?
ata’s reply was that “we disagree” additionally indicates that we can potentially converge on a common answer by arguing. That also seems to be what EY was getting at about hot air and rocks.
That makes sense to me, and sure, it’s additionally worth clarifying whether you and I can potentially converge on a common answer by arguing.
Anything else?
Because all of this dueling-definitions stuff strikes me as a pointless distraction. I use words to communicate concepts; if a word no longer clearly communicates concepts it’s no longer worth anything to me.
ata’s reply was that “we disagree” additionally indicates that we can potentially converge on a common answer by arguing
That doesn’t seem to be what the dictionary says “disagreement” means.
Maybe if both sides realised that the argument was pointless, they would not waste their time—but what if they don’t know what will happen? Or what if their disagreement is intended to sway not their debating partner, but a watching audience?
I agree with you about what the dictionary says, and that people might not know whether they can converge on a common answer, and that people might go through the motions of a disagreement for the benefit of observers.
We talk about what is good, and Babykillers talk about what is eat-babies, but both good and eat-babies perform analogous functions. For building a Friendly AI we may not give a damn about how to categorize such analogous functions, but I’ve got a feeling that by hijacking the word “moral” so that it suddenly doesn’t apply to such similar things (which is how I think the word is usually used), you’ve successfully increased my confusion over the last year. Either this, or I’m back at square one. Probably the latter.
The fact that killing puppies is wrong follows from the definition of wrong. The fact that Eliezer does not want to do what is wrong is a fact about his brain, determined by introspection.
Why would it be right to help people to have more fun if helping people to have more fun does not match up with your current preferences?
Because right is a rigid designator. It refers to a specific set of terminal values. If your terminal values don’t match up with this specific set of values, then they are wrong, i.e. not right. Not that you would particularly care, of course. From your perspective, you only want to maximize your own values and no others. If your values don’t match up with the values defined as moral, so much for morality. But you still should be moral because should, as it’s defined here, refers to a specific set of terminal values—the one we labeled “right.”
(Note: I’m using the term should exactly as EY uses it, unlike in my previous comments in these threads. In my terms, should=should_human and on the assumption that you, XiXiDu, don’t care about the terminal values defined as right, should_XiXiDu =/= should)
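One way to picture “right” as a rigid designator, as a rough sketch (the value list is an arbitrary invented stand-in for the large, uncompressible set of terminal values being designated):

```python
RIGHT = frozenset({"life", "consciousness", "fun", "fairness", "beauty"})  # illustrative only

def is_right(terminal_values: frozenset) -> bool:
    # The standard is bound once, at definition time; it does not track whoever is asking.
    return terminal_values == RIGHT

clippy_values = frozenset({"paperclips"})  # hypothetical
print(is_right(RIGHT))          # True
print(is_right(clippy_values))  # False -- not right, whether or not Clippy cares
```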
I’m getting the impression that nobody here actually disagrees but that some people are expressing themselves in a very complicated way.
I parse your comment to mean that the definition of moral is a set of terminal values of some agents and should is the term that they use to designate instrumental actions that do serve that goal?
Your second paragraph looks correct. ‘Some agents’ refers to humanity rather than any group of agents. Technically, should is the term anything should use when discussing humanity’s goals, at least when speaking Eliezer.
Your first paragraph is less clear. You definitely disagree with others. There are also some other disagreements.
Correct, I disagree. What I wanted to say with my first paragraph was that I might disagree because I don’t understand what others believe because they expressed it in a way that was too complicated for me to grasp. You are also correct that I myself was not clear in what I tried to communicate.
ETA: That is, if you believe that disagreement fundamentally arises out of misunderstanding, as long as one is not talking about matters of taste.
In Eliezer’s metaethics, all disagreement is from misunderstanding. A paperclip maximizer agrees about what is right; it just has no reason to act correctly.
To whoever voted the parent down, this is [edit: nearly] exactly correct. A paperclip maximizer could, in principle, agree about what is right. It doesn’t have to (I mean, a paperclip maximizer could be stupid), but assuming it’s intelligent enough, it could discover what is moral. But a paperclip maximizer doesn’t care about what is right; it only cares about paperclips, so it will continue maximizing paperclips and only worry about what is “right” when doing so helps it create more paperclips. Right is a specific set of terminal values that the paperclip maximizer DOESN’T have. On the other hand you, being human, do have those terminal values on EY’s metaethics.
Agreed that a paperclip maximizer can “discover what is moral,” in the sense that you’re using it here. (Although there’s no reason to expect any particular PM to do so, no matter how intelligent it is.)
Can you clarify why this sort of discovery is in any way interesting, useful, or worth talking about?
...morality is an objective feature of the universe...
Fascinating. I still don’t understand in what sense this could be true, except maybe the way I tried to interpret EY here and here. But those comments simply got downvoted without any explanation or attempt to correct me, therefore I can’t draw any particular conclusion from those downvotes.
You could argue that morality (what is right?) is human, and that other species will agree that from a human perspective what is moral is right and what is right is moral. Although I would agree, I don’t understand how such a confusing use of terms is helpful.
Morality is just a specific set of terminal values. It’s an objective feature of the universe because… humans have those terminal values. You can look inside the heads of humans and discover them. “Should,” “right,” and “moral,” in EY’s terms, are just being used as a rigid designators to refer to those specific values.
I’m not sure I understand the distinction between “right” and “moral” in your comment.
To whoever voted the parent down, this is exactly correct.
I was the second to vote down the grandparent. It is not exactly correct. In particular it claims “all disagreement” and “a paperclip maximiser agrees”, not “could in principle agree”.
While the comment could perhaps be salvaged with some tweaks, as it stands it is not correct and would just serve to further obfuscate what some people find confusing as it is.
I concede that I was implicitly assuming that all agents have access to the same information. Other than that, I can think of no source of disagreements apart from misunderstanding. I also meant that if paperclip maximizer attempted to find out what is right and did not make any mistakes, it would arrive at the same answer as a human, though there is not necessarily any reason for it to try in the first place. I do not think that these distinctions were nonobvious, but this may be overconfidence on my part.
Depends on how the question is asked. Does the paperclip maximizer have the definition of the word right stored in its memory? If so, it just consults the memory. Otherwise, the questioner would have to either define the word or explain how to arrive at a definition.
This may seem like cheating, but consider the analogous case where we are discussing prime numbers. You must either already know what a prime number is, or I must tell you, or I must tell you about mathematicians, and you must observe them.
As long as a human and a paperclip maximizer both have the same information about humans, they will both come to the same conclusions about human brains, which happen to encode what is right, thus allowing both the human and the paperclip maximizer to learn about morality. If this paperclip maximizer then chooses to wipe out humanity in order to get more raw materials, it will know that its actions are wrong; it just has no term in its utility function for morality.
Eliezer believes that you desire to do what is right. It is important to remember that what is right has nothing to do with whether you desire it. Moral facts are interesting because they describe our desires, but they would be true even if our desires were different.
In general, these things are useful for programming FAI and evaluating moral arguments. We should not allow our values to drift too far over time. The fact that wireheads want to be wireheaded is not a valid argument in favour of wireheading. A FAI should try to make reality match what is right, not make reality match people’s desires (the latter could be accomplished by changing people’s desires). We can be assured that we are acting morally even if there is no magic light from the sky telling us that we are. Moral goals should be pursued. Even if society condones that which is wrong, it is still wrong. Studying the human brain is necessary in order to learn more about morality. When two people disagree about morality, one or both of them is wrong.
And if it turns out that humans currently want something different than what we wanted a thousand years ago, then it follows that a thousand years ago we didn’t want what was right, and now we do… though if you’d asked us a thousand years ago, we’d have said that we want what is right, and we’d have arrived at that conclusion through exactly the same cognitive operations we’re currently using. (Of course, in that case we would be mistaken, unlike the current case.)
And if it turns out that a thousand years from now humans want something different, then we will no longer want what is right… though if you ask us then, we’ll say we want what is right, again using the same cognitive operations. (Again, in that case we would be mistaken.)
And if there turn out to be two groups of humans who want incompatible things (for example, because their brains are sufficiently different), then whichever group I happen to be in wants what is right, and the other group doesn’t… though if you ask them, they’ll (mistakenly) say they want what is right, again using the same cognitive operations.
All of which strikes me as a pointlessly confusing way of saying that I endorse what humans-sufficiently-like-me currently want, and don’t endorse what we used to want or come to want or what anyone else wants if it’s too different from that.
Talking about whether some action is right or wrong or moral seems altogether unnecessary on this view. It is enough to say that I endorse what I value, and will program FAI to optimize for that, and will reject moral arguments that are inconsistent with that, and etc. Sure, if I valued something different, I would endorse that instead, but that doesn’t change anything; if I were hit by a speeding train, I’d be dead, but it doesn’t follow that I am dead. I endorse what I value, which means I consider worlds in which there is less of what I value worse than worlds in which there is more of what I value—even if those worlds also include versions of me that endorse something different. Fine and dandy.
What is added to that description by introducing words like right and wrong and moral, other than the confusion caused by people who assume those words refer to a magic light from the sky? It seems no more useful, on this view, than talking about how certain acts are salvatory or diabolical or fleabag.
The people a thousand years ago might have wanted what is right but been mistaken as to what they really wanted. People do not understand their own brains. (You may agree with this; it is unclear from your wording.) Even if they really did have different desires, they would not be mistaken. Even if they used the same sound, ‘right’, they would be attaching a different meaning to it, so it would be a different word. They would be incorrect if they did not recognize our values as right in Eliezer-speak.
This is admittedly a nonintuitive meaning. I do not know if there is a clearer way of saying things, and I am unsure of what aspects of most people’s understanding of the word Eliezer believes this to capture. The alternative does not seem much clearer. Consider Eliezer’s example of pulling a child off of some train tracks. If you see me do so, you could explain it in terms of physics/neuroscience. If you ask me about it, I could mention the same explanation, but I also have another one. Why did seeing the child motivate me to save it? Yes, my neural pathways caused it, but I was not thinking about those neural pathways; that would be a level confusion. I was thinking about what is right. Saying that I acted because of neuroscience is true, but saying nothing else promotes level confusion. If you ask me what should happen if I were uninvolved or if my brain were different, I would not change my answer from the one I give when I am involved, because should is a 1-place function. People do get confused about these things, especially when talking about AI, and that should be stopped. For many people, Eliezer did not resolve confusion, so we need to do better, but default language is no less clear than Eliezer-speak. (To the extent that I agree with Eliezer, I came to this agreement after having read the sequences, but directly after reading other arguments.)
I agree that people don’t fully understand their own brains. I agree that it is possible to have mistaken beliefs about what one really wants. I agree that on EY’s view any group that fails to identify our current values as right is mistaken.
I think EY’s usage of “right” in this context leads to unnecessary confusion.
The alternative that seems clearer to me, as I’ve argued elsewhere, is to designate our values as our values, assert that we endorse our values, engage in research to articulate our values more precisely, build systems to optimize for our values, and evaluate moral arguments in terms of how well they align with our values.
None of this requires further discussion of right and wrong, good and evil, salvatory and diabolical, etc., and such terms seem like “applause lights” better-suited to soliciting alliances than anything else.
If you ask me why I pulled the child off the train tracks, I probably reply that I didn’t want the child to die. If you ask me why I stood on the platform while the train ran over the child, I probably reply that I was paralyzed by shock/fear, or that I wasn’t sure what to do. In both cases, the actual reality is more complicated than my self-report: there are lots of factors that influence what I do, and I’m not aware of most of them.
I agree with you that people get confused about these things. I agree with you that there are multiple levels of description, and mixing them leads to confusion.
If you ask me whether the child should be pulled off the tracks, I probably say “yes”; if you ask me why, I probably get confused. The reason I get confused is because I don’t have a clear understanding of how I come to that conclusion; I simply consulted my preferences.
Faced with that confusion, people make up answers, including answers like “because it’s right to do so” or “because it’s wrong to let the child die” or “because children have moral value” or “because pulling the child off the tracks has shouldness” or a million other such sequences of words, none of which actually help resolve the confusion. They add nothing of value.
There are useful ways to address the question. There are things that can be said about how my preferences came to be that way, and what the consequences are of my preferences being that way, and whether my preferences are consistent. There are techniques for arriving at true statements in those categories.
As far as I can tell, talking about what’s right isn’t among them, any more than talking about what God wants is. It merely adds to the confusion.
I agree with everything non-linguistic. If we get rid of words like right, wrong, and should, then we are forced to either come up with new words or use ‘want’ and ‘desire’. The first option is confusing and the second can make us seem like egoists, or like people who think that wireheading is right because wireheaded people desire it. To someone unfamiliar with this ethical theory, it would be very misleading. Even many of the readers of this website would be confused if we only used words like ‘want’. What we have now is still far from optimal.
If we get rid of words like right, wrong, and should, then we are forced to either come up with new words or use ‘want’ and ‘desire’.
...and ‘preference’ and ‘value’ and so forth. Yes.
If I am talking about current human values, I endorse calling them that, and avoiding introducing new words (like “right”) until there’s something else for those words to designate.
That neither implies that I’m an egoist, nor that I endorse wireheading.
I agree with you that somebody might nevertheless conclude one or both of those things. They’d be mistaken.
I don’t think familiarity with any particular ethical theory is necessary to interpret the lack of a word, though I agree with you that using a word in the absence of a shared theory about its meaning leads to confusion. (I think most usages of “right” fall into this category.)
If you are using ‘right’ to designate something over and above current human values, I endorse you using the word… but I have no idea at the moment what that something is.
I tentatively agree with your wording, though I will have to see if there are any contexts where it fails.
If you are using ‘right’ to designate something over and above current human values, I endorse you using the word… but I have no idea at the moment what that something is.
By definition, wouldn’t humans be unable to want to pursue such a thing?
For example, if humans value X, and “right” designates Y, and aliens edit our brains so we value Y, then we would want to pursue such a thing. Or if Y is a subset of X, we might find it possible to pursue Y instead of X. (I’m less sure about that, though.) Or various other contrived possibilities.
Yes, my statement was way too strong. In fact, it should be much weaker than even what you say; just start a religion that tells people to value Y. I was attempting to express an actual idea that I had with this sentence originally, but my idea was wrong, so never mind.
But supposing it were true, why would it matter?
What does this mean? Supposing that something were right, what would it matter to humans? You could get it to matter to humans by exploiting their irrationality, but if CEV works, it would not matter to that.
What would it even mean for this to be true? You’d need a definition of right.
Eliezer believes that you desire to do what is right. It is important to remember that what is right has nothing to do with whether you desire it. [...] Moral goals should be pursued. Even if society condones that which is wrong, it is still wrong. Studying the human brain is necessary in order to learn more about morality.
How is this helpful? Here is how I would paraphrase the above (as I understand it):
Human brains cause human action through an ambivalent decision process.
What does this tell about wireheading? I think wireheading might increase pleasure but at the same time feel that it would be wrong. So? All that means is that I have complex and frequently ambivalent preferences and that I use an inaccurate and ambivalent language to describe them. What important insight am I missing?
The important thing about wireheading in this context is that desires after being wireheaded do not matter. The pleasure is irrelevant for this purpose; we could just as easily imagine humans being wireheaded to feel pain, but to desire continuing to feel pain. The point is that what is right should be pursued because it is right, not because people desire it. People’s desires are useful as a way of determining what is right, but if it is known that people desires were altered in some way, they stop providing evidence as to what is right. This understanding is essential to a superintelligence considering the best way to alter peoples brains.
The pleasure is irrelevant for this purpose; we could just as easily imagine humans being wireheaded to feel pain, but to desire continuing to feel pain. The point is that what is right should be pursued because it is right, not because people desire it.
That’s expressed very clearly, thanks. I don’t want to sound rude, I honestly want to understand this. I’m reading your comment and can’t help but think that you are arguing about some kind of universal right. I still can’t pinpoint the argument. Why isn’t it completely arbitrary if we desire to feel pain or pleasure? Is the right answer implied by our evolutionary history? That’s a guess, I’m confused.
People’s desires are useful as a way of determining what is right, but if it is known that people desires were altered in some way, they stop providing evidence as to what is right.
Aren’t our desires altered constantly by mutation, nurture, culture and what we experience and learn? Where can you find the purity of human desire?
I get that you are having trouble understanding this; it is hard and I am much worse at explaining thing in text than in person.
What is right is universal in the sense that what is right would not change if our brains were different. The fact that we care about what is right is caused by our evolutionary history. If we evolved differently, we would have different values, wanting what is gleerp rather than what is right. The differences would be arbitrary to most minds, but not to us. One of the problems of friendliness is ensuring that it is not arbitrary to the AI either.
Aren’t our desires altered constantly by mutation, nurture, culture and what we experience and learn?
There are two types of this; we may learn more about our own values, which is good and which Eliezer believes to be the cause of “moral progress”, or our values may really change. The second type of changes to our desires really are bad. People actually do this, like those who refuse to expose themselves to violence because they think that it will desensitize them from violence. They are really just refusing to take Gandhi’s murder pill, but on a smaller scale. If you have a transtemporal disagreement with your future self on what action you future self should take, your future self will win, because you will no longer exist. The only way to prevent this is to simply refuse to allow your values to change, preventing your future self from disagreeing with you in the first place.
I don’t know what you mean by “purity of human desire”.
The main reason for why I was able to abandon religion was to realize that what I want implies what is right.
And if you modify this to say a certain subset of what you want—the subset you’d still call “right” given omniscience, I think—then it seems correct, as far as it goes. It just doesn’t get you any closer to a more detailed answer, specifying the subset in question.
Or not much closer. At best it tells you not to worry that you ‘are’ fundamentally evil and that no amount of information would change that.
The main reason for why I was able to abandon religion was to realize that what I want implies what is right. That still feels intuitively right. I didn’t expect to see many people on LW to argue that there exist preference/(agent/mind)-independent moral statements like ‘it is right to help people’ or ‘killing is generally wrong’.
For what it’s worth, I’m also one of those people, and I never did have religion. I don’t know if there’s a correlation there.
The main reason for why I was able to abandon religion was to realize that what I want implies what is right. That still feels intuitively right. I didn’t expect to see many people on LW to argue that there exist preference/(agent/mind)-independent moral statements like ‘it is right to help people’ or ‘killing is generally wrong’.
It is useful to think of right and wrong as being some agent’s preferences. That agent doesn’t have to be you—or even to exist IRL. If you are a sadist (no slur intended) you might want to inflict pain—but that would not make it “right”—in the eyes of conventional society.
It is fairly common to use “right” and “wrong” to describe society-level preferences.
If you are a sadist (no slur intended) you might want to inflict pain—but that would not make it “right”—in the eyes of conventional society.
Why would a sadistic Boltzmann brain conclude that it is wrong to be a sadistic Boltzmann brain? Whatever some society thinks is completely irrelevant to an agent with outlier preferences.
If you want to talk about the same thing other people are talking about when they talk about what’s right, I suggest consulting William Frankena’s wonderful list of some components of the large function:
“Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.”
(Just wanted to quote that so that I didn’t entirely fail to talk about morality in between all this stuff about preferences and metaethics.)
Damn. I still haven’t had my “Aha!” moment on this. I’m glad that ata, at least, appears to have it, but unfortunately I don’t understand ata’s explanation, either.
I’ll understand if you run out of patience with this exercise, but I’m hoping you won’t, because if I can come to understand your meta-ethical theory, then perhaps I will be able to explain it to all the other people on Less Wrong who don’t yet understand it, either.
Let me start by listing what I think I do understand about your views.
1. Human values are complex. As a result of evolution and memetic history, we humans value/desire/want many things, and our values cannot be compressed to any simple function. Certainly, we do not only value happiness or pleasure. I agree with this, and the neuroscience supporting your position is nicely summarized in Tim Schroeder’s Three Faces of Desire. We can value damn near anything. There is no need to design an artificial agent to value only one thing, either.
2. Changing one’s meta-ethics need not change one’s daily moral behavior. You write about this here, and I know it to be true from personal experience. When deconverting from Christianity, I went from divine command theory to error theory in the course of about 6 months. About a year after that, I transitioned from error theory to what was then called “desire utilitarianism” (now called “desirism”). My meta-ethical views have shifted in small ways since then, and I wouldn’t mind another radical transition if I can be persuaded. But I’m not sure yet that desirism and your own meta-ethical theory are in conflict.
3. Onlookers can agree that Jenny has 5 units of Fred::Sexiness, which can be specified in terms of curves, skin texture, etc. This specification need not mention Fred at all. As explained here.
4. Recursive justification can’t “hit bottom” in “an ideal philosophy student of perfect emptiness”; all I can do is reflect on my mind’s trustworthiness, using my current mind, in a process of something like reflective equilibrium, even though reflective coherence isn’t specified as the goal.
5. Nothing is fundamentally moral. There is nothing that would have value if it existed in an isolated universe all by itself that contained no valuers.
Before I go on… do I have this right so far?
1-4 yes.
5 is questionable. When you say “Nothing is fundamentally moral” can you explain what it would be like if something was fundamentally moral? If not, the term “fundamentally moral” is confused rather than untrue; it’s not that we looked in the closet of fundamental morality and found it empty, but that we were confused and looking in the wrong closet.
Indeed my utility function is generally indifferent to the exact state of universes that have no observers, but this is a contingent fact about me rather than a necessary truth of metaethics, for indifference is also a value. A paperclip maximizer would very much care that these uninhabited universes contained as many paperclips as possible—even if the paperclip maximizer were outside that universe and powerless to affect its state, in which case it might not bother to cognitively process the preference.
You seem to be angling for a theory of metaethics in which objects pick up a charge of value when some valuer values them, but this is not what I think, because I don’t think it makes any moral difference whether a paperclip maximizer likes paperclips. What makes moral differences are things like, y’know, life, consciousness, activity, blah blah.
Eliezer,
In Setting Up Metaethics, you wrote:
I didn’t know what “fundamentally moral” meant, so I translated it to the nearest term with which I’m more familiar, what Mackie called “intrinsic prescriptivity.” Or, perhaps more clearly, “intrinsic goodness,” following Korsgaard:
So what I mean to say in (5) is that nothing is intrinsically good (in Korsgaard’s sense). That is, nothing has value in itself. Things only have value in relation to something else.
I’m not sure whether this notion of intrinsic value is genuinely confused or merely not-understood-by-Luke-Muehlhauser, but I’m betting it is either confused or false. (“Untrue” is the term usually used to capture a statement’s being either incoherent or meaningful-and-false: see for example Richard Joyce on error theory.)
But now, I’m not sure you agree with (5) as I intended it. Do you think life, consciousness, activity, and some other things have value-in-themselves? Do these things have intrinsic value?
Thanks again for your reply. I’m going to read Chappell’s comment on this thread, too.
Do you think a heap of five pebbles is intrinsically prime, or does it get its primeness from some extrinsic thing that attaches a tag with the five English letters “PRIME” and could in principle be made to attach the same tag to composite heaps instead? If you consider “beauty” as the logical function your brain’s beauty-detectors compute, then is a screensaver intrinsically beautiful?
Does the word “intrinsic” even help, considering that it invokes bad metaphysics all by itself? In the physical universe there are only quantum amplitudes. Moral facts are logical facts, but not all minds are compelled by that-subject-matter-which-we-name-”morality”; one could as easily build a mind to be compelled by the primality of a heap of pebbles.
So the short answer is that there are different functions that use the same labels to designate different relations while we believe that the same labels designate the same functions?
I wonder if Max Tegmark would have written a similar comment. I’m not sure if there is a meaningful difference regarding Luke’s question to say that there are only quantum amplitudes versus there are only relations.
What I’m saying is that in the physical world there are only causes and effects, and the primeness of a heap of pebbles is not an ontologically basic fact operating as a separate and additional element of physical reality, but it is nonetheless about as “intrinsic” to the heap of pebbles as anything.
Once morality stops being mysterious and you start cashing it out as a logical function, the moral awfulness of a murder is exactly as intrinsic as the primeness of a heap of pebbles. Just as we don’t care whether pebble heaps are prime or experience any affect associated with their primeness, the Pebblesorters don’t care or compute whether a murder is morally awful; and this doesn’t mean that a heap of five pebbles isn’t really prime or that primeness is arbitrary, nor yet that on the “moral Twin Earth” murder could be a good thing. And there are no little physical primons associated with the pebble-heap that could be replaced by compositons to make it composite without changing the number of pebbles; and no physical stone tablet on which morality is written that could be rechiseled to make murder good without changing the circumstances of the murder; but if you’re looking for those you’re looking in the wrong closet.
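To make the analogy concrete, here is a minimal Python sketch (entirely my own toy illustration, not anything from the thread): primality is computed as a logical function of the heap, and agents differ only in whether they care about its output, not in what that output is.

```python
def is_prime(n: int) -> bool:
    """A logical fact about the heap's size; no physical 'primon' required."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def pebblesorter_approves(heap_size: int) -> bool:
    return is_prime(heap_size)   # Pebblesorters are moved by this property

def human_cares(heap_size: int) -> bool:
    return False                 # humans don't compute or care about it

print(is_prime(5))                  # True, whoever asks (or nobody)
print(pebblesorter_approves(5))     # True
print(human_cares(5))               # False -- indifference doesn't change is_prime(5)
```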
Are you arguing that the world is basically a cellular automaton and that therefore beauty is logically implied to be a property of some instance of the universe? If some agent does perceive beauty then that is a logically implied fact about the circumstances. Asking if another agent would perceive the same beauty could be rephrased as asking about the equality of the expressions of an equation?
I think a lot of people are arguing about the ambiguity of the string “beauty” as it is multiply realized.
Good answer!
It is rather difficult to ask that question in the way you intend it. Particularly if the semantics have “because I say so” embedded rather than supplemented.
BTW, in your post Are Your Enemies Innately Evil?, I think you are making a similar mistake about the concept of evil.
“Innately” is being used in that post in the sense of being a fundamental personality trait or a strong predisposition (as in “Correspondence Bias”, to which that post is a followup). And fundamental personality traits and predispositions do exist — including some that actually do predispose people toward being evil (e.g. sociopathy) — so, although the phrase “innately evil” is a bit dramatic, I find its meaning clear enough in that post’s context that I don’t think it’s a mistake similar to “fundamentally moral”. It’s not arguing about whether there’s a ghostly detachable property called “evil” that’s independent of any normal facts about a person’s mind and history.
He did, by implication, in describing what it’s like if nothing is:
Clearly, many of the items on EY’s list, such as fun, humor, and justice, require the existence of valuers. The question above then amounts to whether all items of moral goodness require the existence of valuers. I think the question merits an answer, even if (see below) it might not be the one lukeprog is most curious about.
Unfortunately, lukeprog changed the terms in the middle of the discussion. Not that there is anything wrong with the new question (and I like EY’s answer).
How would a universe shaped by CEV differ from one in which a Paperclip Maximizer had equipped everyone with the desire to maximize paperclips? And how does a universe with as many discrete conscious entities as possible differ from one with a single universe-spanning consciousness?
If it doesn’t make any difference, then how can we be sure that the SIAI won’t just implement the first fooming AI with whatever terminal goal it desires?
I don’t see how you can argue that the question “What is right?” is about the state of affairs that will help people to have more fun and yet claim that you don’t think that “it makes any moral difference whether a paperclip maximizer likes paperclips”
If a paperclip maximizer modified everyone such that we really only valued paperclips and nothing else, and we then ran CEV, then CEV would produce a powerful paperclip maximizer. This is… I’m not going to say it’s a feature, but it’s not a bug, at least. You can’t expect CEV to generate accurate information about morality if you erase morality from the minds it’s looking at. (You could recover some information about morality by looking at history, or human DNA (if the paperclip maximizer didn’t modify that), etc., but then you’d need a strategy other than CEV.)
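A minimal sketch of that point, under entirely made-up assumptions about how the extrapolation works (a toy “CEV” that just pools whatever terminal values it finds in the minds it reads):

```python
def toy_cev(minds):
    """Toy 'extrapolation': pool whatever terminal values the minds contain."""
    pooled = set()
    for mind in minds:
        pooled |= mind["values"]
    return pooled

humans = [{"values": {"fairness", "fun"}} for _ in range(3)]
print(toy_cev(humans))        # {'fairness', 'fun'}

# If a paperclipper first overwrites every mind the procedure reads from...
overwritten = [{"values": {"paperclips"}} for _ in range(3)]
print(toy_cev(overwritten))   # {'paperclips'} -- the source data about morality is gone
```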
I don’t think I understand your second question.
That depends on whether the paperclip maximizer is sentient, whether it just makes paperclips or it actually enjoys making paperclips, etc. If those are the case, then its preferences matter… a little. (So let’s not make one of those.)
All those concepts seem to be vague. To be sentient, to enjoy. Do you need to figure out how to define those concepts mathematically before you’ll be able to implement CEV? Or are you just going to let extrapolated human volition decide about that? If so, how can you possibly make claims about how valuable the preferences of a paperclip maximizer are, or how much they matter? Maybe it will all turn out to be wireheading in the end...
What is really weird is that Yudkowsky is using the word right in reference to actions affecting other agents, yet doesn’t think that it would be reasonable to assign moral weight to the preferences of a paperclip maximizer.
CEV will decide. In general, it seems unlikely that the preferences of nonsentient objects will have moral value.
Edit: Looking back, this comment doesn’t really address the parent. Extrapolated human volition will be used to determine which things are morally significant. I think it is relatively probable that wireheading might turn out to be morally necessary. Eliezer does think that the preferences of a paperclip maximizer would have moral value if one existed. (If a nonexistent paperclip maximizer had moral worth, so would a nonexistent paperclip minimizer. This isn’t completely certain, because paperclip maximizers might gain moral significance from a property other than existence that is not shared with paperclip minimizers, but at this point, this is just speculation and we can do little better without CEV.) A nonsentient paperclip maximizer probably has no more moral value than a rock with “make paperclips” written on the side.
The reason that CEV is only based on human preferences is because, as humans, we want to create an algorithm that does what is right and humans are the only things we have that know what is right. If other species have moral value then humans, if we knew more, would care about them. If there is nothing in human minds that could motivate us to care about some specific thing, then what reason could we possibly have for designing an AI to care about that thing?
near future: “you are a paperclip maximizer! Kill him!”
What is this supposed to mean?
Paperclips aren’t part of fun, on EY’s account as I understand it, and therefore not relevant to morality or right. If paperclip maximizers believe otherwise they are simply wrong (perhaps incorrigibly so, but wrong nonetheless)… right and wrong don’t depend on the beliefs of agents, on this account.
So those claims seem consistent to me.
Similarly, a universe in which a PM equipped everyone with the desire to maximize paperclips would therefore be a universe with less desire for fun in it. (This would presumably in turn cause it to be a universe with less fun in it, and therefore a less valuable universe.)
I should add that I don’t endorse this view, but it does seem to be pretty clearly articulated/presented. If I’m wrong about this, then I am deeply confused.
I don’t understand how someone can arrive at “right and wrong don’t depend on the beliefs of agents”.
I conclude that you use “I don’t understand” here to indicate that you don’t find the reasoning compelling. I don’t find it compelling, either—hence, my not endorsing it—so I don’t have anything more to add on that front.
If those people propose that utility functions are timeless (e.g. the Mathematical Universe), or simply an intrinsic part of the quantum amplitudes that make up physical reality (is there a meaningful difference?), then under that assumption I agree. If beauty can be captured as a logical function then women.beautiful is right independent of any agent that might endorse that function. The problem of differing tastes, differing aesthetic value, that lead to sentences like “beauty is in the eye of the beholder” are a result of trying to derive functions by the labeling of relations. There can be different functions that designate the same label to different relations. x is R-related to y can be labeled “beautiful” but so can xSy. So while some people talk about the ambiguity of the label beauty and conclude that what is beautiful is agent-dependent, other people talk about the set of functions that are labeled as beauty-function or assign the label beautiful to certain relations and conclude that their output is agent-independent.
(nods) Yes, I think EY believes that rightness can be computed as a property of physical reality, without explicit reference to other agents.
That said, I think he also believes that the specifics of that computation cannot be determined without reference to humans. I’m not 100% clear on whether he considers that a mere practical limitation or something more fundamental.
After trying to read No License To Be Human I officially give up reading the sequences for now and postpone it until I’ve learned a lot more. I think it is wrong to suggest that anyone can read the sequences. Either you have to be a prodigy or a post-graduate. The second comment on that post expresses my own feelings: can people actually follow Yudkowsky’s posts? It’s over my head.
I agree with your sentiment, but I suggest not giving up so easily. I have the same feeling after many sequence posts, but some of them that I grokked were real gems and seriously affected my thinking.
Also, borrowing some advice on reading hard papers, it’s re-reading that makes a difference.
Also, as my coach put it “the best stretching for doing sidekicks is actually doing sidekicks”.
I do not necessarily disagree with this, but the following:
… does not prove the claim. Gary would still not take the pill if the question he was asking was “What state of affairs will match up with the current preferences of Gary’s brain?”. A reference to the current preferences of Gary’s brain is different to asking the question “What is a state of affairs in which there is a high satisfaction of the preferences in the brain of Gary?”.
Perhaps a better thought experiment, then, is to offer Gary the chance to travel back in time and feed his 2-year-old self the pill. Or, if you dislike time machines in your thought experiments, we can simply ask Gary whether or not he now would have wanted his parents to have given him the pill when he was a child. Presumably the answer will still be no.
If time travel is to be considered, then we must emphasize that when we say ‘current preferences’ we do not mean “preferences at time Time.now, whatever we can make those preferences be” but rather “I want things X, Y, Z to happen, regardless of the state of the atoms that make up me at this or any other time.” Changing yourself to not want X, Y or Z will make X, Y and Z less likely to happen, so you don’t want to do that.
It seems so utterly wrong to me that I concluded it must be me who simply doesn’t understand it. Why would it be right to help people to have more fun if helping people to have more fun does not match up with your current preferences? The main reason for why I was able to abandon religion was to realize that what I want implies what is right. That still feels intuitively right. I didn’t expect to see many people on LW to argue that there exist preference/(agent/mind)-independent moral statements like ‘it is right to help people’ or ‘killing is generally wrong’. I got a similar reply from Alicorn. Fascinating. This makes me doubt my own intelligence more than anything I’ve so far come across. If I parse this right, it would mean that a Paperclip Maximizer is morally bankrupt?
Well, something I’ve been noticing is that in their tell your rationalist origin stories, the reasons a lot of people give for why they left their religion aren’t actually valid arguments. Make of that what you will.
Yes. It is morally bankrupt. (or would you not mind turning into paperclips if that’s what the Paperclip Maximizer wanted?)
BTW, your current position is more-or-less what theists mean when they say atheists are amoral.
Yes, but that is a matter of taste.
Why would I ever change my current position? If Yudkowsky told me there was some moral laws written into the fabric of reality, what difference would that make? Either such laws are imperative, so that I am unable to escape them, or I simply ignore them if they are opposing my preferences.
Assume all I want to do is kill puppies. Now Yudkowsky tells me that this is prohibited and that I will suffer disutility because of it. The crucial question would be: does the disutility outweigh the utility I assign to killing puppies? If it doesn’t, why should I care?
Perhaps you assign net utility to killing puppies. If you do, you do. What EY tells you, what I tell you, what is prohibited, etc., has nothing to do with it. Nothing forces you to care about any of that.
If I understand EY’s position, it’s that it cuts both ways: whether killing puppies is right or wrong doesn’t force you to care, but whether or not you care doesn’t change whether it’s right or wrong.
If I understand your position, it’s that what’s right and wrong depends on the agent’s preferences: if you prefer killing puppies, then killing puppies is right; if you don’t, it isn’t.
My own response to EY’s claim is “How do you know that? What would you expect to observe if it weren’t true?” I’m not clear what his answer to that is.
My response to your claim is “If that’s true, so what? Why is right and wrong worth caring about, on that model… why not just say you feel like killing puppies?”
I don’t think those terms are useless, or that morality doesn’t exist. But you have to use those words with great care, because on their own they are meaningless. If I know what you want, I can approach the conditions that would be right for you. If I know how you define morality, I can act morally according to you. But I will do so only if I care about your preferences. If part of my preferences is to see other human beings happy then I have to account for your preferences to some extent, which makes them a subset of my preferences. All those different values are then weighted accordingly. Do you disagree with that understanding?
I agree with you that your preferences account for your actions, and that my preferences account for my actions, and that your preferences can include a preference for my preferences being satisfied.
But I think it’s a mistake to use the labels “morality” and “preferences” as though they are interchangeable.
If you have only one referent—which it sounds like you do—then I would recommend picking one label and using it consistently, and not use the other at all. If you have two referents, I would recommend getting clear about the difference and using one label per referent.
Otherwise, you introduce way too many unnecessary vectors for confusion.
It seems relatively clear to me that EY has two referents—he thinks there are two things being talked about. If I’m right, then you and he disagree on something, and by treating the language of morality as though it referred to preferences you obscure that disagreement.
More precisely: consider a system S comprising two agents A and B, each of which has a set of preferences Pa and Pb, and each of which has knowledge of their own and the other’s preferences. Suppose I commit an act X in S.
If I’ve understood correctly, you and EY agree that knowing all of that, you know enough in principle to determine whether X is right or wrong. That is, there isn’t anything left over, there’s no mysterious essence of rightness or external privileged judge or anything like that.
In this, both of you disagree with many other people, such as theists (who would say that you need to consult God’s will to make that determination) and really really strict consequentialists (who would say that you need to consult the whole future history of the results of X to make that determination).
If I’ve understood correctly, you and EY disagree on symmetry. That is, if A endorses X and B rejects X, you would say that whether X is right or not is undetermined… it’s right by reference to A, and wrong by reference to B, and there’s nothing more to be said. EY, if I understand what he’s written, would disagree—he would say that there is, or at least could be, additional computation to be performed on S that will tell you whether X is right or not.
For example, if A = pebblesorters and X = sorting four pebbles into a pile, A rejects X, and EY (I think) would say that A is wrong to do so… not “wrong with reference to humans,” but simply wrong. You would (I think) say that such a distinction is meaningless, “wrong” is always with reference to something. You consider “wrong” a two-place predicate, EY considers “wrong” a one-place predicate—at least sometimes. I think.
For example, if A = SHFP and B = humans and X = allowing people to experience any pain at all, A rejects X and B endorses X. You would say that X is “right_human” and “wrong_SHFP” and that whether X is right or not is an insufficiently specified question. EY would say that X is right and the SHFP are mistaken.
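If it helps, the one-place/two-place distinction can be written out as a toy sketch (the value sets and the membership test are stand-ins I invented, not anyone’s actual proposal):

```python
HUMAN = {"endorses": {"some_pain_allowed", "fun"}}
SHFP = {"endorses": {"no_pain_ever"}}

def wrong_two_place(x: str, evaluator: dict) -> bool:
    """Two-place reading: wrongness is always relative to some evaluator."""
    return x not in evaluator["endorses"]

def wrong_one_place(x: str) -> bool:
    """One-place reading (as I read EY): the content is fixed inside the function."""
    return x not in HUMAN["endorses"]   # "right" rigidly picks out this content

x = "some_pain_allowed"
print(wrong_two_place(x, HUMAN))   # False: right_human
print(wrong_two_place(x, SHFP))    # True:  wrong_SHFP
print(wrong_one_place(x))          # False: on this reading the SHFP are simply mistaken
```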
So, I disagree with your understanding, or at least your labeling, insofar as it leads you to elide real disagreements. I endorse clarity about disagreement.
As for whether I agree with your position or EY’s, I certainly find yours easier to justify.
Thanks for this, very enlightening! A very good framing and analysis of my beliefs.
Yeah. While I’m reasonably confident that he holds the belief, I have no confidence in any theories how he arrives at that belief.
What I have gotten from his writing on the subject is a combination of “Well, it sure seems that way to me,” and “Well, if that isn’t true, then I don’t see any way to build a superintelligence that does the right thing, and there has to be a way to build a superintelligence that does the right thing.” Neither of which I find compelling.
But there’s a lot of the metaethics sequence that doesn’t make much sense to me at all, so I have little confidence that what I’ve gotten out of it is a good representation of what’s there.
It’s also possible that I’m completely mistaken and he simply insists on “right” as a one-place predicate as a rhetorical trick; a way of drawing the reader’s attention away from the speaker’s role in that computation.
I am fairly sure EY would say (and I agree) that there’s no reason to expect them to. Different agents with different preferences will have different beliefs about right and wrong, possibly incorrigibly different.
Humans and Babykillers as defined will simply never agree about how the universe would best be ordered, even if they come to agree (as a political exercise) on how to order the universe, without the exercise of force (as the SHFP purpose to do, for example).
Um.
Certainly, this model says that you can order world-states in terms of their rightness and wrongness, and there might therefore be a single possible world-state that’s most right within the set of possible world-states (though there might instead be several possible world-states that are equally right and better than all other possibilities).
If there’s only one such state, then I guess “right” could designate a future world state; if there are several, it could designate a set of world states.
But this depends on interpreting “right” to mean maximally right, in the same sense that “cold” could be understood to designate absolute zero. These aren’t the ways we actually use these words, though.
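A toy sketch of that distinction (the scores are invented purely for illustration): an ordering over world-states is well-defined even when no single state deserves the name “right”.

```python
rightness = {"world_a": 0.2, "world_b": 0.9, "world_c": 0.9}   # made-up scores

ordering = sorted(rightness, key=rightness.get, reverse=True)
best = max(rightness.values())
maximally_right = {w for w, v in rightness.items() if v == best}

print(ordering)          # an ordering over all candidate world-states
print(maximally_right)   # possibly several equally-right states, not one designated "right"
```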
I don’t see what the concept of free will contributes to this discussion.
I’m fairly certain that EY would reject the idea that what’s right is logically implied by cause and effect, if by that you mean that an intelligence that started out without the right values could somehow infer, by analyzing causality in the world, what the right values were.
My own jury is to some degree still out on this one. I’m enough of a consequentialist to believe that an adequate understanding of cause and effect lets you express all judgments about right and wrong action in terms of more and less preferable world-states, but I cannot imagine how you could derive “preferable” from such an understanding. That said, my failure of imagination does not constitute a fact about the world.
Humans and Babykillers are not talking about the same subject matter when they debate what-to-do-next, and their doing different things does not constitute disagreement.
There’s a baby in front of me, and I say “Humans and Babykillers disagree about what to do next with this baby.”
The one replies: “No, they don’t. They aren’t talking about the same subject when they debate what to do next; this is not a disagreement.”
“Let me rephrase,” I say. “Babykillers prefer that this baby be killed. Humans prefer that this baby have fun. Fun and babykilling can’t both be implemented on the same baby: if it’s killed, it’s not having fun; if it’s having fun, it hasn’t been killed.”
Have I left out anything of value in my restatement? If so, what have I left out?
More generally: given all the above, why should I care whether or not what humans and Babykillers have with respect to this baby is a disagreement? What difference does that make?
If you disagree with someone, and you’re both sufficiently rational, then you can expect to have a good shot at resolving your disagreement by arguing. That doesn’t work if you just have fundamentally different motivational frameworks.
I don’t know if I agree that a disagreement is necessarily resolvable by argument, but I certainly agree that many disagreements are so resolvable, whereas a complete difference of motivational framework is not.
If that’s what EY meant to convey by bringing up the question of whether Humans and Babykillers disagree, I agree completely.
As I said initially: “Humans and Babykillers as defined will simply never agree about how the universe would best be ordered.”
We previously debated the disagreements between those with different values here.
The dictionary apparently supports the idea that any conflict is a disagreement.
To understand the other side of the argument, I think it helps to look at this:
One side has redefined “disagreement” to mean “a difference of opinion over facts”!
I think that explains much of the sound and fury surrounding the issue.
A “difference of opinion over goals” is not a “difference of opinion over facts”.
However, note that different goals led to the cigarette companies denying the link between cigarettes and cancer—and also led to oil company AGW denialism—which caused many real-world disagreements.
All of which leaves me with the same question I started with. If I know what questions you and I give different answers to—be they questions about facts, values, goals, or whatever else—what is added to my understanding of the situation by asserting that we disagree, or don’t disagree?
ata’s reply was that “we disagree” additionally indicates that we can potentially converge on a common answer by arguing. That also seems to be what EY was getting at about hot air and rocks.
That makes sense to me, and sure, it’s additionally worth clarifying whether you and I can potentially converge on a common answer by arguing.
Anything else?
Because all of this dueling-definitions stuff strikes me as a pointless distraction. I use words to communicate concepts; if a word no longer clearly communicates concepts it’s no longer worth anything to me.
That doesn’t seem to be what the dictionary says “disagreement” means.
Maybe if both sides realised that the argument was pointless, they would not waste their time—but what if they don’t know what will happen? Or what if their disagreement is intended to sway not their debating partner, but a watching audience?
I agree with you about what the dictionary says, and that people might not know whether they can converge on a common answer, and that people might go through the motions of a disagreement for the benefit of observers.
We talk about what is good, and Babykillers talk about what is eat-babies, but both good and eat-babies perform analogous functions. For building a Friendly AI we may not give a damn about how to categorize such analogous functions, but I’ve got a feeling that by hijacking the word “moral” so that it suddenly doesn’t apply to such similar things, contrary to how I think it is usually used, you’ve successfully increased my confusion over the last year. Either that, or I’m back at square one. Probably the latter.
The fact that killing puppies is wrong follows from the definition of wrong. The fact that Eliezer does not want to do what is wrong is a fact about his brain, determined by introspection.
Because right is a rigid designator. It refers to a specific set of terminal values. If your terminal values don’t match up with this specific set of values, then they are wrong, i.e. not right. Not that you would particularly care, of course. From your perspective, you only want to maximize your own values and no others. If your values don’t match up with the values defined as moral, so much for morality. But you still should be moral because should, as it’s defined here, refers to a specific set of terminal values—the one we labeled “right.”
(Note: I’m using the term should exactly as EY uses it, unlike in my previous comments in these threads. In my terms, should=should_human and on the assumption that you, XiXiDu, don’t care about the terminal values defined as right, should_XiXiDu =/= should)
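A minimal sketch of this rigid-designator reading (the value sets below are placeholders I invented; the real “large function” can’t be written out like this):

```python
RIGHT = frozenset({"fairness", "fun", "life"})   # stand-in for the huge real function
XIXIDU = frozenset({"paperclips"})               # hypothetical, purely for contrast

def should(outcome: set) -> bool:
    """Rigidly evaluates against RIGHT, no matter which agent is asking."""
    return bool(outcome & RIGHT)

def should_agent(outcome: set, agent_values: frozenset) -> bool:
    """The agent-indexed notion: should_human, should_XiXiDu, should_clippy, ..."""
    return bool(outcome & agent_values)

print(should({"fairness"}))                # True, even if a paperclipper asks
print(should_agent({"fairness"}, XIXIDU))  # False: should_XiXiDu != should
print(should_agent({"fairness"}, RIGHT))   # True:  should_human == should, by stipulation
```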
I’m getting the impression that nobody here actually disagrees but that some people are expressing themselves in a very complicated way.
I parse your comment to mean that the definition of moral is a set of terminal values of some agents and should is the term that they use to designate instrumental actions that do serve that goal?
Your second paragraph looks correct. ‘Some agents’ refers to humanity rather than any group of agents. Technically, should is the term anything should use when discussing humanity’s goals, at least when speaking Eliezer.
Your first paragraph is less clear. You definitely disagree with others. There are also some other disagreements.
Correct, I disagree. What I wanted to say with my first paragraph was that I might disagree because I don’t understand what others believe because they expressed it in a way that was too complicated for me to grasp. You are also correct that I myself was not clear in what I tried to communicate.
ETA: That is, if you believe that disagreement fundamentally arises out of misunderstanding as long as one is not talking about matters of taste.
In Eliezer’s metaethics, all disagreement are from misunderstanding. A paperclip maximizer agrees about what is right, it just has no reason to act correctly.
To whoever voted the parent down, this is (edit: nearly) exactly correct. A paperclip maximizer could, in principle, agree about what is right. It doesn’t have to, I mean a paperclip maximizer could be stupid, but assuming it’s intelligent enough, it could discover what is moral. But a paperclip maximizer doesn’t care about what is right, it only cares about paperclips, so it will continue maximizing paperclips and only worry about what is “right” when doing so helps it create more paperclips. Right is a specific set of terminal values that the paperclip maximizer DOESN’T have. On the other hand you, being human, do have those terminal values on EY’s metaethics.
Agreed that a paperclip maximizer can “discover what is moral,” in the sense that you’re using it here. (Although there’s no reason to expect any particular PM to do so, no matter how intelligent it is.)
Can you clarify why this sort of discovery is in any way interesting, useful, or worth talking about?
It drives home the point that morality is an objective feature of the universe that doesn’t depend on the agent asking “what should I do?”
Huh. I don’t see how it drives home that point at all. But OK, at least I know what your intention is… thank you for clarifying that.
Fascinating. I still don’t understand in what sense this could be true, except maybe the way I tried to interpret EY here and here. But those comments simply got downvoted without any explanation or attempt to correct me, therefore I can’t draw any particular conclusion from those downvotes.
You could argue that morality (what is right?) is human and other species will agree that from a human perspective what is moral is right is right is moral. Although I would agree, I don’t understand how such a confusing use of terms is helpful.
Morality is just a specific set of terminal values. It’s an objective feature of the universe because… humans have those terminal values. You can look inside the heads of humans and discover them. “Should,” “right,” and “moral,” in EY’s terms, are just being used as a rigid designators to refer to those specific values.
I’m not sure I understand the distinction between “right” and “moral” in your comment.
I was the second to vote down the grandparent. It is not exactly correct. In particular it claims “all disagreement” and “a paperclip maximiser agrees”, not “could in principle agree”.
While the comment could perhaps be salvaged with some tweaks, as it stands it is not correct and would just serve to further obfuscate what some people find confusing as it is.
I concede that I was implicitly assuming that all agents have access to the same information. Other than that, I can think of no source of disagreements apart from misunderstanding. I also meant that if paperclip maximizer attempted to find out what is right and did not make any mistakes, it would arrive at the same answer as a human, though there is not necessarily any reason for it to try in the first place. I do not think that these distinctions were nonobvious, but this may be overconfidence on my part.
Can you say more about how the sufficiently intelligent paperclip maximizer goes about finding out what is right?
Depends on how the question is asked. Does the paperclip maximizer have the definition of the word right stored in its memory? If so, it just consults the memory. Otherwise, the questioner would have to either define the word or explain how to arrive at a definition.
This may seem like cheating, but consider the analogous case where we are discussing prime numbers. You must either already know what a prime number is, or I must tell you, or I must tell you about mathematicians, and you must observe them.
As long as a human and a paperclip maximizer both have the same information about humans, they will both come to the same conclusions about human brains, which happen to encode what is right, thus allowing both the human and the paperclip maximizer to learn about morality. If this paperclip maximizer then chooses to wipe out humanity in order to get more raw materials, it will know that its actions are wrong; it just has no term in its utility function for morality.
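A toy sketch of that claim (the brain-data encoding below is a placeholder of mine, not a real proposal): both agents can run the same rightness computation from the same data; they differ only in whether that computation appears in their utility function.

```python
HUMAN_BRAIN_DATA = {"encodes": {"fairness", "fun"}}   # stand-in for actual neuroscience

def rightness(outcome: set) -> int:
    """Derived from the shared data; the derivation doesn't depend on who runs it."""
    return len(outcome & HUMAN_BRAIN_DATA["encodes"])

def human_utility(outcome: set) -> int:
    return rightness(outcome)                     # has a term for rightness

def clippy_utility(outcome: set) -> int:
    return 1 if "paperclips" in outcome else 0    # no term for rightness at all

outcome = {"fairness", "paperclips"}
print(rightness(outcome))                               # same answer whichever agent computes it
print(human_utility(outcome), clippy_utility(outcome))  # 1 1 -- but they optimize different things
```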
Sure, agreed: if I tell the PM that thus-and-such is labeled “right,” or “moral,” or “fleabag,” then it will know these things, and it won’t care.
I have entirely lost track of why this is important.
Eliezer believes that you desire to do what is right. It is important to remember that what is right has nothing to do with whether you desire it. Moral facts are interesting because they describe our desires, but they would be true even if our desires were different.
In general, these things are useful for programming FAI and evaluating moral arguments. We should not allow our values to drift too far over time. The fact that wireheads want to be wireheaded is not a valid argument in favour of wireheading. A FAI should try to make reality match what is right, not make reality match people’s desires (the latter could be accomplished by changing people’s desires). We can be assured that we are acting morally even if there is no magic light from the sky telling us that we are. Moral goals should be pursued. Even if society condones that which is wrong, it is still wrong. Studying the human brain is necessary in order to learn more about morality. When two people disagree about morality, one or both of them is wrong.
Sure.
And if it turns out that humans currently want something different than what we wanted a thousand years ago, then it follows that a thousand years ago we didn’t want what was right, and now we do… though if you’d asked us a thousand years ago, we’d have said that we want what is right, and we’d have arrived at that conclusion through exactly the same cognitive operations we’re currently using. (Of course, in that case we would be mistaken, unlike the current case.)
And if it turns out that a thousand years from now humans want something different, then we will no longer want what is right… though if you ask us then, we’ll say we want what is right, again using the same cognitive operations. (Again, in that case we would be mistaken.)
And if there turn out to be two groups of humans who want incompatible things (for example, because their brains are sufficiently different), then whichever group I happen to be in wants what is right, and the other group doesn’t… though if you ask them, they’ll (mistakenly) say they want what is right, again using the same cognitive operations.
All of which strikes me as a pointlessly confusing way of saying that I endorse what humans-sufficiently-like-me currently want, and don’t endorse what we used to want or come to want or what anyone else wants if it’s too different from that.
Talking about whether some action is right or wrong or moral seems altogether unnecessary on this view. It is enough to say that I endorse what I value, and will program FAI to optimize for that, and will reject moral arguments that are inconsistent with that, and etc. Sure, if I valued something different, I would endorse that instead, but that doesn’t change anything; if I were hit by a speeding train, I’d be dead, but it doesn’t follow that I am dead. I endorse what I value, which means I consider worlds in which there is less of what I value worse than worlds in which there is more of what I value—even if those worlds also include versions of me that endorse something different. Fine and dandy.
What is added to that description by introducing words like right and wrong and moral, other than the confusion caused by people who assume those words refer to a magic light from the sky? It seems no more useful, on this view, than talking about how certain acts are salvatory or diabolical or fleabag.
The people a thousand years ago might have wanted what is right, but been mistaken as to what they really wanted. People do not understand their own brains. (You may agree with this; it is unclear from your wording.) Even if they really did have different desires they would not be mistaken. Even if they used the same sound - ‘right’ - they would be attaching a different meaning to it, so it would be a different word. They would be incorrect if they did not recognize our values as right in Eliezer-speak.
This is admittedly a nonintuitive meaning. I do not know if there is a clearer way of saying things and I am unsure of what aspects of most people’s understanding of the word Eliezer believes this to capture. The alternative does not seem much clearer. Consider Eliezer’s example of pulling a child off of some train tracks. If you see me do so, you could explain it in terms of physics/neuroscience. If you ask me about it, I could mention the same explanation, but I also have another one. Why did seeing the child motivate me to save it? Yes, my neural pathways caused it, but I was not thinking about those neural pathways; that would be a level confusion. I was thinking about what is right. Saying that I acted because of neuroscience is true, but saying nothing else promotes level confusion. If you ask me what should happen if I were uninvolved or if my brain were different, I would not change my answer from the one I would give if I were involved, because should is a 1-place function. People do get confused about these things, especially when talking about AI, and that should be stopped. For many people, Eliezer did not resolve confusion, so we need to do better, but default language is no less clear than Eliezer-speak. (To the extent that I agree with Eliezer, I came to this agreement after having read the sequences, but directly after reading other arguments.)
I agree that people don’t fully understand their own brains. I agree that it is possible to have mistaken beliefs about what one really wants. I agree that on EY’s view any group that fails to identify our current values as right is mistaken.
I think EY’s usage of “right” in this context leads to unnecessary confusion.
The alternative that seems clearer to me, as I’ve argued elsewhere, is to designate our values as our values, assert that we endorse our values, engage in research to articulate our values more precisely, build systems to optimize for our values, and evaluate moral arguments in terms of how well they align with our values.
None of this requires further discussion of right and wrong, good and evil, salvatory and diabolical, etc., and such terms seem like “applause lights” better-suited to soliciting alliances than anything else.
If you ask me why I pulled the child off the train tracks, I probably reply that I didn’t want the child to die. If you ask me why I stood on the platform while the train ran over the child, I probably reply that I was paralyzed by shock/fear, or that I wasn’t sure what to do. In both cases, the actual reality is more complicated than my self-report: there are lots of factors that influence what I do, and I’m not aware of most of them.
I agree with you that people get confused about these things. I agree with you that there are multiple levels of description, and mixing them leads to confusion.
If you ask me whether the child should be pulled off the tracks, I probably say “yes”; if you ask me why, I probably get confused. The reason I get confused is because I don’t have a clear understanding of how I come to that conclusion; I simply consulted my preferences.
Faced with that confusion, people make up answers, including answers like “because it’s right to do so” or “because it’s wrong to let the child die” or “because children have moral value” or “because pulling the child off the tracks has shouldness” or a million other such sequences of words, none of which actually help resolve the confusion. They add nothing of value.
There are useful ways to address the question. There are things that can be said about how my preferences came to be that way, and what the consequences are of my preferences being that way, and whether my preferences are consistent. There are techniques for arriving at true statements in those categories.
As far as I can tell, talking about what’s right isn’t among them, any more than talking about what God wants is. It merely adds to the confusion.
I agree with everything non-linguistic. If we get rid of words like right, wrong, and should, then we are forced to either come up with new words or use ‘want’ and ‘desire’. The first option is confusing and the second can make us seem like egoists or like people who think that wireheading is right because wireheaded people desire it. To someone unfamiliar with this ethical theory, it would be very misleading. Even many of the readers of this website would be confused if we only used words like ‘want’. What we have now is still far from optimal.
...and ‘preference’ and ‘value’ and so forth. Yes.
If I am talking about current human values, I endorse calling them that, and avoiding introducing new words (like “right”) until there’s something else for those words to designate.
That neither implies that I’m an egoist, nor that I endorse wireheading.
I agree with you that somebody might nevertheless conclude one or both of those things. They’d be mistaken.
I don’t think familiarity with any particular ethical theory is necessary to interpret the lack of a word, though I agree with you that using a word in the absence of a shared theory about its meaning leads to confusion. (I think most usages of “right” fall into this category.)
If you are using ‘right’ to designate something over and above current human values, I endorse you using the word… but I have no idea at the moment what that something is.
I tentatively agree with your wording, though I will have to see if there are any contexts where it fails.
By definition, wouldn’t humans be unable to want to pursue such a thing?
Not necessarily.
For example, if humans value X, and “right” designates Y, and aliens edit our brains so we value Y, then we would want to pursue such a thing. Or if Y is a subset of X, we might find it possible to pursue Y instead of X. (I’m less sure about that, though.) Or various other contrived possibilities.
But supposing it were true, why would it matter?
Yes, my statement was way too strong. In fact, it fails even more easily than your examples suggest; just starting a religion that tells people to value Y would do it. I was attempting to express an actual idea with that sentence originally, but the idea was wrong, so never mind.
What does this mean? Supposing that something were right, why would it matter to humans? You could get it to matter to humans by exploiting their irrationality, but if CEV works, it would not matter to the CEV.
What would it even mean for this to be true? You’d need a definition of right.
How is this helpful? Here is how I would paraphrase the above (as I understand it):
Human brains cause human action through an ambivalent decision process.
What does this tell us about wireheading? I think wireheading might increase pleasure, but at the same time I feel that it would be wrong. So? All that means is that I have complex and frequently ambivalent preferences, and that I use an inaccurate and ambivalent language to describe them. What important insight am I missing?
The important thing about wireheading in this context is that desires after being wireheaded do not matter. The pleasure is irrelevant for this purpose; we could just as easily imagine humans being wireheaded to feel pain, but to desire continuing to feel pain. The point is that what is right should be pursued because it is right, not because people desire it. People’s desires are useful as a way of determining what is right, but once it is known that people’s desires were altered in some way, they stop providing evidence as to what is right. This understanding is essential to a superintelligence considering the best way to alter people’s brains.
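To make the evidential point concrete, here is a minimal toy sketch (not anyone’s actual proposal; the names `DesireReport` and `estimate_rightness` are invented for illustration): reported desires count as evidence about what is right only so long as they are not known to have been tampered with.

```python
# Toy illustration: treating reported desires as *evidence* about a fixed target,
# and discarding reports known to be tampered with (e.g. post-wireheading).
# All names and numbers here are made up for the example.

from dataclasses import dataclass
from statistics import mean

@dataclass
class DesireReport:
    strength: float   # how strongly the person endorses the outcome
    tampered: bool    # True if we know the desire was externally altered

def estimate_rightness(reports: list[DesireReport]) -> float:
    """Average only the untampered reports; altered desires carry no evidential weight."""
    valid = [r.strength for r in reports if not r.tampered]
    if not valid:
        raise ValueError("no untampered evidence left to draw on")
    return mean(valid)

reports = [
    DesireReport(strength=0.9, tampered=False),  # ordinary preference
    DesireReport(strength=1.0, tampered=True),   # post-wireheading desire: excluded
]
print(estimate_rightness(reports))  # -> 0.9; the wireheaded report changes nothing
```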
That’s expressed very clearly, thanks. I don’t want to sound rude; I honestly want to understand this. I’m reading your comment and can’t help but think that you are arguing for some kind of universal right. I still can’t pinpoint the argument. Why isn’t it completely arbitrary whether we desire to feel pain or pleasure? Is the right answer implied by our evolutionary history? That’s a guess; I’m confused.
Aren’t our desires altered constantly by mutation, nurture, culture and what we experience and learn? Where can you find the purity of human desire?
I get that you are having trouble understanding this; it is hard, and I am much worse at explaining things in text than in person.
What is right is universal in the sense that what is right would not change if our brains were different. The fact that we care about what is right is caused by our evolutionary history. If we evolved differently, we would have different values, wanting what is gleerp rather than what is right. The differences would be arbitrary to most minds, but not to us. One of the problems of friendliness is ensuring that it is not arbitrary to the AI either.
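One toy way to picture the “would not change if our brains were different” claim (purely illustrative; the value names and weights are invented): rightness behaves like a function fixed in advance, while “what we currently desire” behaves like a pointer that re-reads whatever the brain values now. Editing the brain changes the second, not the first.

```python
# Toy contrast: "right" as a fixed function of outcomes, versus a score that
# re-reads whatever the (possibly edited) brain currently values.
# The value names and weights are made up for the example.

FIXED_RIGHTNESS = {"fairness": 1.0, "fun": 0.8}  # frozen once; editing brains doesn't touch it

def right_score(outcome: dict) -> float:
    """Scores an outcome against the fixed function; ignores current brain state."""
    return sum(FIXED_RIGHTNESS.get(k, 0.0) * v for k, v in outcome.items())

def current_desire_score(outcome: dict, brain_values: dict) -> float:
    """Scores an outcome against whatever the (possibly edited) brain values now."""
    return sum(brain_values.get(k, 0.0) * v for k, v in outcome.items())

outcome = {"fairness": 1.0, "fun": 0.5}
edited_brain = {"pain": 1.0}                        # aliens rewired us to want pain

print(right_score(outcome))                         # unchanged by the rewiring
print(current_desire_score(outcome, edited_brain))  # 0.0; tracks the new brain, not rightness
```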
There are two types of this: we may learn more about our own values, which is good and which Eliezer believes to be the cause of “moral progress”, or our values may really change. The second type of change to our desires really is bad. People actually guard against this, like those who refuse to expose themselves to violence because they think it will desensitize them to violence. They are really just refusing to take Gandhi’s murder pill, but on a smaller scale. If you have a transtemporal disagreement with your future self about what action your future self should take, your future self will win, because you will no longer exist. The only way to prevent this is to refuse to allow your values to change, preventing your future self from disagreeing with you in the first place.
I don’t know what you mean by “purity of human desire”.
Yep, with the caveat that endoself added below: “should” refers to humanity’s goals, no matter who is using the term (on EY’s theory and semantics).
And if you modify this to say a certain subset of what you want—the subset you’d still call “right” given omniscience, I think—then it seems correct, as far as it goes. It just doesn’t get you any closer to a more detailed answer, specifying the subset in question.
Or not much closer. At best it tells you not to worry that you ‘are’ fundamentally evil and that no amount of information would change that.
For what it’s worth, I’m also one of those people, and I never did have religion. I don’t know if there’s a correlation there.
It is useful to think of right and wrong as being some agent’s preferences. That agent doesn’t have to be you—or even to exist IRL. If you are a sadist (no slur intended) you might want to inflict pain—but that would not make it “right”—in the eyes of conventional society.
It is fairly common to use “right” and “wrong” to describe society-level preferences.
Why would a sadistic Boltzmann brain conclude that it is wrong to be a sadistic Boltzmann brain? Whatever some society thinks is completely irrelevant to an agent with outlier preferences.
Morality serves several functions:
It is a guide to what to do;
It is a guide to what behaviour to punish;
It allows for the signalling of goodness and virtue;
It allows agents to manipulate others by labelling them or their actions as bad.
The lower items on the list have some significance, IMO.