Moral Error and Moral Disagreement

Eliezer Yudkowsky Aug 10, 2008, 11:32 PM

26 points

Followup to: Inseparably Right, Sorting Pebbles Into Correct Heaps

Richard Chappell, a pro, writes:

“When Bob says ‘Abortion is wrong’, and Sally says, ‘No it isn’t’, they are disagreeing with each other.

I don’t see how Eliezer can accommodate this. On his account, what Bob asserted is true iff abortion is prohibited by the morality_Bob norms. How can Sally disagree? There’s no disputing (we may suppose) that abortion is indeed prohibited by morality_Bob...

Since there is moral disagreement, whatever Eliezer purports to be analysing here, it is not morality.”

The phenomena of moral disagreement, moral error, and moral progress, on terminal values, are the primary drivers behind my metaethics. Think of how simple Friendly AI would be if there were no moral disagreements, moral errors, or moral progress!

Richard claims, “There’s no disputing (we may suppose) that abortion is indeed prohibited by morality_Bob.”

We may not suppose, and there is disputing. Bob does not have direct, unmediated, veridical access to the output of his own morality.

I tried to describe morality as a “computation”. In retrospect, I don’t think this is functioning as the Word of Power that I thought I was emitting.

Let us read, for “computation”, “idealized abstract dynamic”—maybe that will be a more comfortable label to apply to morality.

Even so, I would have thought it obvious that computations may be the subjects of mystery and error. Maybe it’s not as obvious outside computer science?

Disagreement has two prerequisites: the possibility of agreement and the possibility of error. For two people to agree on something, there must be something they are agreeing about, a referent held in common. And it must be possible for an “error” to take place, a conflict between “P” in the map and not-P in the territory. Where these two prerequisites are present, Sally can say to Bob: “That thing we were just both talking about—you are in error about it.”

Richard’s objection would seem in the first place to rule out the possibility of moral error, from which he derives the impossibility of moral agreement.

So: does my metaethics rule out moral error? Is there no disputing that abortion is indeed prohibited by morality_Bob?

This is such a strange idea that I find myself wondering what the heck Richard could be thinking. My best guess is that Richard, perhaps having not read all the posts in this sequence, is taking my notion of morality_Bob to refer to a flat, static list of valuations explicitly asserted by Bob. “Abortion is wrong” would be on Bob’s list, and there would be no disputing that.

But on the contrary, I conceive of morality_Bob as something that unfolds into Bob’s morality—like the way one can describe in 6 states and 2 symbols a Turing machine that will write 4.640 × 10¹⁴³⁹ 1s to its tape before halting.
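To make the "compact specification versus unfolded output" point concrete, here is a minimal sketch of a Turing machine simulator. The 6-state machine mentioned above is far too large to run, so the sketch substitutes two tiny machines whose behavior can be checked by hand. The names `BB2`, `BB3`, and `unfold` are illustrative assumptions, not anything from the original discussion.

```python
from collections import defaultdict

# Transition tables: (state, symbol_read) -> (symbol_to_write, head_move, next_state).
# "H" is the halting state. Both tables are small stand-ins for the 6-state
# machine in the text, chosen only because they can be traced by hand.
BB2 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
BB3 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "H"),
    ("B", 0): (0, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "C"), ("C", 1): (1, -1, "A"),
}

def unfold(rules, start="A", max_steps=10_000):
    """Run the machine on a blank tape; return (steps_taken, ones_on_tape)."""
    tape = defaultdict(int)          # unwritten cells read as 0
    state, head, steps = start, 0, 0
    while state != "H":
        if steps >= max_steps:
            # The table alone does not tell us whether or when the machine halts.
            raise RuntimeError("step budget exhausted before halting")
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(unfold(BB2))  # (6, 4): six steps, four 1s on the tape
print(unfold(BB3))  # (14, 6): fourteen steps, six 1s on the tape
```

Nothing in the four or six table entries states the final tape; the output only becomes known by running the rules. Holding the specification is not the same as knowing what it unfolds into, which is the sense in which someone could be mistaken about the output of a dynamic they themselves embody.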

So morality_Bob refers to a compact folded specification, and not a flat list of outputs. But still, how could Bob be wrong about the output of his own morality?

In manifold obvious and non-obvious ways:

Bob could be empirically mistaken about the state of fetuses, perhaps believing fetuses to be aware of the outside world. (Correcting this might change Bob’s instrumental values but not terminal values.)

Bob could have formed his beliefs about what constituted “personhood” in the presence of confusion about the nature of consciousness, so that if Bob were fully informed about consciousness, Bob would not have been tempted to talk about “the beginning of life” or “the human kind” in order to define personhood. (This changes Bob’s expressed terminal values; afterward he will state different general rules about what sort of physical things are ends in themselves.)

So those are the obvious moral errors—instrumental errors driven by empirical mistakes; and erroneous generalizations about terminal values, driven by failure to consider moral arguments that are valid but hard to find in the search space.

Then there are less obvious sources of moral error: Bob could have a list of mind-influencing considerations that he considers morally valid, and a list of other mind-influencing considerations that Bob considers morally invalid. Maybe Bob was raised a Christian and now considers that cultural influence to be invalid. But, unknown to Bob, when he weighs up his values for and against abortion, the influence of his Christian upbringing comes in and distorts his summing of value-weights. So Bob believes that the output of his current validated moral beliefs is to prohibit abortion, but actually this is a leftover of his childhood and not the output of those beliefs at all.

(Note that Robin Hanson and I seem to disagree, in a case like this, as to exactly what degree we should take Bob’s word about what his morals are.)

Or Bob could believe that the word of God determines moral truth and that God has prohibited abortion in the Bible. Then Bob is making metaethical mistakes, causing his mind to malfunction in a highly general way, and add moral generalizations to his belief pool, which he would not do if veridical knowledge of the universe destroyed his current and incoherent metaethics.

Now let us turn to the disagreement between Sally and Bob.

You could suggest that Sally is saying to Bob, “Abortion is allowed by morality_Bob”, but that seems a bit oversimplified; it is not psychologically or morally realistic.

If Sally and Bob were unrealistically sophisticated, they might describe their dispute as follows:

Bob: “Abortion is wrong.”

Sally: “Do you think that this is something of which most humans ought to be persuadable?”

Bob: “Yes, I do. Do you think abortion is right?”

Sally: “Yes, I do. And I don’t think that’s because I’m a psychopath by common human standards. I think most humans would come to agree with me, if they knew the facts I knew, and heard the same moral arguments I’ve heard.”

Bob: “I think, then, that we must have a moral disagreement: since we both believe ourselves to be in a shared moral frame of reference on this issue, and yet our moral intuitions say different things to us.”

Sally: “Well, it is not logically necessary that we have a genuine disagreement. We might be mistaken in believing ourselves to mean the same thing by the words right and wrong, since neither of us can introspectively report our own moral reference frames or unfold them fully.”

Bob: “But if the meaning is similar up to the third decimal place, or sufficiently similar in some respects that it ought to be delivering similar answers on this particular issue, then, even if our moralities are not in-principle identical, I would not hesitate to invoke the intuitions for transpersonal morality.”

Sally: “I agree. Until proven otherwise, I am inclined to talk about this question as if it is the same question unto us.”

Bob: “So I say ‘Abortion is wrong’ without further qualification or specialization on what wrong means unto me.”

Sally: “And I think that abortion is right. We have a disagreement, then, and at least one of us must be mistaken.”

Bob: “Unless we’re actually choosing differently because of in-principle unresolvable differences in our moral frame of reference, as if one of us were a paperclip maximizer. In that case, we would be mutually mistaken in our belief that when we talk about doing what is right, we mean the same thing by right. We would agree that we have a disagreement, but we would both be wrong.”

Now, this is not exactly what most people are explicitly thinking when they engage in a moral dispute—but it is how I would cash out and naturalize their intuitions about transpersonal morality.

Richard also says, “Since there is moral disagreement...” This seems like a prime case of what I call naive philosophical realism—the belief that philosophical intuitions are direct unmediated veridical passports to philosophical truth.

It so happens that I agree that there is such a thing as moral disagreement. Tomorrow I will endeavor to justify, in fuller detail, how this statement can possibly make sense in a reductionistic natural universe. So I am not disputing this particular proposition. But I note, in passing, that Richard cannot justifiably assert the existence of moral disagreement as an irrefutable premise for discussion, though he could consider it as an apparent datum. You cannot take as irrefutable premises, things that you have not explained exactly; for then what is it that is certain to be true?

I cannot help but note the resemblance to Richard’s assumption that “there’s no disputing” that abortion is indeed prohibited by morality_Bob—the assumption that Bob has direct veridical unmediated access to the final unfolded output of his own morality.

Perhaps Richard means that we could suppose that abortion is indeed prohibited by morality_Bob, and allowed by morality_Sally, there being at least two possible minds for whom this would be true. Then the two minds might be mistaken about believing themselves to disagree. Actually they would simply be directed by different algorithms.

You cannot have a disagreement about which algorithm should direct your actions, without first having the same meaning of should—and no matter how you try to phrase this in terms of “what ought to direct your actions” or “right actions” or “correct heaps of pebbles”, in the end you will be left with the empirical fact that it is possible to construct minds directed by any coherent utility function.

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes. You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you stretch “disagreement” to cover differences where two agents have nothing to say to each other.

But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so. Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths. If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it. Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, “fix” them. But I note the possibility to emphasize what an extreme statement it is to say of someone:

“We have nothing to argue about, we are only different optimization processes.”

That should be reserved for paperclip maximizers, not used against humans whose arguments you don’t like.

Part of The Metaethics Sequence

Next post: “Abstracted Idealized Dynamics”

Previous post: “Sorting Pebbles Into Correct Heaps”

Metaethics

Nick_Tarleton Aug 10, 2008, 11:59 PM
5 points

Far more extreme, I would think, to say that zero out of 6.5 billion humans are stable psychopaths.
- benelliott Jun 9, 2011, 1:16 PM
  3 points
  
  I believe that zero out of 6.5 billion humans are unicorns.
  
  The big number does not create an implausibility all on its own, there may be stable psychopaths, but I wouldn’t be very surprised if there weren’t.
  - Arandur Aug 12, 2011, 12:22 PM
    6 points
    
    Far less likely to suppose that 1 out of 6.5 billion humans is a stable psychopath.
steven Aug 11, 2008, 12:22 AM
3 points

I wonder if the distinction between 1) something implementing the same dynamic as a typical human but mistaken about what it says and 2) something implementing a completely different dynamic and not mistaken about anything, is the same as the distinction people normally make between 1) immoral and 2) amoral.
Jadagul Aug 11, 2008, 1:14 AM
8 points

Steven: quite possibly related. I don’t think they’re exactly the same (the classic comic book/high fantasy “I’m evil and I know it” villain fits A2, but I’d describe him as amoral), but it’s an interesting parallel.

Eliezer: I’m coming more and more to the conclusion that our main area of disagreement is our willingness to believe that someone who disagrees with us really “embodies a different optimization process.” There are infinitely many self-consistent belief systems and infinitely many internally consistent optimization processes; while I believe mine to be the best I’ve found, I remain aware that if I held any of the others I would believe exactly the same thing. And that I would have no way of convincing the anti-Occam intelligence that Occam’s Razor was a good heuristic, or of convincing the psychopath who really doesn’t care about other people that he ‘ought’ to. So I hesitate to say that I’m right in any objective sense, since I’m not sure exactly what standard I’m pointing to when I say ‘objective.’

And I’ve had extended moral conversations with a few different people that led to us, eventually, concluding that our premises were so radically different that we really couldn’t have a sensible moral conversation. (To wit: I think my highest goal in life is to make myself happy. Because I’m not a sociopath, making myself happy tends to involve having friends and making them happy. But the ultimate goal is me. That makes it hard to talk to someone who actually believes in some form of altruism.)
TGGP4 Aug 11, 2008, 1:48 AM
0 points

If a person’s morality is not defined as what they believe about morals, I don’t know how it can be considered to meaningfully entail any propositions at all. A General AI should be able to convince it of just about anything, right?
Eliezer Yudkowsky Aug 11, 2008, 2:02 AM
6 points

“Far more extreme, I would think, to say that zero out of 6.5 billion humans are stable psychopaths.”

Heck, what about babies? What do they want, and would they be complicated enough to want anything different if they knew more and thought faster?

“There are infinitely many self-consistent belief systems and infinitely many internally consistent optimization processes; while I believe mine to be the best I’ve found, I remain aware that if I held any of the others I would believe exactly the same thing.”

You would not believe exactly the same thing. If you held one of the others, you would believe that your new system was frooter than any of the others, where “frooter” is not at all the same thing as “better”. And you would be correct.

“If a person’s morality is not defined as what they believe about morals, I don’t know how it can be considered to meaningfully entail any propositions at all. A General AI should be able to convince it of just about anything, right?”

If you make matters that complicated to begin with, i.e., we’re not discussing metaethics for human usage anymore, then you should construe entailment / extrapolation / unfolding in more robust ways than “anything a superintelligence can convince you of”. E.g. CEV describes a form of entailment.

As for what a person’s morality is, surely you extrapolate it at least a little beyond their instantaneous beliefs. Would you agree that many people would morally disapprove of being shot by you, even if the actual thought has never crossed their mind and they don’t know you exist?
conchis Aug 11, 2008, 2:06 AM
0 points

I must be starting to get it. That unpacked in exactly the way I expected.

On the other hand, this:

“If a person’s morality is not defined as what they believe about morals, I don’t know how it can be considered to meaningfully entail any propositions at all.”

makes no sense to me at all.
Sebastian_Hagen2 Aug 11, 2008, 2:24 AM
0 points

“I think my highest goal in life is to make myself happy. Because I’m not a sociopath, making myself happy tends to involve having friends and making them happy. But the ultimate goal is me.”
If you had a chance to take a pill which would cause you to stop caring about your friends by permanently maxing out that part of your happiness function regardless of whether you had any friends, would you take it?
Do non-psychopaths who, given the chance, would self-modify into psychopaths fall into the same moral reference frame as stable psychopaths?
jsalvatier Aug 11, 2008, 2:54 AM
1 point

Sebastian Hagen:

My intuition is that a good many people would take the psychopath pill. At least if the social consequences were minimal, which is beside the point.
Hopefully_Anonymous Aug 11, 2008, 3:16 AM
2 points

Frame it defensively rather than offensively and a whole heck of a lot of people would take that pill. Of course some of us would also take the pill that negates the effects of our friends taking the first pill, hehehe.
Hopefully_Anonymous Aug 11, 2008, 3:17 AM
0 points

Weird, jsalvati is not my sock puppet, but the 11:16pm post above is mine.
Eliezer Yudkowsky Aug 11, 2008, 3:21 AM
0 points

Fixed. How very odd. (And for the record, jsalvati’s IP is not HA’s.)
Richard4 Aug 11, 2008, 3:32 AM
2 points

“Perhaps Richard means that we could suppose that abortion is indeed prohibited by morality_Bob...”

That’s right. (I didn’t mean to suggest that there’s never any disputing what someone’s moral commitments are; just that this wasn’t supposed to be in dispute in the particular case I was imagining.) I take it that Sally and Bob could disagree even so, and not merely be talking past each other, even if one or both of them was impervious to rational argument. It is at least a significant cost of your theory that it denies this datum. (It doesn’t have to be ‘irrefutable’ to nonetheless be a hefty bullet to bite!)

I like your account of everyday moral disagreement, but would take it a step further: it is no mere accident that your Sally and Bob expect to be able to persuade the other. Rather, it is essential to the concept of morality that it involves shared standards common to all fully reasonable agents.

It’s worth emphasizing, though, that humans are not fully reasonable. Some are even irrevocably unreasonable, incapable of rationally updating (some of) their beliefs. So while, I claim, we all aspire to the morality_Objective norms, I doubt there’s any empirically specifiable procedure that could ensure our explicit affirmation of those norms (let alone their unfolded implications). Bob may stubbornly insist that abortion is wrong, and this may conflict with other claims he makes, but there’s simply no way (short of brain surgery) to shake him from his illogic. What then? I say he’s mistaken, even though he can’t be brought to recognize this himself. It’s not clear to me whether you can say this, since I’m not sure exactly what your ‘extrapolation’ procedure for defining morality_Bob is. But if it’s based on any simple empirical facts about what Bob would believe if we told him various facts and arguments, then it doesn’t look like you’ll be able to correct for the moral errors that result from sheer irrationality, or imperviousness to argument.

Could you say a little more about exactly which empirical facts serve to define morality_Bob?
Mike_Blume Aug 11, 2008, 4:25 AM
2 points

I do not eat steak, because I am uncertain of what my own morality outputs with respect to steak-eating. It seems reasonable to me to imagine that cows are capable of experiencing pain, of fearing death. Of being, and ceasing to be. If you are like the majority of human beings, you do eat steak. The propositions I have suggested do not seem reasonable to you.

Do you imagine that there are facts about the brains of cattle which we could both learn—facts drawn from fMRI scans, or from behavioral science experiments, perhaps—which would bring us into agreement on the issue?
- Unknowns Feb 1, 2010, 8:15 AM
  3 points
  
  No. I agree with those facts and think that eating steak is a good idea anyway.
  - Eliezer Yudkowsky Feb 1, 2010, 8:33 AM
    0 points
    
    I’m curious: Would you eat Yoda?
    
    BTW, if you’re the original Unknown, please see Open Thread for a query about our outstanding bet.
    - Unknowns Feb 1, 2010, 10:05 AM
      2 points
      
      By the way, is there any way to get back my original username? I didn’t know how to do it when things were moved over from OB.
    - Unknowns Feb 1, 2010, 9:58 AM
      0 points
      
      I would eat sentient creatures but not intelligent creatures. And even in the case of sentient non-intelligent ones it depends on what I’m used to.
Jadagul Aug 11, 2008, 6:22 AM
0 points

Eliezer: for ‘better’ vs ‘frooter,’ of course you’re right. I just would have phrased it differently; I’ve been known to claim that the word ‘better’ is completely meaningless unless you (are able to) follow it with “better at or for something.” So of course, Jadagul_real would say that his worldview is better for fulfilling his values. And Jadagul_hypothetical would say that his worldview is better for achieving his values. And both would (potentially) be correct. (or potentially wrong. I never claimed to be infallible, either in reality or in hypothesis). But phrasing issues aside, I do think this happens more often than you think it does.

Sebastian Hagen: That’s actually a very good question. So a few answers. First is that I tend to go back and forth on whether by ‘happiness’ I mean something akin to “net stimulation of pleasure centers in brain,” or to “achievement of total package of values” (at which point the statement nears tautology, but I think doesn’t actually fall into it). But my moral code does include such statements as “you have no fundamental obligation to help other people.” I help people because I like to. So I lean towards formulation 1; but I’m not altogether certain that’s what I really mean.

Second is that your question, about the sociopath pill, is genuinely difficult for me. It reminds me of Nozick’s experience machine thought experiment. But I know that I keep getting short-circuited by statements like, “but I’d be miserable if I were a sociopath,” which is of course false by hypothesis. I think my final answer is that I’m such a social person and take such pleasure in people that were I to become a sociopath I would necessarily be someone else. That person wouldn’t be me. And while I care about whether I’m happy, I don’t know that I care about whether he is.

Of course, this all could be “I know the answer and now let me justify it.” On the other hand, the point of the exercise is to figure out what my moral intuitions are...
Tim_Tyler Aug 11, 2008, 6:50 AM
0 points

Re: “We have nothing to argue about, we are only different optimization processes.” That seems to apply in the case when a man wants to rescue copies of his genes from eternal oblivion—by convincing his mate not to abort his prospective offspring. Of course, not many would actually say that under those circumstances.
Virge2 Aug 11, 2008, 6:51 AM
2 points

Eliezer: “But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so. Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths. If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it.”

How sure are you that most human moral disagreements are attributable to
- lack of veridical information, or
- lack of ability/tools to work through that information, or
- defects?

You talk freely about psychopaths and non-psychopaths as though these were distinct categories of non-defective and defective humans. I know you know this is not so. The arguments about psychological unity of humankind only extend so far. e.g., would you be prepared to tell a homosexual that, if they were fully informed, they would decide to take a pill to change their orientation? There are demonstrable differences where our fellow humans really are “different optimization processes”. Why should we ignore the spread of differences in moral computations?

I’ve been enjoying your OB posts and your thought experiments are powerful, but I’m curious as to the empirical data that have led you to update your beliefs so strongly in favour of psychological unity and so strongly against differences in computation. Your arguments that mention psychopaths smack a little of a “no true Scotsman” definition of human morality.
John_T._Kennedy Aug 11, 2008, 7:02 AM
0 points

Mike,

“I do not eat steak, because I am uncertain of what my own morality outputs with respect to steak-eating. It seems reasonable to me to imagine that cows are capable of experiencing pain, of fearing death. Of being, and ceasing to be. If you are like the majority of human beings, you do eat steak. The propositions I have suggested do not seem reasonable to you.”

Accepting your propositions for the sake of argument, I still find that eating steak seems reasonable.
Carl_Shulman Aug 11, 2008, 7:06 AM
2 points

“Rather, it is essential to the concept of morality that it involves shared standards common to all fully reasonable agents.”

Richard,

If you’re going to define ‘fully reasonable’ to mean sharing your moral axioms, so that a superintelligent pencil maximizer with superhuman understanding of human ethics and philosophy is not a ‘reasonable agent,’ doesn’t this just shift the problem a level? Your morality_objectivenorms is only common to all agents with full reasonableness_RichardChappell, and you don’t seem to have any compelling reason for the latter (somewhat gerrymandered) account of reasonableness save that it’s yours/your culture’s/your species’.
Tim_Tyler Aug 11, 2008, 7:10 AM
0 points

Other moral issues where there are gender differences include: “should prostitution be legalised” and “should there be tighter regulation of pornography”.

Again, it seems that part of the effect is due to people’s idea of what is right being influenced by their own personal role—i.e. the “different optimization processes” effect.

Gender is the most obvious source of such issues, but I’m sure you can find them in other areas of life. Race politics, for instance.
TGGP4 Aug 11, 2008, 7:35 AM
1 point

People in general do not want to be shot. The person doing the shooting, the lethal weapon being fired, the location in which the shooting occurs and the time of day are all pretty much irrelevant. You can ask people if they want to be shot and they’ll say no, without even specifying those details. That seems a very different case from Bob, who is considering a moral proposition and outright rejecting it.
Simon_M Aug 11, 2008, 7:43 AM
0 points

Given all of Bacon’s idols of the Mind, can you ever know definitely if there is an error in your own reasoning, let alone the other person’s?

You cannot rely on your moral intuition, nor the cultural norms of your time, nor academic authority, nor your internal reasoning or ability to know the soundness of your argument.

Socialization, severe biases, and faulty reasoning can all make you think you are ‘correct’ while leaving you with an incorrect impression of the ‘correctness’ of your thinking. And even if presented with all the correct or relevant information, some people still make these errors, so if they can, so could you.
Roko Aug 11, 2008, 10:37 AM
1 point

Eliezer: “When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes. You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover “disagreement” to include differences where two agents have nothing to say to each other.

But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so. Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths. If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it.”
- no, you wouldn’t. The only reason that you are now saying that you would take it is that you currently have the ability to care about other people. Surely this is obvious? Eliezer, you are ignoring your own advice and summoning a “ghost of perfect human unity” into every human mind, even that of a psychopath. Your ability to want to make yourself care about other people only comes because you already care about other people.
I find your position somewhat outlandish: that every single moral disagreement between humans is simply there because the humans involved in it aren’t fully informed, but that there is no “information” that I (or perhaps someone vastly more intelligent than me) could give to a paperclip maximizer that would persuade them that their ability to love human babies is damaged and they need to take a pill to fix it. (Hat tip to TGGP here) And this coming from the inventor of the AI-box thought experiment!

Let me spell it out. Every human mind comes with an evolved set of “yuck” factors (and their opposite, which I might call “yum” factors?). This is the “psychological unity of humankind”. Unfortunately, these cover only those situations which we were likely to run into in our EEA. Abortion probably did not exist in our EEA: so people have to compare it to something that did. There are two ways to do this—either you think of it as being just like helping a fellow member of your tribe, and become pro-abortion, or you think of it as being infanticide and become anti-abortion. Beyond these “yuck” factors, there is no further unity to the moral views of humankind.

In the modern world, people have to make moral choices using their general intelligence, because there aren’t enough “yuck” and “yum” factors around to give guidance on every question. As such, we shouldn’t expect much more moral agreement from humans than from rational (or approximately rational) AIs.

But in my opinion, there is a certain “psychological unity of approximately rational minds”, and I think Eliezer is being irrational to deny this. Why he is overselling the extent to which humans agree, in flagrant contradiction with the facts, and underselling the extent to which all rational agents share common instrumental values, I do not know.
Roko Aug 11, 2008, 10:47 AM
0 points

I should qualify this statement: “As such, we shouldn’t expect much more moral agreement from humans than from rational (or approximately rational) AIs.”

to instead read:

“As such, on ethical questions that had no precedent in our EEA, we shouldn’t expect much more moral agreement from humans than from rational (or approximately rational) AIs, apart, of course, from the fact that most humans share a common set of cognitive biases”

One can see that this is true by looking at the vast disagreements between different moral philosophers: consequentialists vs. deontological ethicists, or atheists vs. Christians vs. Muslims, or libertarians vs. liberals vs. communists.
Will_Pearson Aug 11, 2008, 11:56 AM
0 points

You have been priming people to think in terms of functions. (Pure) Functions do not change. They map an input to an output, and can be substituted by a list, e.g. a function that tests for primality can be performed by an (infinitely) long list.

You may want to describe impure functions for people without a functional programming background if you want to keep terminology like morality_john().
Schizo Aug 11, 2008, 12:27 PM
0 points

Virge’s point seems particularly important, and hopefully Eliezer can address it.
Zubon Aug 11, 2008, 1:27 PM
0 points

I find Roko on-point. The psychological unity of humankind is important, but it can be over-stated. While human beings may occupy a very small area in the space of all possible minds, it is still an area and not a single point. When we shut up and multiply by sufficiently large numbers, very small differences in the starting point are very meaningful. If we are talking about a difference valued at 0.0000001% of a human life, and you extrapolate it over a billion lives, we are talking about life and death matters. Successful AI will affect more than a billion lives.

If you are the advocate of 3^^^3 specks, you must take seriously the notion that 0.0000001% differences in morality will be meaningful when multiplied over all minds into the indefinite future.
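The arithmetic can be checked in a few lines of Python. This is only a sketch of the multiplication described above; the variable names are made up, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# 0.0000001% of a human life, written as an exact fraction of one life.
difference_per_life = Fraction(1, 10**7) / 100

# "a billion lives"
lives_affected = 10**9

# Total stake, measured in whole human lives.
total = difference_per_life * lives_affected
print(total)
```

Under these numbers the total comes to exactly one life, and it grows linearly with the number of lives affected.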
Eliezer Yudkowsky Aug 11, 2008, 1:39 PM
5 points

Virge:

How sure are you that most human moral disagreements are attributable to - lack of veridical information, or - lack of ability/tools to work through that information, or - defects? You talk freely about psychopaths and non-psychopaths as though these were distinct categories of non-defective and defective humans. I know you know this is not so. The arguments about psychological unity of humankind only extend so far. e.g., would you be prepared to tell a homosexual that, if they were fully informed, they would decide to take a pill to change their orientation?

I don’t claim to be sure at all. It does seem to me that most modern humans are very much creatures of their own time; they don’t consider that future moral progress could be as radical as the change between their own time and Archimedes’s; they think of themselves as the wise, the educated, the modern, not as the savage barbarian children the Future will very likely regard us as. It also seems to me that people fail to systematically distinguish between terminal and instrumental disputes, and that they demonize their enemies (which is correspondence bias). The basic ev-bio necessity behind the psychological unity of human brains is not widely understood.

And even more importantly, the portion of our values that we regard as transpersonal, the portion we would intervene to enforce against others, is not all of our values; it’s not going to include a taste for pepperoni pizza, or in my case, it’s not going to include a notion of heterosexuality or homosexuality.

If there are distinct categories of human transpersonal values, I would expect them to look like “male and female babies”, “male children”, “male adults”, “female children”, “female adults”, “neurological damage 1”, “neurological damage 2”, not “Muslims vs. Christians!”

Roko:

If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it. - no, you wouldn’t. The only reason that you are now saying that you would take it is that you currently have the ability to care about other people.

I said “damaged” not “missing”. The notion is that I am my current self, but one day you inform me that, relative to other humans, my ability to care about others is damaged. Do I want a pill to fix the damage, even though it will change my values? Yes, because I value humanity and want to stay with humanity; I don’t want to be off in some lonely unoccupied volume of mindspace. This is one of the arguments that moves me.

(Albeit if the damage was in any way entwined with my ability to do Singularity work, I would delay taking the pill.)

Let me spell it out. Every human mind comes with an evolved set of “yuck” factors (and their opposite, which I might call “yum” factors?). This is the “psychological unity of humankind”. Unfortunately, these cover only those situations which we were likely to run into in our EEA. Abortion probably did not exist in our EEA, so people have to compare it to something that did. There are two ways to do this: either you think of it as being just like helping a fellow member of your tribe, and become pro-abortion, or you think of it as being infanticide and become anti-abortion. Beyond these “yuck” factors, there is no further unity to the moral views of humankind.

The question is how much of this disagreement would persist if the disputants had full veridical knowledge of everything that goes on inside the developing fetus and full veridical knowledge of a naturalistic universe. Given the same “yum” and “yuck” factors, why would they be differently targeted? If there are no interpersonally compelling reasons to target them one way or the other, how would a group of fully informed minds come to believe that this was a proper issue of transpersonal morality?

By the way, quite a number of hunter-gatherer tribes practice infanticide as a form of birth control.

Zubon:

If we are talking about a difference valued at 0.0000001% of a human life, and you extrapolate it over a billion lives, we are talking about life and death matters. Successful AI will affect more than a billion lives.

If I have a value judgment that would not be interpersonally compelling to a supermajority of humankind even if they were fully informed, then it is proper for me to personally fight for and advocate that value judgment, but not proper for me to preemptively build an AI that enforces that value judgment upon the rest of humanity. The notion of CEV reflects this statement of morality I have just made.
steven Aug 11, 2008, 2:04 PM
0 points

I must be missing something—why would you advocate something that you know you can’t justify to anyone else?
Zubon Aug 11, 2008, 2:09 PM
7 points

I said “damaged” not “missing”. The notion is that I am my current self, but one day you inform me that, relative to other humans, my ability to care about others is damaged. Do I want a pill to fix the damage, even though it will change my values? Yes, because I value humanity and want to stay with humanity; I don’t want to be off in some lonely unoccupied volume of mindspace. This is one of the arguments that moves me.

Does that work in the other direction? The notion is that you are your current self, but one day I inform you that, relative to other humans, your ability to care about others is damaged in the sense that it is hyperactive. Do you want a pill that will fix the damage and change your values so that you value humanity less (putting you in a less lonely volume of mindspace)?

(And if you do not think that there is a meaningful chance that you should value humanity less, doesn’t that mean you are most likely undervaluing humanity right now, so you should update and start valuing humanity more? This is a general you, for everyone: if you do not think you should be a bit more sociopathic, what are the odds you have exactly the right amounts of empathy and altruism?)
Sebastian_Hagen2 Aug 11, 2008, 3:04 PM
0 points

Jadagul:

But my moral code does include such statements as “you have no fundamental obligation to help other people.” I help people because I like to.
While I consider myself an altruist in principle (I have serious akrasia problems in practice), I do agree with this statement. Altruists don’t have any obligation to help people, it just often makes sense for them to do so; sometimes it doesn’t, and then the proper thing for them is not to do it.

Roko:

In the modern world, people have to make moral choices using their general intelligence, because there aren’t enough “yuck” and “yum” factors around to give guidance on every question. As such, we shouldn’t expect much more moral agreement from humans than from rational (or approximately rational) AIs.
There might not be enough “yuck” and “yum” factors around to offer direct guidance on every question, but they’re still the basis for abstract rational reasoning. Do you think “paperclip optimizer”-type AIs are impossible? If so, why? There’s nothing incoherent about a “maximize the number of paperclips over time” optimization criterion; if anything, it’s a lot simpler than those in use by humans.

Eliezer Yudkowsky:

If I have a value judgment that would not be interpersonally compelling to a supermajority of humankind even if they were fully informed, then it is proper for me to personally fight for and advocate that value judgment, but not proper for me to preemptively build an AI that enforces that value judgment upon the rest of humanity.
I don’t understand this at all. How is building a superintelligent AI not just a (highly effective, if you do it right) special method of personally fighting for your value judgement? Are you saying it’s ok to fight for it, as long as you don’t do it too effectively?
Sebastian_Hagen2 Aug 11, 2008, 3:06 PM
0 points

Quick correction: s/abstract rational reasoning/abstract moral reasoning/
Larry_D'Anna Aug 11, 2008, 5:04 PM
0 points

Will Pearson: Why not just treat them as pure functions in the State monad?
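As a rough illustration of that suggestion, in Python rather than Haskell: a stateful judgment can be written as a pure function that takes the current state and returns a result together with the next state. Everything here (the integer `State`, `hear_argument`, the approval rule, and the `sequence` helper) is a hypothetical toy, not a real State monad implementation.

```python
from typing import Callable, Tuple

State = int  # hypothetical state: how many arguments John has heard so far
Step = Callable[[State], Tuple[object, State]]


def hear_argument(state: State) -> Tuple[None, State]:
    # Pure: returns a new state instead of mutating anything.
    return None, state + 1


def morality_john(action: str) -> Step:
    # Returns a pure step; the verdict depends only on the state passed in.
    def run(state: State) -> Tuple[bool, State]:
        return state >= 2, state  # toy rule, for illustration only
    return run


def sequence(*steps: Step) -> Step:
    # Threads the state through each step in order, keeping the last result.
    def run(state: State) -> Tuple[object, State]:
        result = None
        for step in steps:
            result, state = step(state)
        return result, state
    return run


verdict, final_state = sequence(
    hear_argument, hear_argument, morality_john("some action")
)(0)
```

The change over time is still there, but it is carried in an explicit value that each function receives and returns, so every individual function stays pure.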
Tim_Tyler Aug 11, 2008, 5:40 PM
1 point

Re: If there are distinct categories of human transpersonal values, I would expect them to look like “male and female babies”, “male children”, “male adults”, “female children”, “female adults”, “neurological damage 1”, “neurological damage 2”, not “Muslims vs. Christians!”

That seems like the position you would get if you thought that cultural evolution could not affect people’s values.
Richard4 Aug 11, 2008, 6:46 PM
−1 points

Carl—“If you’re going to define ‘fully reasonable’ to mean sharing your moral axioms, so that a superintelligent pencil maximizer with superhuman understanding of human ethics and philosophy is not a ‘reasonable agent,’ doesn’t this just shift the problem a level? Your morality_objectivenorms is only common to all agents with full reasonableness_RichardChappell, and you don’t seem to have any compelling reason for the latter (somewhat gerrymandered) account of reasonableness save that it’s yours/your culture’s/your species.’”

I don’t mean to define ‘fully reasonable’ at all (though it is meant to be minimally ad hoc or gerrymandered). I take this normative notion as a conceptual primitive, and then hypothesize that it entails a certain set of moral norms. They’re probably not even my norms (in any way Eliezer could accommodate), since I’m presumably not fully reasonable myself. But they’re what I’m trying to aim for, even if I don’t always grasp them correctly.

This may sound mysterious and troublingly ungrounded to you. Yet you use terms like ‘superintelligent’ and ‘superhuman understanding’, which are no less normative than my ‘reasonable’. I think that reasonableness is a component of intelligence and (certainly) understanding, so I don’t see how these terms could properly apply to a pencil maximizer. Maybe you simply mean that it is a pencil maximizer that is instrumentally rational and perfectly proficient at Bayesian updating. But that’s not to say it’s intelligent. It might, for example, be a counterinductivist (didn’t someone mention anti-Occamists up-thread?), with completely wacky priors. I take it as a datum that this is simply unreasonable—there are other norms, besides conditionalization and instrumental rationality, which govern ‘intelligent’ or good thinking.

So I say there are brute, unanalysable facts about what’s reasonable. The buck’s gotta stop somewhere. I don’t see that any alternative theory does better than this one.
Z._M._Davis Aug 11, 2008, 6:59 PM
1 point

“If there are distinct categories of human transpersonal values, I would expect them to look like [...] ‘male adults’, [...] ‘female adults’, [...] not ‘Muslims vs. Christians!’”

Really? In the ways that are truly important, don’t you think you have more in common with Natasha Vita-More than Osama bin Laden?
steven Aug 11, 2008, 7:37 PM
0 points

ZM, he means after volition-extrapolating.
Z._M._Davis Aug 11, 2008, 7:59 PM
2 points

Steven, even so, I think the basic question stands. Why should cultural differences and within-sex individual differences wash out of the CEV?
steven Aug 11, 2008, 8:18 PM
0 points

Supposedly genetics allows for people of different ages or sexes to have different mental machinery, whereas individual genetic differences just represent low-complexity differences in tweaking. I’m not sure why that makes Eliezer’s point though, if the aforementioned differences in tweaking mean different complex machinery gets activated. Cultural differences I’d expect to wash out just through people learning about different cultures that they could have grown up in.
Roko Aug 11, 2008, 8:35 PM
0 points

Zubon: “if you do not think you should be a bit more sociopathic, what are the odds you have exactly the right amounts of empathy and altruism?”
- that was exactly what I was going to say ; - ) My personal “gut” feeling is that I have exactly the right amount of empathy, no more, no less. Intellectually, I realize that this means I have failed the reversal test, and that I am therefore probably suffering from status quo bias. (assuming, as I believe, that there is an objectively true ethics. If you’re a moral relativist, you get a blank cheque for status quo bias in your ethical views)
Sebastian Hagen: “There might not be enough “yuck” and “yum” factors around to offer direct guidance on every question, but they’re still the basis for abstract rational reasoning. Do you think “paperclip optimizer”-type AIs are impossible? If so, why? There’s nothing incoherent about a “maximize the number of paperclips over time” optimization criterion; if anything, it’s a lot simpler than those in use by humans.”
- yes, that’s true. As soon as I made that post I realized I had gone too far; let me restate a more reasonable position: any moral agreement or psychological unity that humankind has over and above the unity that all approximately rational minds have will stem from our evolved “yuck” and “yum” factors in our EEA.
As for the “rational economic agents” like the paperclip maximizer: yes, they do exist, but they, together with a wide class of rational agents (i.e. ones that are not utility maximizers) also have a certain axiological unity called universal instrumental values. I’m more interested in this “unity of agent kind”.

One could summarize by saying:

Axiological agreement = (unity of agent kind) + (yuck and yum)

Personally I find it hard to really identify with my genes’ desires. Perhaps one could consider this a character flaw, but I just cannot honestly make myself identify with these arbitrary choices.
Z._M._Davis Aug 11, 2008, 8:39 PM
3 points

Steven: “Cultural differences I’d expect to wash out just through people learning about different cultures that they could have grown up in.”

I suspect a category error here hinging around personal identity. We say “if I had grown up in a different culture …” when I think we mean “if the baby that grew into me had grown up in a different culture...” If the baby that grew into me had grown up in a radically different culture, I don’t think ve’d be me in any meaningful sense, although of course there would be many similarities due to evo-psych and genetics. Whereas I would likely consider a hypothetical member of the opposite sex who shared my ideas and concerns to be me, even if ve was much better or worse than this me at mentally rotating 3D objects, or finding things in the refrigerator.
Tim_Tyler Aug 11, 2008, 10:37 PM
0 points

Re: Roko’s “more reasonable position”:

Human psychological unity, to the extent that it exists, includes brain speed, brain degree-of-parallelism, brain storage capacity, brain reliability—and other limitations that have little to do with human goals.
Bernard_Guerrero2 Aug 11, 2008, 10:41 PM
0 points

If you had a chance to take a pill which would cause you to stop caring about your friends by permanently maxing out that part of your happiness function regardless of whether you had any friends, would you take it?

I’m not sure this proves anything. I’ll take Jadagul’s description as my own. I’m maximizing my happiness, but my non-sociopath status means that I like having friends/loved ones, so maximizing my own happiness entails caring about them to some degree or another. Under my existing moral code I wouldn’t take the pill, but that’s because it will in the future entail behavior that I would currently, as I am, find obnoxious.

In theory, taking the pill will allow me to be happier by obviating the need to worry about anybody else (along with the resources that entails.) But you can’t get there from here. I already know what the pill will make me later, and the prospect of taking it makes me unhappy now. Taking the pill would be a viable option only if the behavior the pill would induce could somehow be made acceptable to current, non-sociopath me.
Doug_S. Aug 12, 2008, 12:32 AM
3 points

I, too, wonder if the “psychological unity of humankind” has been a bit overstated. All [insert brand and model here] computers have identical hardware, but you can install different software on them. We’re running different software.

Consider the case of a “something maximizer”. It’s given an object, and then maximizes the number of copies of that object.

You give one something maximizer a paperclip, and it becomes a paperclip maximizer. You give another a pencil, and that one becomes a pencil maximizer.

There’s no particular reason to expect the paperclip maximizer and the pencil maximizer to agree on values, even though they are both something maximizers, implemented on identical hardware. The extrapolated volition of a something-maximizer is not well-defined.
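A toy Python sketch of the idea, with all names (`SomethingMaximizer`, `utility`, `choose`) invented for illustration: both agents run identical code and differ only in the object they were handed.

```python
from collections import Counter
from typing import Sequence


class SomethingMaximizer:
    """Hypothetical agent: values only the count of its target object."""

    def __init__(self, target: str) -> None:
        self.target = target  # written in by "experience", not by the code

    def utility(self, world: Counter) -> int:
        return world[self.target]

    def choose(self, options: Sequence[Counter]) -> Counter:
        # Pick the outcome containing the most copies of the target.
        return max(options, key=self.utility)


clip_agent = SomethingMaximizer("paperclip")
pencil_agent = SomethingMaximizer("pencil")

world_a = Counter(paperclip=10, pencil=1)
world_b = Counter(paperclip=1, pencil=10)

clip_choice = clip_agent.choose([world_a, world_b])
pencil_choice = pencil_agent.choose([world_a, world_b])
```

Here `clip_choice` is `world_a` and `pencil_choice` is `world_b`: the same class ranks the same outcomes in opposite orders, and the class alone, before any target is supplied, ranks nothing at all.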

There are probably blank spaces in human Eliezer::morality that are written on by experience, and such writing may well be irrevocable, or mostly irrevocable. If humans’ terminal values are, in fact, contingent on experiences, then you could have two people disagreeing on the same level that the two something maximizers do.

As a practical matter, people generally do seem to have similar sets of terminal values, but different people rank the values differently. Consider the case of “honor”. Is it better to die with honor or to live on in disgrace? People raised in different cultures will give different answers to this question.
Virge2 Aug 12, 2008, 4:38 AM
5 points

Eliezer: “The basic ev-bio necessity behind the psychological unity of human brains is not widely understood.”

I agree. And I think you’ve over-emphasized the unity and ignored evidence of diversity, explaining it away as defects.

Eliezer: “And even more importantly, the portion of our values that we regard as transpersonal, the portion we would intervene to enforce against others, is not all of our values; it’s not going to include a taste for pepperoni pizza, or in my case, it’s not going to include a notion of heterosexuality or homosexuality.”

I think I failed to make my point clearly on the idea of a sexual orientation pill. I didn’t want to present homosexuality as a moral issue for your judgment, but as an example of the psychological non-unity of human brains. Many people have made the mistake of assuming that heterosexuality is “normal” and that homosexuals need re-education or “fixing” (defect removal). I hold that sexuality is distributed over a spectrum. The modes of that distribution represent heterosexual males and females—an evolutionary stable pattern. The other regions of that distribution remain non-zero despite selection pressure.

Clearly we do not have a psychological unity of human sexual preference. People with various levels of homosexual preference are not merely defective or badly educated/informed.

Sexuality is a complex feature arising in a complex brain. Because of the required mutual compatibility of brain-construction genetics, we can be sure that we all have extremely similar machinery, but our sexual dimorphism requires that a single set of genes can code flexibly for either male or female. Since that flexibility doesn’t implement a pure binary male/female switch, we find various in-between states in both physical and mental machinery. The selection pressure from sexual dimorphism means we should expect far more non-unity of sexual preference than in other areas of our psychology.

But the fact that our genes can code for that level of flexibility, yet still remain biologically compatible, tells us that there are likely to be many other computations that could exhibit a superficial unity, but with a broad spread. The spread of propensities to submit to authority and override empathy observed in the Stanford Prison experiment gives good reason to question the supposed unity. (Yes I know you could shoehorn that diversity into your model as an effect of lack of moral training.)

Now let’s reconsider psychopathy or the broader diagnosis of antisocial personality disorder. What should we do with those humans whose combination of narcissism and poor social cognition is beyond some particular limit? Lock them up, or elect them to govern? From my limited understanding of the subject, it seems that the “condition” is considered untreatable.

It’s an easy path to stick to the psychological unity of humans and declare those in the tails of the distribution to be defective. But is it statistically justified? Does your unity model actually fit the data or just give a better model than the tabula rasa model that Tooby and Cosmides reacted against?

That’s why the idea of some idealised human moral computation, that everyone would agree to if they knew enough and weren’t defective, seems like question begging. That’s why I was asking for the empirical data that have led you to update your beliefs. Then I could update mine from them and maybe we’d be in agreement.

I’m open to the idea that we can identify some best-fit human morality—a compromise that minimizes the distance/disagreement to our population of (veridically informed) humans. That seems to me to be the best we can do.
roko3 Aug 12, 2008, 12:25 PM
0 points

Virge makes a very good point here. The human mind is probably rather flexible in terms of its ethical views; I suspect that Eli is overplaying our psychological unity.
Eliezer Yudkowsky Aug 12, 2008, 12:37 PM
5 points

I think that views are being attributed to me that I do not possess—perhaps on the Gricean notion that if someone attacks me for holding these views, I ought to hold them; but on a blog like this one, that leaves you prey to everyone else’s misunderstanding.

I do not assert that all humans end up in the same moral frame of reference (with regard to any particular extrapolation method). I do think that psychological unity is typically underestimated, and I have a hard time taking modern culture at face value (we’re the ancient Greeks, guys, not a finished product) - read Tooby and Cosmides’s “The Psychological Foundations of Culture” to get a picture of where I’m coming from.

But if you read “Coherent Extrapolated Volition” you’ll see that it’s specifically designed to handle, among other problems, the problem of, “What if we don’t all want the same thing?” What then can an AI programmer do that does not constitute being a jerk? That was my attempt to answer.

Michael Vassar has tried to convince me that I should be more worried about a majority of humanity being “selfish bastards” as defined in CEV. I have a couple of thoughts on this problem, may post on them later.
steven Aug 12, 2008, 1:18 PM
0 points

“Being a jerk” here means “being a jerk according to other people’s notion of morality, but not according to my own notion of morality”, right?

I sort of take offense to “we’re the ancient Greeks”; I make sure to disagree with Western morality whenever it’s wrong, and I have no reason to believe the resulting distribution of errors is biased toward agreement with Western morality. If you meant to say “most of them are the ancient Greeks”, then sure.
steven Aug 12, 2008, 1:22 PM
0 points

On second thought I suppose it could mean “being a jerk according to ‘ethics’”, where “ethics” is conceived not as something intrinsically moral but as a practical way for agents with different moralities to coordinate on a mutually acceptable solution.
Caledonian2 Aug 12, 2008, 2:26 PM
5 points

The notion is that I am my current self, but one day you inform me that, relative to other humans, my ability to care about others is damaged. Do I want a pill to fix the damage, even though it will change my values? Yes, because I value humanity and want to stay with humanity; I don’t want to be off in some lonely unoccupied volume of mindspace. This is one of the arguments that moves me.

Eliezer, relative to other humans, your ability to believe in a personal creative deity is damaged.

Do you want a pill to help you be religious?
Larry_D'Anna Aug 12, 2008, 2:45 PM
1 point

Virge: The argument for psychological unity is that, as a sexually reproducing species, it is almost impossible for one gene to rise in relative frequency if the genes it depends on are not already nearly universal. So all the diversity within any species at any given time consists of only one-step changes; no complex adaptations. The one exception of course is that males can have complex adaptations that females lack, and vice versa.

So, with respect to your specific examples:

Homosexuals: sexual preference certainly is a complex adaptation, but obviously one that differs between males and females. Homosexuals just got the wrong sexual preference for their equipment. And it doesn’t do any good to say that they aren’t defective. They aren’t defective from a human, moral point of view, but that’s not the point. From evolution’s view, there’s hardly anything more defective, except perhaps a fox that voluntarily restrains its own breeding.

Stanford Prison Experiment, Psychopaths: I’m not sure if I see where the complex adaptation is here. Some people have more empathy, some less. Even if the difference is supposed to be genetic, there seem to be a lot of these flexible parameters in our genome. Empathy-level could be like skin-color, height, hairiness, etc. We all have the machinery to compute empathy (we all have the same complex adaptation), but it’s used more often, or carries more influence in some people and less in others. Those that totally lack empathy are like albinos. They have the genes that are supposed to code for empathy, but they’re broken.

Of course you are right that empirical data on this question is needed. But absent that, we have what looks like a strong theoretical argument for psychological unity.
Roko Aug 12, 2008, 3:47 PM
0 points

Eli: “I do not assert that all humans end up in the same moral frame of reference (with regard to any particular extrapolation method). I do think that psychological unity is typically underestimated,”

- right, thanks for the clarification.

“But if you read “Coherent Extrapolated Volition” you’ll see that it’s specifically designed to handle, among other problems, the problem of, “What if we don’t all want the same thing?” What then can an AI programmer do that does not constitute being a jerk? That was my attempt to answer.”

- this is where I think I disagree with you somewhat. Your notion of what constitutes a “jerk” thing to do differs from my notion of what constitutes a jerk thing to do. I would not extrapolate the volitions of people whose volitions I deem to be particularly dangerous; in fact I would probably only extrapolate the volition of a small subset (perhaps 1 thousand to 1 million) of people whose outward philosophical stances on life were at least fairly similar to mine. I would consider it a jerk-like thing to extrapolate, with equal weight, the volitions of every human on the planet, including all the religious fundamentalists, etc. In any case, the problem for you is that if you and I disagree on exactly whose volitions should be taken into account in the CEV algorithm, how exactly do we settle our dispute? You can’t appeal to the psychological unity of humankind, for I am human and I disagree with you. You’re an antirealist, so there’s no objective fact of the matter either.

Or are you a closet realist? Is there some objective notion of “non-jerkness” that transcends the views of any particular human?

As a realist, I don’t have this problem. I think I have found objective criteria upon which to judge the question.
Larry_D'Anna Aug 12, 2008, 4:06 PM
0 points

Roko: I think Eliezer has explicitly stated that he is a realist.
Virge2 Aug 12, 2008, 4:08 PM
2 points

Larry D’Anna: “And it doesn’t do any good to say that they aren’t defective. They aren’t defective from a human, moral point of view, but that’s not the point. From evolution’s view, there’s hardly anything more defective, except perhaps a fox that voluntarily restrains its own breeding.”

Why is it “not the point”? In this discussion we are talking about differences in moral computation as implemented within individual humans. That the blind idiot’s global optimization strategy defines homosexuality as a defect is of no relevance.

Larry D’Anna: “I’m not sure if I see where the complex adaptation is here. Some people have more empathy, some less. Even if the difference is supposed to be genetic, there seem to be a lot of these flexible parameters in our genome.”

I wasn’t claiming a complex adaptation. I was claiming “other computations that could exhibit a superficial unity, but with a broad spread.”

I think we are already in substantial agreement, and having seen Eliezer’s last comment, I see that much of what I’ve been rambling on about comes from reading more than was warranted into the last paragraphs of his blog entry.
Eliezer Yudkowsky Aug 12, 2008, 4:32 PM
5 points

Roko:

I would not extrapolate the volitions of people whose volitions I deem to be particularly dangerous; in fact I would probably only extrapolate the volition of a small subset (perhaps 1 thousand to 1 million) of people whose outward philosophical stances on life were at least fairly similar to mine.

Then you are far too confident in your own wisdom. The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.

I’m sure that Archimedes of Syracuse thought that Syracuse had lots of incredibly important philosophical and cultural differences with the Romans who were attacking his city.

Had it fallen to Archimedes to build an AI, he might well have been tempted to believe that the whole fate of humanity would depend on whether the extrapolated volition of Syracuse or of Rome came to rule the world—due to all those incredibly important philosophical differences.

Without looking in Wikipedia, can you remember what any of those philosophical differences were?

And you are separated from Archimedes by nothing more than a handful of centuries.
- cousin_it Nov 18, 2010, 6:49 AM
  10 points
  
  The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.
  
  “Wiser”? What’s that mean?
  
  Your comment makes me think that, as of 12 August 2008, you hadn’t yet completely given up on your dream of finding a One True Eternal Morality separate from the computation going on in our heads. Have you changed your opinion in the last two years?
  - wedrifid Nov 18, 2010, 7:18 AM
    4 points
    
    I like what Roko has to say here and find myself wary of Eliezer’s reply. He may have just been signalling naivety and an irrational level of egalitarianism so people are more likely to ‘let him out of the box’. Even so, taking this and the other statements EY has made on FAI behaviours (yes, those that he would unilaterally label friendly) scares me.
    - cousin_it Nov 18, 2010, 7:31 AM
      14 points
      Parent
      
      
      unilaterally label friendly
      
      I love your turn of phrase, it has a Cold War ring to it.
      
      The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of “sincerely want”. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes? How do we determine, in general, which things a document like CEV must spell out and which things can/should be left to the mysterious magic of “intelligence”?
      - wedrifid Nov 18, 2010, 7:53 AM
        12 points
        Parent
        
        
        The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of “sincerely want”. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes?
        
        This has been my thought exactly. Barring all but the most explicit convolution any given person would prefer their own personal volition to be extrapolated. If by happenstance I should be altruistically and perfectly infatuated with, say, Sally, then that’s the FAI’s problem. It will turn out that extrapolating my volition will then entail extrapolating Sally’s volition. The same applies to caring about ‘humanity’, whatever that fuzzy concept means when taken in the context of unbounded future potential.
        
        I am also not sure how to handle those who profess an ultimate preference for a possible AI that extrapolates other than their own volition. I mean, clearly they are either lying, crazy or naive. It seems safer to trust someone who says “I would ultimately prefer FAI_me but I am creating FAI_humanity for the purpose of effective cooperation.”
        
        Similarly, if someone wanted to credibly signal altruism to me it would be better to try to convince me that CEV_them has a lot of similarities with CEV_humanity that arise due to altruistic desires rather than saying that they truly sincerely prefer CEV_humanity. Because the latter is clearly bullshit of some sort.
        
        How do we determine, in general, which things a document like CEV must spell out, and which things can/should be left to the mysterious magic of “intelligence”?
        
        I have no idea, I’m afraid.
        Eugine_Nier Nov 18, 2010, 8:29 AM
        9 points
        Parent
        
        Eliezer appears to be asserting that CEV is equal for all humans. His arguments leave something to be desired. In particular, this is an assertion about human psychology, and requires evidence that is entangled with reality.
        
        Leaving aside the question of whether even a single human’s volition can be extrapolated into a unique coherent utility function, this assertion has two major components:
        
        1) humans are sufficiently altruistic that, say, CEV_Alice doesn’t in any way favor Alice over Bob.
        
        2) humans are sufficiently similar that any apparent moral disagreement between Alice and Bob is caused by one or both having false beliefs about the physical world.
        
        I find both these statements dubious, especially the first, since I see no reason why evolution would make us that altruistic.
        timtyler Nov 20, 2010, 10:16 AM
        1 point
        Parent
        
        
        Eliezer appears to be asserting that CEV is equal for all humans.
        
        The “C” in “CEV” stands for “Coherent”. The concept refers to techniques of combining the wills of a bunch of agents. The idea is not normally applied to a population consisting of a single human. That would just be EV. I am not aware of any evidence that Yu-El thinks that EV_X is independent of the X.
        Perplexed Nov 18, 2010, 6:49 PM
        0 points
        Parent
        
        
        Eliezer appears to be asserting that CEV is equal for all humans.
        
        The phrase “is equal for all humans” is ambiguous. Even if all humans had identical psychologies, they could still all be selfish. The scare-quoted “source code” for Values_Eliezer and Values_Archimedes might be identical, but I think that both will involve self “pointers” resolving to Eliezer in one case and to Archimedes in the other.
        
        We can define that two persons’ values are “parametrically identical” if they can be expressed in the same “source code”, but the code contains one or more parameters which are interpreted differently for different persons. A self pointer is one obvious parameter that we might be prepared to permit in “coherent” human values. That people are somewhat selfish does not necessarily conflict with our goal of determining a fair composite CEV of mankind—there are obvious ways of combining selfish values into composite values by giving “equal weight” (more scare quotes) to the values of each person.
        
        The question then arises, are there other parameters we should expect besides self? I believe there are. One of them can be called the now pointer—it designates the current point in time. The now pointer in Values_Archimedes resolves to ~150 BC whereas Values_Eliezer resolves to ~2010 AD. Both are allowed to be more interested in the present and immediate future than in the distant future. (Whether they should be interested at all in the recent past is an interesting question, but somewhat orthogonal to the present topic.)
        
        How do we combine now pointers of different persons when constructing a CEV for mankind? Do we do it by assigning “equal weights” to the now of each person as we did for the self pointers? I believe this would be a mistake. What we really want, I believe, is a weighting scheme which changes over time—a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. “After all”, they will rationally say to themselves, “we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199. Let the future care for itself. It certainly isn’t going to care for us!”
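A minimal sketch of the weighting scheme described above, assuming a constant annual discount factor. The helper name `discount_weight` and the value 0.9 are illustrative assumptions and do not come from the thread.

```python
def discount_weight(decision_year: int, target_year: int, factor: float = 0.9) -> float:
    """Weight given at decision_year to outcomes in target_year (hypothetical helper)."""
    if target_year < decision_year:
        raise ValueError("this sketch only weights the present and the future")
    # Exponential discounting: each additional year multiplies the weight by `factor`.
    return factor ** (target_year - decision_year)


# Viewed from 2100, the year 2110 counts for more than the year 2200.
w_2110 = discount_weight(2100, 2110)
w_2200 = discount_weight(2100, 2200)

# The weight depends only on the gap in years, so 2200 receives
# far more attention once the decision year is 2199.
w_2200_from_2199 = discount_weight(2199, 2200)
```

Because the weight is a function of the gap alone, the same rule applied in every decision year produces the shifting emphasis the paragraph describes.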
        
        There are various other parameters that may appear in the idealized common “source code” for Values. For example, there may be different preferences regarding the discount rate used in the previous paragraph, and there may be different preferences regarding the “Malthusian factor”—how many biological descendants or clones one accumulates and how fast. It is not obvious to me whether we need to come up with rules for combining these into a CEV or whether the composite versions of these parameters fall out automatically from the rules for combining self and now parameters.
        
        Sorry for the long response, but your comment inspired me.
        timtyler Nov 18, 2010, 11:55 PM
        0 points
        Parent
        
        
        What we really want, I believe, is a weighting scheme which changes over time—a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. “After all”, they will rationally say to themselves, “we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199.
        
        I don’t think you need a “discounting” scheme. Or at least, you would get what is needed there “automatically”—if you just maximise expected utility. The same way Deep Blue doesn’t waste its time worrying about promoting pawns on the first move of the game—even if you give it the very long term (and not remotely “discounted”) goal of winning the whole game.
        Jack Nov 19, 2010, 12:31 PM
        0 points
        Parent
        
        
        The same way Deep Blue doesn’t waste its time worrying about promoting pawns on the first move of the game—even if you give it the very long term (and not remotely “discounted”) goal of winning the whole game.
        
        Is this really true? My understanding is that Deep Blue’s position evaluation function was determined by an analysis of hundreds of thousands of games. Presumably it ranked openings which had a tendency to produce more promotion opportunities higher than openings which tended to produce fewer promotion opportunities (all else being equal and assuming promoting pawns correlates with wins).
        timtyler Nov 19, 2010, 8:40 PM
        0 points
        Parent
        
        I wasn’t talking about that—I meant it doesn’t evaluate board positions with promoted pawns at the start of the game—even though these are common positions in complete chess games. Anyway, forget that example if you don’t like it, the point it illustrates is unchanged.
        Perplexed Nov 19, 2010, 12:18 AM
        0 points
        Parent
        
        
        I don’t think you need a “discounting” scheme. Or at least, you would get what is needed there “automatically”—if you just maximise expected utility.
        
        Could you explain why you say that? I can imagine two possible reasons why you might, but they are both wrong. Your “Deep Blue” example suggests that you are laboring under some profound misconceptions about utility theory and the nature of instrumental values.
        timtyler Nov 19, 2010, 8:04 AM
        −2 points
        Parent
        
        This is this one again. You don’t yet seem to agree with it—and it isn’t clear to me why not.
        Perplexed Nov 19, 2010, 4:47 PM
        0 points
        Parent
        
        Nor is it clear to me why you did not respond to my question / request for clarification.
        timtyler Nov 19, 2010, 8:26 PM
        1 point
        Parent
        
        I did respond. I didn’t have an essay on the topic prepared—but Yu-El did, so I linked to that.
        
        If you want to hear it in my own words:
        
        Wiring in temporal discounting is usually bad—since the machine can usually figure out what temporal discounting is appropriate for its current circumstances and abilities much better than you can. It is the same as with any other type of proximate goal.
        
        Instead you are usually best off just telling the machine your preferences about the possible states of the universe.
        
        If you are thinking you want the machine to mirror your own preferences, then I recommend that you consider carefully whether your ultimate preferences include temporal discounting—or whether all that is just instrumental.
        Perplexed Nov 20, 2010, 12:53 AM
        1 point
        Parent
        
        
        I did respond.
        
        I don’t see how. My question was:
        
        Could you explain why you say that?
        
        Referring to this that you said:
        
        Or at least, you would get what is needed there [instead of discounting] “automatically”—if you just maximise expected utility.
        
        You have still not explained why you said this. The question that discounting answers is, “Which is better: saving 3 lives today or saving 4 lives in 50 years?” Which is the same question as “Which of the two has the higher expected utility in current utilons?” We want to maximize expected current utility regardless of what we decide regarding discounting.
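As a worked illustration of the 3-lives-now versus 4-lives-in-50-years question, the sketch below computes the annual discount factor at which the two options are valued equally. It assumes simple exponential discounting and that value is linear in lives saved; both are assumptions made for the example, and the function names are hypothetical.

```python
def present_value(lives: float, years: int, annual_factor: float) -> float:
    """Current value of saving `lives` after `years`, under exponential discounting."""
    return lives * annual_factor ** years


def break_even_annual_factor(lives_now: float, lives_later: float, years: int) -> float:
    """Annual factor at which the later option is worth exactly the earlier one."""
    return (lives_now / lives_later) ** (1.0 / years)


factor = break_even_annual_factor(3, 4, 50)

# With no discounting (factor 1.0) the later option wins: 4 > 3.
later_undiscounted = present_value(4, 50, 1.0)

# With a factor of 0.99 per year the earlier option wins.
later_discounted = present_value(4, 50, 0.99)
```

The break-even factor comes out a little above 0.994 per year, so under these assumptions the answer flips somewhere between no discounting and a discount of about 0.6% per year.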
        
        However, since you do bring up the idea of maximizing expected utility, I am very curious how you can simultaneously claim (elsewhere on this thread) that utilities are figures of merit attached to actions rather than outcomes. Are you suggesting that we should be assessing our probability distribution over actions and then adding together the products of those probabilities with the utility of each action?
        timtyler Nov 20, 2010, 9:19 AM
        2 points
        Parent
        
        Regarding utility, utilities are just measures of satisfaction. They can be associated with anything.
        
        It is a matter of fact that utilities are associated with actions in most agents—since agents have evolved to calculate utilities in order to allow them to choose between their possible actions.
        
        I am not claiming that utilities are not frequently associated with outcomes. Utilities are frequently linked to outcomes—since most evolved agents are made in such a way that they like to derive satisfaction by manipulating the external world.
        
        However, nowhere in the definition of utility does it say that utilities are necessarily associated with external-world outcomes. Indeed, in the well-known phenomena of “wireheading” and “drug-taking” utility is divorced from external-world outcomes—and deliberately manufactured.
        Perplexed Nov 20, 2010, 4:40 PM
        0 points
        Parent
        
        
        utilities are just measures of satisfaction. They can be associated with anything.
        
        True. But in most economic analysis, terminal utilities are associated with outcomes; the expected utilities that become associated with actions are usually instrumental utilities.
        
        Nevertheless, I continue to agree with you that in some circumstances, it makes sense to attach terminal utilities to actions. This shows up, for example, in discussions of morality from a deontological viewpoint. For example, suppose you have a choice of lying or telling the truth. You assess the consequences of your actions, and are amused to discover that there is no difference in the consequences—you will not be believed in any case. A utilitarian would say that there is no moral difference in this case between lying and telling the truth. A Kant disciple would disagree. And the way he would explain this disagreement to the utilitarian would be to attach a negative moral utility to the action of speaking untruthfully.
        timtyler Nov 20, 2010, 6:32 PM
        2 points
        Parent
        
        Utilities are often associated with states of the world, yes. However, here you seemed to balk at utilities that were not so associated. I think such values can still be called “utilities”—and “utility functions” can be used to describe how they are generated—and the standard economic framework accommodates this just fine.
        
        What this idea doesn’t fit into is the von Neumann–Morgenstern system—since it typically violates the independence axiom. However, that is not the end of the world. That axiom can simply be binned—and fairly often it is.
        Perplexed Nov 20, 2010, 8:11 PM
        0 points
        Parent
        
        
        What this idea doesn’t fit into is the von Neumann–Morgenstern system—since it typically violates the independence axiom.
        
        Unless you supply some restrictions, it is considerably more destructive than that. All axioms based on consequentialism are blown away. You said yourself that we can assign utilities so as to rationalize any set of actions that an agent might choose. I.e. there are no irrational actions. I.e. decision theory and utility theory are roughly as useful as theology.
        timtyler Nov 20, 2010, 8:52 PM
        −2 points
        Parent
        
        No, no! That is like saying that a universal computer is useless to scientists—because it can be made to predict anything!
        
        Universal action is a useful and interesting concept partly because it allows a compact, utility-based description of arbitrary computable agents. Once you have a utility function for an agent, you can then combine and compare its utility function with that of other agents, and generally use the existing toolbox of economics to help model and analyse the agent’s behaviour. This is all surely a Good Thing.
        Perplexed Nov 20, 2010, 9:05 PM
        0 points
        Parent
        
        I’ve never seen the phrase universal action before. Googling didn’t help me. It certainly sounds like it might be an interesting concept. Can you provide a link to an explanation more coherent than the one you have attempted to give here?
        
        As to whether a “utility-based” description of an agent that does not adhere to the standard axioms of utility is a “good thing”—well I am doubtful. Surely it does not enable use of the standard toolbox of economics, because that toolbox takes for granted that the participants in the economy are (approximately) rational agents.
        timtyler Nov 20, 2010, 10:44 PM
        0 points
        Parent
        
        You have an alternative model of arbitrary computable agents to propose?
        
        You don’t think the ability to model an arbitrary computable agent is useful?
        
        What is the problem here? Surely a simple utility-based framework for modelling the computable agent of your choice is an obvious Good Thing.
        Perplexed Nov 20, 2010, 11:26 PM
        2 points
        Parent
        
        I see no problem modeling computable agents without even mentioning “utility”.
        
        I don’t yet see how modeling them as irrational utility maximizers is useful, since a non-utility-based approach will probably be simpler.
        timtyler Nov 21, 2010, 12:03 AM
        0 points
        Parent
        
        Part of the case for using a utility maximization framework is that we can see that many agents naturally use an internal representation of utility. This is true for companies, and other “economic” actors. It is true to some extent for animal brains—and it is true for many of the synthetic artificial agents that have been constructed. Since so many agents are naturally utility-based, that makes the framework an obvious modelling medium for intelligent agents.
        timtyler Nov 20, 2010, 11:56 PM
        0 points
        Parent
        
        
        I see no problem modeling computable agents without even mentioning “utility”.
        
        Similarly, you can model serial computers without mentioning Turing machines and parallel computers without mentioning cellular automata. Yet in those cases, the general abstraction turns out to be a useful and important concept. I think this is just the same.
        timtyler Nov 20, 2010, 9:22 PM
        0 points
        Parent
        
        Universal action is named after universal computation and universal construction.
        
        Universal computation—calculating anything computable;
        
        Universal construction—building anything;
        
        Universal action—doing anything;
        
        Universal construction and universal action have some caveats about being compatible with constraints imposed by things like physical law. “Doing anything” means something like: being able to feed arbitrary computable sequences in parallel to your motor outputs. Sequences that fail due to severing your own head don’t violate the spirit of the idea, though. As with universal computation, universal action is subject to resource limitations in practice. My coinage—AFAIK. Attribution: unpublished manuscript ;-)
        Perplexed Nov 20, 2010, 11:47 PM
        0 points
        Parent
        
        Well, I’ll just ignore the fact that universal construction means to me something very different than it apparently means to you. Your claim seems to be that we can ‘program’ a machine (which is already known to maximize utility) so as to output any sequence of symbols we wish it to output; program it by the clever technique of assigning a numeric utility to each possible infinite output string, in such a way that we attach the largest numeric utility to the specific string that we want.
        
        And you are claiming this in the same thread in which you disparage all forms of discounting the future.
        
        What am I missing here?
        timtyler Nov 21, 2010, 12:17 AM
        0 points
        Parent
        
        For my usage, see:
        
        According to von Neumann [18], a constructor is endowed with universal construction if it is able to construct every other automaton, i.e. an automaton of any dimensions.
        
        http://carg2.epfl.ch/Publications/2004/PhysicaD04-Mange.pdf
        
        The term has subsequently become overloaded, it is true.
        
        If I understand it correctly, the rest of your comment is a quibble about infinity. I don’t “get” that. Why not just take things one output symbol at a time?
        Perplexed Nov 21, 2010, 12:51 AM
        0 points
        Parent
        
        
        According to von Neumann …
        
        Wow. I didn’t see that one coming. Self-reproducing cellular automata. Brings back memories.
        
        If I understand it correctly, the rest of your comment is a quibble about infinity. I don’t “get” that. Why not just take things one output symbol at a time?
        
        Well, it wasn’t just a quibble about infinity. There was also the dig about discount rates. ;)
        
        But I really am mystified. Is a ‘step’ in this kind of computation to output a symbol and switch to a different state? Are there formulas for calculating utilities? What data go into the calculation?
        
        Exactly how does computation work here? Perhaps I need an example. How would you use this ‘utility maximization as a programming language’ scheme to program the machine to compute the square root of 2? I really don’t understand how this is related to either lambda calculus or Turing machines. Why don’t you take some time, work out the details, and then produce one of your essays?
        timtyler Nov 21, 2010, 9:20 AM
        0 points
        Parent
        
        I didn’t (and still don’t) understand how discount rates were relevant—if not via considering the comment about infinite output strings.
        
        What data go into the calculation of utilities? The available history of sense data, memories, and any current inputs. The agent’s internal state, IOW.
        
        Exactly how does computation work here?
        
        Just like it normally does? You just write the utility function in a Turing-complete language—which you have to do anyway if you want any generality. The only minor complication is how to get a (single-valued) “function” to output a collection of motor outputs in parallel—but serialisation provides a standard solution to this “problem”.
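One possible reading of the scheme under discussion, as a toy sketch: an agent that emits one symbol at a time by maximising a utility function over the next symbol, where that utility function is written so that the outputs are the decimal digits of the square root of 2. The names `utility`, `sqrt2_digit` and `run_agent` are hypothetical, and this is an illustration rather than anything proposed in the thread.

```python
from math import isqrt

SYMBOLS = "0123456789"


def sqrt2_digit(n: int) -> str:
    """n-th decimal digit of sqrt(2), counting the leading 1 as digit 0."""
    # isqrt(2 * 10**(2n)) equals floor(sqrt(2) * 10**n), computed exactly on integers.
    return str(isqrt(2 * 10 ** (2 * n)) % 10)


def utility(history: str, symbol: str) -> float:
    """Hypothetical utility: 1 for emitting the correct next digit, 0 otherwise."""
    return 1.0 if symbol == sqrt2_digit(len(history)) else 0.0


def run_agent(steps: int) -> str:
    """Emit `steps` symbols, each chosen by maximising utility given the history so far."""
    history = ""
    for _ in range(steps):
        history += max(SYMBOLS, key=lambda s: utility(history, s))
    return history


digits = run_agent(8)  # "14142135"
```

In this sketch the digit computation lives entirely inside the utility function, so it shows the mechanics of the one-symbol-at-a-time approach without settling whether the framing is a useful one.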
        
        Universal action might get an essay one day.
        
        ...and yes, if I hear too many more times that humans don’t have utility functions (we are better than that!) - or that utility maximisation is a bad implementation plan - I might polish up a page that debunks those—ISTM—terribly-flawed concepts—so I can just refer people to that.
        Perplexed Nov 21, 2010, 4:43 PM
        0 points
        Parent
        
        
        I didn’t (and still don’t) understand how discount rates were relevant
        
        What is it that the agent acts so as to maximize?
        
        1) The utility of the next action (ignoring the utility of expected future actions).
        2) The utility of the next action plus a discounted expectation of future utilities.
        3) The simple sum of all future expected utilities.
        
        To me, only the first two options make mathematical sense, but the first doesn’t really make sense as a model of human motivation.
        timtyler Nov 21, 2010, 4:58 PM
        −2 points
        Parent
        
        
        What is it that the agent acts so as to maximize?
        
        I would usually answer this with a measure of inclusive fitness. However, it appears here that we are just talking about the agent’s brain—so in this context what that maximises is just utility—since that is the conventional term for such a maximand.
        
        Your options seem to be exploring how agents calculate utilities. Are those all the options? An agent usually calculates utilities associated with its possible actions—and then chooses the action associated with the highest utility. That option doesn’t seem to be on the list. It looks a bit like 1 - but that seems to specify no lookahead—or no lookahead of a particular kind. Future actions are usually very important influences when choosing the current action. Their utilities are usually pretty important too.
        
        If you are trying to make sense of my views in this area, perhaps see the bits about pragmatic and ideal utility functions—here:
        
        http://timtyler.org/expected_utility_maximisers/
        shokwave Nov 21, 2010, 5:44 PM
        2 points
        Parent
        
        
        Are those all the options?
        
        Yes. In fact, 2 strictly contains both 1 and 3, by virtue of setting the discount rate to either 0 or 1.
        
        Future actions are usually very important influences when choosing the current action.
        
        But not strictly as important as the utility of the outcome of the current action. The amount by which future actions are less important than the outcome of the current action, and the methods by which we determine that, are what we mean when we say discount rates.
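The containment claim above can be sketched as follows. The sketch treats what the comment calls the discount rate as a per-step multiplicative factor, which is an assumption about the intended meaning; the utility stream and the function name are made up for illustration.

```python
from typing import Sequence


def discounted_value(utilities: Sequence[float], factor: float) -> float:
    """utilities[0] belongs to the next action; later entries are expected future utilities."""
    return sum(u * factor ** k for k, u in enumerate(utilities))


stream = [5.0, 3.0, 2.0]

option_1 = discounted_value(stream, 0.0)  # factor 0: only the next action counts
option_2 = discounted_value(stream, 0.5)  # intermediate factor: discounted future
option_3 = discounted_value(stream, 1.0)  # factor 1: simple undiscounted sum
```

With these numbers the three options evaluate to 5.0, 7.0 and 10.0, so options 1 and 3 are the two endpoints of option 2.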
        timtyler Nov 21, 2010, 6:08 PM
        0 points
        Parent
        
        
        Are those all the options?
        
        Yes. In fact, 2 strictly contains both 1 and 3, by virtue of setting the discount rate to either 0 or 1.
        
        That helps understand the options. I am not sure I had enough info to figure out what you meant before.
        
        1 corresponds to eating chocolate gateau all day and not brushing your teeth—not very realistic as you say. 3 looks like an option containing infinite numbers—and 2 is what all practical agents actually do.
        
        However, I don’t think this captures what we were talking about. Pragmatic utility functions are necessarily temporally discounted—due to resource limitations and other effects. The issue is more whether ideal utility functions can be expected to be so discounted. I can’t think why they should be—and can think of several reasons why they shouldn’t be—which we have already covered.
        
        Infinity is surely not a problem—you can just maximise utility over T years and let T increase in an unbounded fashion. The uncertainty principle limits the predictions of embedded agents in practice—so T won’t ever become too large to deal with.
        Perplexed Nov 21, 2010, 6:45 PM
        0 points
        Parent
        
        
        However, I don’t think this captures what we were talking about. Pragmatic utility functions are necessarily temporally discounted—due to resource limitations and other effects.
        
        My understanding is that “pragmatic utility functions” are supposed to be approximations to “ideal utility functions”—preferable only because the “pragmatic” are effectively computable whereas the ideal are not.
        
        Our argument is that we see nothing constraining ideal utility functions to be finite unless you allow discounting at the ideal level. And if ideal utilities are infinite, then pragmatic utilities that approximate them must be infinite too. And comparison of infinite utilities in the hope of detecting finite differences cannot usefully guide choice. Hence, we believe that discounting at the ideal level is inevitable. Particularly if we are talking about potentially immortal agents (or mortal agents who care about an infinite future).
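A small numerical sketch of the finiteness argument, assuming for illustration a constant utility of 1 per year and a discount factor of 0.9 (neither figure comes from the thread): the undiscounted total grows without bound as the horizon T increases, while the discounted total approaches a finite limit.

```python
def horizon_value(per_year: float, years: int, factor: float = 1.0) -> float:
    """Total utility over a finite horizon, with an optional per-year discount factor."""
    return sum(per_year * factor ** t for t in range(years))


horizons = (10, 100, 1000)

# Without discounting, the total simply tracks the horizon.
undiscounted = [horizon_value(1.0, T) for T in horizons]

# With discounting, the total converges to per_year / (1 - factor).
discounted = [horizon_value(1.0, T, factor=0.9) for T in horizons]
limit = 1.0 / (1.0 - 0.9)
```

Under these assumptions, comparing two undiscounted streams as T grows means comparing ever larger totals, whereas the discounted totals settle near 10.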
        
        Your last paragraph made no sense. Are you claiming that the consequence of actions made today must inevitably have negligible effect upon the distant future? A rather fatalistic stance to find in a forum dealing with existential risk. And not particularly realistic, either.
        timtyler Nov 21, 2010, 7:06 PM
        −1 points
        Parent
        
        You seem obsessed with infinity :-( What about the universal heat death? Forget about infinity—just consider whether we want to discount on a scale of 1 year, 10 years, 100 years, 1,000 years, 10,000 years—or whatever.
        
        I think “ideal” short-term discounting is potentially problematical. Once we are out to discounting on a billion year timescale, that is well into the “how many angels dance on the head of a pin” territory—from my perspective.
        
        Some of the causes of instrumental discounting look very difficult to overcome—even for a superintelligence. The future naturally gets discounted to the extent that you can’t predict and control it—and many phenomena (e.g. the weather) are very challenging to predict very far into the future—unless you can bring them actively under your control.
        
        Are you claiming that the consequence of actions made today must inevitably have negligible effect upon the distant future?
        
        No. The idea was that predicting those consequences is often hard—and it gets harder the further out you go. Long term predictions thus often don’t add much to what short-term ones give you.
        shokwave Nov 22, 2010, 11:56 AM
        0 points
        Parent
        
        
        What about the universal heat death?
        
        Flippantly: we’re going to have billions of years to find a solution to that problem.
        timtyler Nov 20, 2010, 8:56 AM
        2 points
        Parent
        
        Many factors “automatically” lead to temporal discounting if you don’t wire it in. The list includes:
        
        Agents are mortal—they might die before the future utility arrives;
        Agents exhibit senescence—the present is more valuable to them than the future, because they are younger and more vital;
        The future is uncertain—agents have limited capacities to predict the future;
        The future is hard to predictably influence by actions taken now.
        
        I think considerations such as the ones listed above adequately account for most temporal discounting in biology—though it is true that some of it may be the result of adaptations to deal with resource-limited cognition, or just plain stupidity.
        
        Note that the list is dominated by items that are a function of the capabilities and limitations of the agent in question. If the agent conquers senescence, becomes immortal, or improves its ability to predict or predictably influence the future, then the factors all change around. This naturally results in a different temporal discounting scheme—so long as it has not previously been wired into the agent by myopic forces.
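The first item on the list can be made concrete with a short sketch. It assumes an agent that values every period equally but survives each step with a fixed probability (the 0.95 used below is an arbitrary illustrative value). The expected total, computed by enumerating possible lifetimes, matches an explicitly discounted sum whose factor equals the survival probability.

```python
from typing import Sequence


def expected_total_with_mortality(utilities: Sequence[float], survival: float) -> float:
    """Expected undiscounted total for an agent that may die between steps."""
    n = len(utilities)
    expected = 0.0
    for lifetime in range(1, n + 1):
        if lifetime < n:
            # Survives the first lifetime - 1 transitions, then dies.
            prob = survival ** (lifetime - 1) * (1.0 - survival)
        else:
            # Survives long enough to collect every payoff.
            prob = survival ** (n - 1)
        expected += prob * sum(utilities[:lifetime])
    return expected


def explicitly_discounted(utilities: Sequence[float], factor: float) -> float:
    """Ordinary exponentially discounted sum."""
    return sum(u * factor ** t for t, u in enumerate(utilities))


payoffs = [1.0, 2.0, 3.0, 4.0]
from_mortality = expected_total_with_mortality(payoffs, 0.95)
from_discounting = explicitly_discounted(payoffs, 0.95)
```

In this toy model, raising the survival probability toward 1 removes the apparent discounting without any change to the agent's underlying preferences.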
        
        Basically, temporal discounting can often usefully be regarded as instrumental. Like energy, or gold, or warmth. You could specify how much each of these things is valued as well—but if you don’t they will be assigned instrumental value anyway. Unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it. Tell the agent what state of affairs you actually want—and let it figure out the details of how best to get it for you.
        
        Temporal discounting contrasts with risk aversion in this respect.
        Perplexed Nov 20, 2010, 4:25 PM
        0 points
        Parent
        
        
        Basically, temporal discounting can often usefully be regarded as instrumental.
        
        Quite true. I’m glad you included that word “often”. Now we can discuss the real issue: whether that word “often” should be changed to “always” as EY and yourself seem to claim. Or whether utility functions can and should incorporate the discounting of the value of temporally distant outcomes and pleasure-flows for reasons over and above considerations of instrumentality.
        
        Temporal discounting contrasts with risk aversion in this respect.
        
        A useful contrast/analogy. You seem to be claiming that risk aversion is not purely instrumental; that it can be fundamental; that we need to ask agents about their preferences among risky alternatives, rather than simply axiomatizing that a rational agent will be risk neutral.
        
        But I disagree that this is in contrast to the situation with temporal discounting. We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them. This is particularly the case when we consider questions like the moral value of a human life.
        
        The question before us is whether I should place the same moral value now on a human life next year and a human life 101 years from now. I say ‘no’; EY (and you?) say yes. What is EY’s justification for his position? Well, he might invent a moral principle that he might call “time invariance of moral value” and assert that this principle absolutely forces me to accept the equality:
        
        value@t(life@t+1) = value@t(life@t+101).
        
        I would counter that EY is using the invalid “strong principle of time invariance”. If one uses the valid “weak principle of time invariance” then all that we can prove is that:
        
        value@t(life@t+1) = value@t+100(life@t+101)
        
        So, we need another moral principle to get to where EY wants to go. EY postulates that the moral discount rate must be zero. I simply reject this postulate (as would the bulk of mankind, if asked). EY and I can both agree to a weaker postulate, “time invariance of moral preference”. But this only shows that the discounting must be exponential in time; it doesn’t show that the rate must be zero.
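
        One way to make the last step explicit is the short derivation below. It is only a sketch under stated assumptions: the discount function D, the rate ρ, and the reading of "time invariance of moral preference" as stationarity (the relative value of two future outcomes does not change as the evaluation date advances) are notation and interpretation introduced here, not taken from the comment above.

        ```latex
        % Assumed notation: D(\tau) is the weight placed now on a life \tau years ahead,
        % with D(0) = 1 and D(\tau) > 0; v is the value of a life at zero delay.
        \begin{align*}
        \text{weak invariance:}\quad & \mathrm{value}_t(\mathrm{life}_{t+\tau}) = D(\tau)\,v \quad\text{for every } t,\\
        \text{stationarity:}\quad & \frac{D(a+s)}{D(b+s)} = \frac{D(a)}{D(b)} \quad\text{for all } a, b, s \ge 0,\\
        \text{set } b = 0:\quad & D(a+s) = D(a)\,D(s),\\
        \text{continuous solutions:}\quad & D(\tau) = e^{-\rho\tau}.
        \end{align*}
        ```

        The constant ρ is left free by these assumptions; the "strong principle" amounts to the additional postulate ρ = 0.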
        
        Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero. Admittedly, I have yet to give any reason why it should be set elsewhere. This is not the place to do that. But I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon. EY says “So come up with better math!”—a response worth taking seriously. But until we have that better math in hand, I am pretty sure EY is wearing the crackpot hat here, not me.
        timtyler Nov 20, 2010, 5:03 PM
        2 points
        Parent
        
        
        Now we can discuss the real issue: whether that word “often” should be changed to “always” as EY and yourself seem to claim.
        
        You can specify a method of temporal discounting if you really want to. Just as you can specify a value for collecting gold atoms if you really want to. However, there are side effects and problems associated with introducing unnecessary constraints.
        
        We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them.
        
        If we think that such creatures are common and if we are trying to faithfully mirror and perpetuate their limitations, you mean.
        
        Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero.
        
        I don’t really see this as a “should” question. However, there are consequences to wiring in instrumental values. You typically wind up with a handicapped superintelligence. I thought I already gave this as my reasoning, with comments such as “unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it.”
        
        I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon.
        
        Not a practical issue—IMO. We are resource-limited creatures, who can barely see 10 years into the future. Instrumental temporal discounting protects us from infinite maths with great effectiveness.
        
        This is the same as in biology. Organisms act as though they want to become ancestors—not just parents or grandparents. That is the optimisation target, anyway. However, instrumental temporal discounting protects them from far-future considerations with great effectiveness.
        Perplexed Nov 20, 2010, 5:28 PM
        1 point
        Parent
        
        
        there are consequences to wiring in instrumental values. You typically wind up with a handicapped superintelligence. I thought I already gave this as my reasoning …
        
        You did indeed. I noticed it, and meant to clarify that I am not advocating any kind of “wiring in”. Unfortunately, I failed to do so.
        
        My position would be that human beings often have discount factors “wired in” by evolution. It is true, of course, that like every other moral instinct analyzed by EvoPsych, the ultimate adaptationist evolutionary explanation of this moral instinct is somewhat instrumental, but this doesn’t make it any less fundamental from the standpoint of the person born with this instinct.
        
        As for moral values that we insert into AIs, these too are instrumental in terms of their final cause—we want the AIs to have particular values for our own instrumental reasons. But, for the AI, they are fundamental. But not necessarily ‘wired in’. If we, as I believe we should, give the AI a fundamental meta-value that it should construct its own fundamental values by empirically constructing some kind of CEV of mankind—if we do this then the AI will end up with a discount factor because his human models have discount factors. But it won’t be a wired-in or constant discount factor. Because the discount factors of mankind may well change over time as the expected lifespan of humans changes, as people upload and choose to run at various rates, as people are born or as they die.
        
        I’m saying that we need to allow for an AI discount factor or factors which are not strictly instrumental, but which are not ‘wired in’ either. And especially not a wired-in discount factor of exactly zero!
        timtyler Nov 20, 2010, 7:23 PM
        0 points
        Parent
        
        I think we want a minimally myopic superintelligence—and fairly quickly. We should not aspire to program human limitations into machines—in a foolish attempt to mirror their values. If the Met. Office computer is handling orders asking it to look three months out—and an ethics graduate says that it is too future-oriented for a typical human, and it should be made to look less far out, so it better reflects human values—he should be told what an idiot he is being.
        
        We use machines to complement human capabilities, not just to copy them. When it comes to discounting the future, machines will be able to see and influence further—and we would be well-advised to let them.
        
        Much harm is done today due to temporal discounting. Governments look no further than election day. Machines can help put a stop to such stupidity and negligence—but we have to know enough to let them.
        
        As Eliezer says, he doesn’t propose doing much temporal discounting—except instrumentally. That kind of thing can be expected to go up against the wall as part of the “smarter, faster, wiser, better” part of his CEV.
        Perplexed Nov 20, 2010, 7:43 PM
        −1 points
        Parent
        
        And so we are in disagreement. But I hope you now understand that the disagreement is because our values are different rather than because I don’t understand the concept of values. Ironically our values differ in that I prefer to preserve my values and those of my conspecifics beyond the Singularity, whereas you distrust those values and the flawed cognition behind them, and you wish to have those imperfect human things replaced by something less messy.
        timtyler Nov 20, 2010, 9:10 PM
        0 points
        Parent
        
        I don’t see myself as doing any non-instrumental temporal discounting in the first place. So, for me personally, losing my non-instrumental temporal discounting doesn’t seem like much of a loss.
        
        However, I do think that our temporal myopia is going to fall by the wayside. We will stop screwing over the immediate future because we don’t care about it enough. Myopic temporal discounting represents a primitive form of value—which is destined to go the way of cannibalism and slavery.
      - nshepperd Nov 20, 2010, 11:43 AM
        2 points
        Parent
        
        A CEV optimizer is less likely to do horrific things while its ability to extrapolate volition is “weak”. If it can’t extrapolate far from the unwise preferences people have now with the resources it has, it will notice that the EV varies a lot among the population, and take no action. Or if the extrapolation system has a bug in it, this will hopefully show up as well. So coherence is a kind of “sanity test”.
        
        That’s one reason that leaps to mind anyway.
        
        Of course the other is that there is no evidence any single human is Friendly anyway, so cooperation would be impossible among EV maximizing AI researchers. As such, an AI that maximizes EV is out of the question already. CEV is the next best thing.
  - Vladimir_Nesov Nov 18, 2010, 10:44 AM
    3 points
    Parent
    
    The argument seems to be, if Preference1 is too different from Preference1, then Preference1 is a bad method of preference-extraction and should be rethought. A good method Preference2 for preference-extraction should have Preference2 much closer to Preference2. And since Preference1 is inadequate, as demonstrated by this test case, Preference1 is also probably hugely worse for cousin_it than Preference2, even if Preference2 is better than Preference2.
    
    We are not that wise in the sense that any moral progress we’ve achieved, if it’s indeed progress (so that on reflection, both past and future would agree that the direction was right) and not arbitrary change, shouldn’t be a problem for an AI to repeat, and thus this progress in particular (as opposed to other possible differences) shouldn’t contribute to differences in extracted preference.
    - Eugine_Nier Nov 18, 2010, 4:23 PM
      0 points
      Parent
      
      
      The argument seems to be, if Preference1 is too different from Preference1, then Preference1 is a bad method of preference-extraction and should be rethought. A good method Preference2 for preference-extraction should have Preference2 much closer to Preference2. And since Preference1 is inadequate, as demonstrated by this test case, Preference1 is also probably hugely worse for cousin_it than Preference2, even if Preference2 is better than Preference2.
      
      Of course the above constraint isn’t nearly enough to uniquely specify Preference2.
steven Aug 12, 2008, 5:19 PM
0 points

Just to check, surely you’re not saying an extrapolated version of Archimedes and a thousand people who agreed with him wouldn’t have turned out OK?

It seems to me that we have some quite strong evidence against rotten bastards theory in that intelligent and well-informed people IRL seem to converge away from bastardly beliefs. Still, rotten bastards theory seems worth thinking about for negative-utilitarian reasons.
Tim_Tyler Aug 12, 2008, 5:36 PM
1 point


The basic ev-bio necessity behind the psychological unity of human brains is not widely understood.

“Necessity” may be over-stating it. Humans are not very diverse, due to recent genetic bottlenecks. We do have enormous psychological differences between us—due to neoteny-induced developmental plasticity—and because there are many ways to break a human. We haven’t yet lived in the human hive long enough for the developmental plasticity to result in clearly-discrete queen/worker/warrior castes, with associated mental adaptations, though.

Go back in time, and often proto-humanity was divided into multiple tribes: the neanderthals, the cro-magnons, etc—with consequently reduced psychological unity. Travel and the modern gene pool expansion and stirring has obliterated the competing tribes.
Tim_Tyler Aug 12, 2008, 5:47 PM
0 points


Had it fallen to Archimedes to build an AI, he might well have been tempted to believe that the whole fate of humanity would depend on whether the extrapolated volition of Syracuse or of Rome came to rule the world

I doubt there would have been much controversy. The fate of humanity is bound to be obliteration by far more advanced technology. The idea that our crappy evolved slug bodies, with their sluggish meat brains might go the distance would have seemed pretty ridiculous, even back then.
Roko Aug 12, 2008, 7:03 PM
0 points

@Eliezer: “I’m sure that Archimedes of Syracuse thought that Syracuse had lots of incredibly important philosophical and cultural differences with the Romans who were attacking his city. Had it fallen to Archimedes to build an AI, he might well have been tempted to believe that the whole fate of humanity would depend on whether the extrapolated volition of Syracuse or of Rome came to rule the world—due to all those incredibly important philosophical differences.”

- I don’t see how adding the Romans to the CEV algorithm would have made much of a difference either way. If Archimedes had chosen himself and a few thousand people whom he deemed to be “wise”, I suspect things would have gone somewhat better than if he had decided, on grounds of “non-jerkness”, to extrapolate the volition of the whole of humanity at that time, including all the barbarians, hunter gatherers, cannibal pygmies, etc.

Why can we expect this to generalize? Well, people who design AIs self-select for intelligence. People who design AIs with friendliness in mind self-select for intelligence, caution etc.
Larry_D'Anna Aug 12, 2008, 7:13 PM
0 points

Virge: Why is it “not the point”? In this discussion we are talking about differences in moral computation as implemented within individual humans. That the blind idiot’s global optimization strategy defines homosexuality as a defect is of no relevance.

Well, because we’re trying to characterize the sort of psychological diversity that can exist within our species. And this psychological unity argument is saying “we’re all the same, except for a mix of one-step changes”. This means that any complex adaptation in any human is in almost all humans. The exceptions being: sexual dimorphism, and the fact that certain individuals are “defective”, in the sense that one of their complex adaptations is broken. So if you’re arguing against this position and saying: look at homosexuals, they are different but not broken, then you aren’t talking about the same kind of “broken”. I’m not arguing that we should base any moral judgment on evolution’s silly ideas of what’s broken. I’m just arguing that homosexuals don’t serve as a counterexample to the idea of psychological unity.

Virge: I wasn’t claiming a complex adaptation. I was claiming “other computations that could exhibit a superficial unity, but with a broad spread.”

I think you’re right here. But hopefully the spread is not so much that we could not come to agree if we “knew more, thought faster, were more the people we wished we were, and had grown up farther together.”

An example that gives me hope that this could be true is Vengeance. We seem to be born with a terminal value saying that it is good to see our enemies suffer. In terms of explicit moral argument, if not in deeds, we have mostly come to agree that it is wrong to take vengeance; even though some of us came from cultures that once reveled in it; even though some of us may be more naturally inclined towards it.

When you do Bayesian updates, the specific values of your initial priors become less important as you gather more evidence. The evidence piles up exponentially, and you need exponentially bad priors to keep the wrong answer in the face of it. Perhaps our moral debates are similar. Perhaps the “broad spread” is not so great that we cannot come to agree, if we consider enough evidence, if we hear enough arguments. Psychological unity does not prove that this is so, but without psychological unity, there would be no reason to hope it is possible.
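
The point about priors washing out can be illustrated numerically. The sketch below is a toy model, not anything proposed in the thread: it assumes every observation is independent and favours the hypothesis by the same likelihood ratio, and the function names `posterior` and `observations_needed` are made up for this illustration.

```python
import math


def posterior(prior: float, likelihood_ratio: float, n: int) -> float:
    """Probability of hypothesis H after n independent observations,
    each favouring H over not-H by the same likelihood ratio (toy assumption)."""
    if prior <= 0.0 or prior >= 1.0:
        # A prior of exactly 0 or 1 is never moved by Bayesian updating.
        return prior
    # Bayes' rule in log-odds form: each observation adds log(likelihood_ratio).
    log_odds = math.log(prior / (1.0 - prior)) + n * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


def observations_needed(prior: float, likelihood_ratio: float,
                        target: float = 0.99) -> int:
    """Smallest n for which posterior(prior, likelihood_ratio, n) >= target."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_log_odds = math.log(prior / (1.0 - prior))
    target_log_odds = math.log(target / (1.0 - target))
    gap = target_log_odds - prior_log_odds
    return max(0, math.ceil(gap / math.log(likelihood_ratio)))


if __name__ == "__main__":
    for p in (0.5, 1e-3, 1e-6):
        print(p, observations_needed(p, 2.0))
```

With a likelihood ratio of 2 per observation, an even prior needs 7 observations to reach 99% and a prior of one in a million needs 27: a prior a million times worse costs only about 20 more observations, since 2^20 is roughly a million. Under these assumptions, only a prior of exactly 0 or 1 stays put however much evidence arrives.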
Doug_S. Aug 12, 2008, 10:30 PM
2 points

Many people seem to have set their priors to 1 on several facts. I suspect even MiniLuv would have a hard time getting these guys to believe that Christianity is false.

In other words:

Bob: “Abortion is wrong.”
Sally: “Do you think that this is something of which most humans ought to be persuadable?”

Bob: “Yes, I do. Do you think abortion is right?”

Sally: “Yes, I do. And I don’t think that’s because I’m a psychopath by common human standards. I think most humans would come to agree with me, if they knew the facts I knew, and heard the same moral arguments I’ve heard.”

Bob and Sally are wrong. Well, half-wrong, anyway. A human’s beliefs depend on the order in which he hears arguments. If I info-dumped all my experiences into the Pope’s mind, and vice-versa, he’d still be a Christian and I’d still be an atheist.

I do not think that rational arguments and evidence are sufficient to convince people to change a sufficiently stubborn belief.

Eliezer, you’ve used your AI-box experiment as an example that people can be persuaded. If you had the Pope in a holodeck that you completely controlled, do you think you could convince him that Jesus was never resurrected? I really doubt it.
Jadagul Aug 13, 2008, 12:26 AM
0 points

Doug raises another good point. Related to what I said earlier, I think people really do functionally have prior probability=1 on some propositions. Or act as if they do. If “The Bible is the inerrant word of God” is a core part of your worldview, it is literally impossible for me to convince you this is false, because you use this belief to interpret any facts I present to you. Eliezer has commented before that you can rationalize just about anything; if “God exists” or “The Flying Spaghetti Monster exists” or “reincarnation exists” is part of the machinery you use to interpret your experience, in a deep enough way, your experiences can’t disprove it.
Eliezer Yudkowsky Aug 13, 2008, 1:12 AM
6 points

If you had the Pope in a holodeck that you completely controlled, do you think you could convince him that Jesus was never resurrected?

Oh hell yeah. Then again, I’d also go for it if you offered me 24 hours. Then again, the Pope almost certainly already believes that Jesus was never resurrected “except in our hearts” or some such, which makes the job much harder.
Doug_S. Aug 13, 2008, 2:19 AM
0 points

If the Pope knows that he’s in a holodeck, and that you are controlling it, would that change your answer?
Tim_Tyler Aug 13, 2008, 7:18 AM
3 points


The argument for psychological unity is that, as a sexually reproducing species, it is almost impossible for one gene to rise in relative frequency if the genes it depends on are not already nearly universal. So all the diversity within any species at any given time consists of only one-step changes; no complex adaptations. The one exception of course is that males can have complex adaptations that females lack, and vice versa.

That argument is not right—as fig wasps demonstrate. Organisms can exhibit developmental plasticity, and show different traits under different circumstances. Humans are a good example of this: we show remarkable developmental plasticity due to neoteny. E.g. the phenotypes of the jockey and the sumo wrestler are remarkably different. The jockey is continuously on calorie restriction—and so exhibits a radically different gene expression profile in almost every tissue.
Roko Aug 13, 2008, 10:25 AM
0 points

Doug S: “A human’s beliefs depend on the order in which he hears arguments.”

- indeed. Anyone who has spent any time arguing with Christians should know this. And the effect is auto-catalytic—our beliefs about both facts and values tend to self-reinforce; cf. “affective death spirals”.

As I have claimed earlier in the comments, many issues in the modern world are not issues that our in-built evolutionary urges or “yuck factors” can advise us on directly. So human beliefs on issues such as which politics is best, whether to be a bioluddite or a technoprogressive, whether and which religion to believe in, whether men and women should be given equal rights, etc, etc are decided by our developmental plasticity as Tim Tyler has described above.

The state of human belief and value is a mess, mostly based on irrationalities and self-sustaining cognitive biases which Eliezer has painstakingly pointed out on this blog. Hardly what I would call “psychological unity”. Yes, there is some unity: everyone favors their friends and family over strangers. Everyone has the same sexuality(ies). Everyone prefers pleasure over pain. But on important non-EEA questions, there is widespread and deep disagreement.

The CEV algorithm, as I currently understand it, would attempt to rectify this by taking the beliefs of every human, with equal weight, and editing incorrect factual beliefs. I am not convinced that I would like the output of such an algorithm.
Recovering_irrationalist Aug 13, 2008, 1:19 PM
0 points

Eliezer: The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI.

I’d feel a lot safer if you’d extend this back at least to the infanticidal hunter-gatherers, and preferably to apes fighting around the 2001 monolith.
Eliezer Yudkowsky Aug 13, 2008, 1:30 PM
2 points

Recovering, if I write CEV then it ought to work on hunter-gatherers, but the hunter-gatherers could not plausibly have understood the concepts involved in CEV (whereas I might be willing to try to explain it to Archimedes, if not any other Syracusans). So “think of a strategy such that it would work for hunter-gatherers” fails here; I can’t visualize that counterfactual unless the hunter-gatherers have the concept of math.

The apes fighting around the monolith are outside my moral reference frame and a CEV focused on them would produce different results.
steven Aug 13, 2008, 1:41 PM
0 points

Roko, it doesn’t just edit incorrect factual beliefs, there’s also the “what if we thought faster, were smarter, were more like the people we wished we were”, etc.
Roko Aug 13, 2008, 3:35 PM
0 points

Steven: “what if we thought faster, were smarter, were more like the people we wished we were”

- yes, I’m aware of this, but the first two act in essentially the same way—they cause simulees to more quickly come to factually correct beliefs, and the last is just a “consistent under reflection” condition.

These conditions make little difference to my concern: the algorithm will end up mixing my values (which I like) with values I hate (religious dogma, sharia law, Christian fundamentalism, the naturalistic fallacy/bioluddism … ), where my beliefs receive a very small weighting, and those that I dislike receive a very large weighting.
steven Aug 13, 2008, 3:51 PM
0 points

If you think an IQ-200 extreme-polymath pope who pondered all current and future arguments for atheism would still remain a Christian, then maybe.
Roko Aug 13, 2008, 4:21 PM
1 point

Steven: If you think an IQ-200 extreme-polymath pope who pondered all current and future arguments for atheism would still remain a Christian, then maybe.

Unfortunately, I know from personal experience that there are people who are more intelligent than I am who have been “hooked” by religious memes. I know of several Cambridge mathematicians who are currently at PhD level, one of them in mathematical logic, who are members of this evangelical Christian organization. One in particular is extremely bright: triple first, British mathematics Olympiad, can grasp abstract mathematical concepts twice as fast as I can. When you question these kinds of people, you don’t get anywhere; I’ve tried it. They just keep coming up with ever more bizarre (but not quite logically contradictory) positions, up to and including abandonment of Occam’s razor, re-defining words in the ethical vocabulary, doubting the entire scientific method, etc. I’m sure many people here have had similar experiences. What is to say that the CEV of (all of) humanity won’t go like this?
Tim_Tyler Aug 13, 2008, 4:37 PM
−2 points

If there are smart Christians out there, then where are their works?

The nearest thing I ever saw was probably Philip Johnson—and he’s not that smart.
- Peterdjones Jun 9, 2011, 3:15 PM
  2 points
  Parent
  
  There’s plenty in history, eg Newton.
Constant2 Aug 13, 2008, 5:01 PM
2 points

Tim—Philip Johnson is not just a Christian but a creationist. Do you mean, “if there are smart creationists out there...”? I don’t really pay much attention to the religious beliefs of the smartest mathematicians and scientists and I’m not especially keen on looking into it now, but I would be surprised if all top scientists without exception were atheists. This page seems to suggest that many of the best scientists are something other than atheist, many of those Christian.
steven Aug 13, 2008, 5:35 PM
1 point

PhD in mathematical logic != extreme polymath. I guess Dyson and Tipler come closest to refuting my position, but I wouldn’t expect their beliefs to remain constant under a few thousand years of pondering the overcoming bias archives, say.
Caledonian2 Aug 13, 2008, 5:58 PM
0 points

You can bring a brain to data, but you can’t make it think.

Things like an IQ of 200 indicate that a person has certain cognitive strengths. They DO NOT indicate that those strengths will be utilized. Someone who is not concerned about being dishonest with themselves can self-convince of whatever they please.

At this point, there are no rational grounds for a person to accept the truth claims of, say, the various Christian doctrinal groups. A person who accepts them regardless has already decided to suspend rationality—unless a desire for honesty greater than their desire to continue to believe can be induced in them, they will never be reasoned out of their convictions. Such induction is impossible in most people and is highly problematic in the remaining few.

Having the capacity to do a thing isn’t enough—you need also the motivation to do it. Lacking that motivation, potential becomes irrelevant.
Tim_Tyler Aug 13, 2008, 8:06 PM
0 points

What I mean is, if there are smart Christians out there, why can’t they put together a coherent argument favouring Christianity?

I mean, is this really the best they can do?
frelkins Aug 13, 2008, 9:17 PM
1 point

@Tim Tyler

“a coherent argument favouring Christianity”

Catholic people believe that they have such a thing, resting on strong philosophical definitions of ontology and truth. Traditionally I think that if one was truly interested in hearing a coherent argument for Christian belief, the Jesuit order specialized in teaching and expounding the philosophy. Short a long session with a Jesuit, you might consult the Catholic Encyclopedia for Christian arguments.

While perhaps you will ultimately agree that their system is coherent, that is, “marked by an orderly, logical, and aesthetically consistent relation of parts,” I doubt you will cotton to their first principles.

I do agree with Collins in your Time link on one point and one point alone: sincere discussion is probably the best way for honest minds. Otherwise, he’s not my demitasse of anything.
Jadagul Aug 14, 2008, 3:31 AM
0 points

Caledonian and Tim Tyler: there are lots of coherent defenses of Christianity. It’s just that many of them rest on statements like, “if Occam’s Razor comes into conflict with Revealed Truth, we must privilege the latter over the former.” This isn’t incoherent; it’s just wrong. At least from our perspective. Which is the point I’ve been trying to make. They’d say the same thing about us.

Roko: I sent you an email.
Peterdjones Jun 9, 2011, 9:15 PM
−2 points


Well, it is not logically necessary that we have a genuine disagreement. We might be mistaken in believing ourselves to mean the same thing by the words right and wrong, since neither of us can introspectively report our own moral reference frames or unfold them fully.

I think the idea of associating the meaning of a word with a detailed theory or fine-grained set of criteria, allowing you to apply the term in all cases, has disadvantages.

Newtonian theory has a different set of fine-grained criteria about gravity than relativistic theory. If we take those criteria as defining the meaning of gravity, then they must be talking about different things. If we take that as the end of the story, then there is no way we can make sense of their being contrasting theories, or of one theory superseding another. The one is a theory of Newton-gravity, the other of Einstein-gravity. If we take them as both being theories of some more vaguely and coarsely defined notion of gravity, we can explain their disagreement and differing success. And we don’t have to give up on Einstein-gravity and Newton-gravity.

You cannot have a disagreement about which algorithm should direct your actions, without first having the same meaning of should—and no matter how you try to phrase this in terms of “what ought to direct your actions” or “right actions” or “correct heaps of pebbles”, in the end you will be left with the empirical fact that it is possible to construct minds directed by any coherent utility function.

I don’t see the point of this comment. No-one holds that the constraint that makes some sets of values genuinely moral is the same as the constraint that makes them implementable.

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.

That are both in the superclass of optimisation processes. Why should they not both be in the class of genuinely moral optimisation processes?

You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover “disagreement” to include differences where two agents have nothing to say to each other.

Meta ethics can supply a meaning of should/ought without specifying anything specific. For instance, if you ought to maximise happiness, that doesn’t specify any action without further information about what leads to happiness.