I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing.
Does the existence of the Voluntary Human Extinction Movement affect your belief in this proposition?
VHEMT supports human extinction primarily because, in the group’s view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.
Obviously, human extinction is not their terminal value.
Or at least, not officially. I have known at least one person who professed to desire that the human race go extinct because he thought the universe as a whole would simply be better if humans did not exist. It’s possible that he was stating such an extreme position for shock value (he did have a tendency to display some fairly pronounced antisocial tendencies), and that he had other values that conflicted with this position on some level. But considering the diversity of viewpoints and values I’ve observed people to hold, I would bet quite heavily against the claim that nobody in the world actually desires the end of human existence.
They are probably lying, trolling, joking, or psychos (=do not have enough extrapolated intelligence and knowledge).
If you’re launching an irreversible CEV, it’s not very safe to rely on your intuition that people expressing such desires are “probably lying, trolling, joking” and so wouldn’t affect the CEV outcome.
I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.
How do you propose to test it without actually running a CEV calculation?
How can we even start defining CEV without brain scanning technology capable of much more than answering the original question?
It would seem that we can define the algorithm which can be used to manipulate and process a given input of loosely defined inconsistent preferences. This would seem to be a necessary thing to do before any actual brain scanning becomes involved.
Well, part of my point is that indeed we can’t even define CEV today, let alone solve it, and so a lot of conclusions/propositions people put forward about what CEV’s output would be like are completely unsupported by evidence; they are mere wishful thinking.
More on-topic: today you have humans as black boxes, but you can still measure what they value, by (1) offering them concrete tradeoffs and measuring their behavior and (2) asking them.
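As a toy illustration of (1), with entirely hypothetical data: offer a series of concrete tradeoffs and bracket the implied value between the highest offer the person rejected and the lowest offer they accepted. This is a sketch of the black-box measurement idea, not a claim about how a CEV process would actually elicit values.

```python
# Toy sketch (hypothetical data): reading a value off observed behavior by
# offering concrete tradeoffs and seeing which offers are accepted.

def bracket_implied_value(offers):
    """offers: list of (amount_offered, accepted) pairs for giving up a good.

    Returns (low, high): the person rejected every offer at or below `low`
    and accepted every offer at or above `high`, so the implied value lies
    between them. Assumes the observed choices are consistent, which real
    behavior is not.
    """
    rejected = [amount for amount, accepted in offers if not accepted]
    accepted = [amount for amount, accepted in offers if accepted]
    low = max(rejected) if rejected else 0.0
    high = min(accepted) if accepted else float("inf")
    return low, high

# Hypothetical observations for one person.
observations = [(10, False), (50, False), (200, True), (500, True)]
print(bracket_implied_value(observations))  # -> (50, 200)
```

Real elicitation would have to cope with noisy and inconsistent choices, which is exactly the “loosely defined inconsistent preferences” problem mentioned above.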
Tomorrow, suppose your new brain scanning tech allows you to perfectly understand how brains work. You can now explain how these values are implemented. But they are the same values you observed earlier. So the only new knowledge relevant to CEV is that you can now derive how people would behave in a hypothetical situation, without actually putting them in that situation (because doing so might be unethical or expensive).
Now, suppose someone expresses a value that you think they are merely “lying, trolling or joking” about. In all of their behavior throughout their lives, and in their own words today, they honestly have this value. But your brain scanner shows that in some hypothetical situation, they would behave in a way consistent with valuing it less.
By construction, since you couldn’t derive this knowledge from their life histories (already known without a brain scanner), these are situations they have (almost) never been in. (And therefore they aren’t likely to be in them in the future, either.)
So why do you effectively say that for purposes of CEV, their behavior in such counterfactual situations reveals “their true values”, while their behavior in the real, common situations throughout their lives doesn’t? Yes, humans might be placed in totally novel situations which can cause them to reconsider their values, because humans have conflicting values, and non-explicit values (behaviors responding to situations rather than stated principles), and no truly top-level goals (so that all values may change). But you could just as easily say that there are probably situations in which you could be placed so that you would come to value their values more.
Your approach places people in the unfortunate position where they might live their whole lives believing in a value, and fighting for it, and then you (or the CEV AI) comes up to them and says: I’m going to destroy everything you’ve valued so far. Not because of objective ethics or a decree of God or a majority vote or anything else objective and external. But because they themselves actually “really” prefer completely different values, even though on the conscious level, no matter how long they might think and talk and read about it, they would never reach that conclusion.
In all of their behavior throughout their lives, and in their own words today, they honestly have this value
This is the conditional that I believe is false when I say “they are probably lying, trolling, joking”. I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.
OK. That’s possible. But why do you believe that, despite their large numbers and lifelong avowal of those beliefs?
How would you respond if you were subject to such a brain scan and then informed that deep inside you actually are a nihilist who prefers the complete destruction of all life?
I’d think someone’s playing a practical joke on me.
And suppose we develop such brain scanning technology, scan someone else who claims to want the destruction of all life, and it says “yep, he does”: how would you respond?
Dunno… propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don’t expect this to happen.
That you don’t expect it to happen shouldn’t by itself be a reason not to consider it. I’m asking because it seems you are avoiding the hard questions by more or less saying you don’t think they will happen. And there are many more conflicting value sets which are less extreme (and apparently more common) than this one.
Errr. This is a question of simple fact, which is either true or false. I believe it’s true, and build my plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.
You’ve lost me. Can you restate the question of simple fact to which you refer here, which you believe is true? Can you restate the plan that you consider good if that question is true?
I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.
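Purely as an illustration of what “universal” does in this proposal (the set-of-propositions representation and the names below are hypothetical simplifications, not a real CEV algorithm), the idea can be pictured as acting only on the intersection of everyone’s extrapolated wishes:

```python
# Hypothetical toy model: represent each person's extrapolated wishes as a set
# of propositions and act only on the wishes literally everyone shares.

def unanimous_wishes(extrapolated_wishes):
    """extrapolated_wishes: dict mapping person -> set of wish strings."""
    wish_sets = list(extrapolated_wishes.values())
    if not wish_sets:
        return set()
    # The "universal" wishes are those present in every person's set.
    return set.intersection(*wish_sets)

people = {
    "alice": {"humans continue existing", "no forced wireheading"},
    "bob": {"humans continue existing", "maximize paperclips"},
}
print(unanimous_wishes(people))  # -> {'humans continue existing'}
```

If the intersection turns out to be empty, an AI restricted this way simply has nothing to act on, which is the “stands there doing nothing” case discussed below.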
OK, cool.
To answer your question: sure, if I assume (as you seem to) that the extrapolation process is such that I would in fact endorse the results, and I also assume that the extrapolation process is such that if it takes as input all humans it will produce at least one desire that is endorsed by all humans (even if they themselves don’t know it in their current form), then I’d agree that’s a good plan, if I further assume that it doesn’t have any negative side-effects.
But the assumptions strike me as implausible, and that matters.
I mean, if I assume that everyone being thrown into a sufficiently properly designed blender and turned into stew is a process I would endorse, and I also assume that the blending process has no negative side-effects, then I’d agree that that’s a good plan, too. I just don’t think any such blender is ever going to exist.
Ok, but do you grant that running an FAI with “unanimous CEV” is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing—if I’m wrong about my hypothesis?
I don’t know how to answer that question. Again, it seems that you’re trying to get an answer given a whole bunch of assumptions, but that you resist the effort to make those assumptions clear as part of the answer.
It is not clear to me that there exists such a thing as a “unanimous CEV” at all, even in the hypothetical sense of something we might be able to articulate some day with the right tools.
If I nevertheless assume that a unanimous CEV exists in that hypothetical sense, it is not clear to me that only one exists; presumably modifications to the CEV-extraction algorithm would result in different CEVs from the same input minds, and I don’t see any principled grounds for choosing among that cohort of algorithms that don’t in effect involve selecting a desired output first. (In which case CEV extraction is a complete red herring, since the output was a “bottom line” written in advance of CEV’s extraction, and we should be asking how that output was actually arrived at and whether we endorse that process.)
If I nevertheless assume that a single CEV-extraction algorithm is superior to all the others, and further assume that we select that algorithm via some process I cannot currently imagine and run it, and that we then run a superhuman environment-optimizer with its output as a target, it is not clear to me that I would endorse that state change as an individual. So, no, I don’t agree that running it is uncontroversial. (Although everyone might agree afterwards that it was a good idea.)
If the state change nevertheless gets implemented, I agree (given all of those assumptions) that the resulting state-change improves the world by the standards of all humanity. “Safe” is an OK word for that, I guess, though it’s not the usual meaning of “safe.”
I don’t agree that the worst that happens, if those assumptions turn out to be wrong, is that it stands there and does nothing. The worst that happens is that the superhuman environment-optimizer runs with a target that makes the world worse by the standards of all humanity.
(Yes, I understand that the CEV-extraction algorithm is supposed to prevent that, and I’ve agreed that if I assume that’s true, then this doesn’t happen. But now you’re asking me to consider what happens if the “hypothesis” is false, so I am no longer just assuming that’s true. You’re putting a lot of faith in a mysterious extraction algorithm, and it is not clear to me that a non-mysterious algorithm that satisfies that faith is likely, or that the process of coming up with one won’t come up with a different algorithm that antisatisfies that faith instead.)
What I’m trying to do is find some way to fix the goalposts: find a set of conditions on CEV that would be satisfactory. Whether such a CEV actually exists and how to build it are questions for later. Let’s just pile up constraints until a sufficient set is reached. So, let’s assume that:
“Unanimous” CEV exists
And is unique
And is definable via some easy, obviously correct, and unique process, to be discovered in the future,
And it basically does what I want it to do (fulfil universal wishes of people, minimize interference otherwise),
would you say that running it is uncontroversial? If not, what other conditions are required?
No, I wouldn’t expect running it to be uncontroversial, but I would endorse running it.
I can’t imagine any world-changing event that would be uncontroversial, if I assume that the normal mechanisms for generating controversy aren’t manipulated (in which case anything might be uncontroversial).
Why is it important that it be uncontroversial?
I’m not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.
Ok, you’re right that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where “sufficiently” is not too high a threshold?
There probably exists (hypothetically) some plan such that it wouldn’t seem unreasonable to me to declare anyone who doesn’t endorse that plan either insufficiently well-informed or insufficiently intelligent.
In fact, there probably exist several such plans, many of which would have results I would subsequently regret, and some of which do not.
I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes and one may wonder why it is not thrown away already. But they may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.
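As a rough sketch of what such a decision tree could look like (the specific questions and plan branches below are hypothetical placeholders, not a proposal), each internal node is a question of fact answerable by future research and each leaf is the course of action that set of answers would support:

```python
# Hypothetical sketch of a "global plan as a decision tree": internal nodes are
# open questions of fact, leaves are the plans those answers would support.
from dataclasses import dataclass
from typing import Union

@dataclass
class Plan:
    description: str

@dataclass
class Question:
    text: str
    if_yes: Union["Question", "Plan"]
    if_no: Union["Question", "Plan"]

plan_tree = Question(
    text="Do extrapolated wishes shared by literally everyone exist?",
    if_yes=Question(
        text="Is there a unique, obviously correct extraction process?",
        if_yes=Plan("Run the AI on the unanimous wishes; minimize other interference."),
        if_no=Plan("First study how outputs differ across candidate extraction processes."),
    ),
    if_no=Plan("Unanimity fails, so some aggregation rule has to be argued for instead."),
)
```

Laid out this way, each disputed feature of CEV attaches to a specific factual node that research could settle, instead of counting as a hole in the proposal as a whole.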
Agreed that an actual concrete plan would be a valuable thing, for the reasons you list among others.