Eliezer Yudkowsky comments on Two questions about CEV that worry me

Eliezer Yudkowsky 23 Dec 2010 21:32 UTC
−1 points

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?

I am honestly not sure what to say to people who ask this question with genuine incredulity, besides (1) “Don’t be evil” and (2) “If you think clever arguments exist that would just compel me to be evil, see rule 1.”
What links here?
- Vladimir_Nesov's comment on The Urgent Meta-Ethics of Friendly Artificial Intelligence by lukeprog (2 Feb 2011 0:06 UTC; 7 points)
- cousin_it 23 Dec 2010 23:22 UTC
  27 points
  Parent
  I don’t understand your answer. Let’s try again. If “something like CEV” is what you want to implement, then an AI pointed at your volition will derive and implement CEV, so you don’t need to specify it in detail beforehand. If CEV isn’t what you want to implement, then why are you implementing it? Assume all your altruistic considerations, etc., are already folded into the definition of “you want”—just like a whole lot of other stuff-to-be-inferred is folded into the definition of CEV.
  
  ETA: your “don’t be evil” looks like a confusion of levels to me. If you don’t want to be evil, there’s already a term for that in your volition—no need to add any extra precautions.
  What links here?
  - XiXiDu's comment on Two questions about CEV that worry me by cousin_it (24 Dec 2010 14:55 UTC; 5 points)
  - wedrifid 24 Dec 2010 2:47 UTC
    15 points
    Parent
    
    If CEV isn’t what you want to implement, then why are you implementing it?
    
    The sane answer is that it solves a cooperation problem. ie. People will not kill you for trying it and may instead donate money. As we can see here this is not the position that Eliezer seems to take. He goes for the ‘signal naive morality via incomprehension’ approach.
    - XiXiDu 24 Dec 2010 13:05 UTC
      3 points
      Parent
      
      People will not kill you for trying it and may instead donate money.
      
      I do not think this would work. Take the viewpoint of a government. What does CEV do? It does deprive them of some amount of ultimate power. The only chance I see to implement CEV using an AI going FOOM is either secretly or due to the fact that nobody takes you serious enough. Both routes are rather unlikely. Military analysis of LW seems to be happening right now. And if no huge unforeseeable step towards AGI happens, it will move forward gradually enough for governments (or other groups), who already investigate LW and the SIAI, to notice and take measures to disable anyone trying to implement CEV.
      
      The problem is that once CEV becomes feasible, governments will consider anyone working on it as an attempted coup. Regardless of the fact that the people involved might not perceive it to be politics, working on a CEV is indeed an highly political activity. At least this will be the viewpoint of many who do not understand CEV or oppose it for different reasons.
      - wedrifid 24 Dec 2010 13:31 UTC
        0 points
        Parent
        
        I do not think this would work.
        
        Pardon me. To be more technically precise: “Implementing an AI that extrapolates the volition of something other or broader than yourself may facilitate cooperation. It would reduce the chance that people will kill you for the attempt and increase the chance of receiving support.”
        XiXiDu 24 Dec 2010 14:55 UTC
        5 points
        Parent
        Aha, I see. My mistake, ignoring the larger context.
        
        Seen this? Anyway, I feel that it is really hard to tackle this topic because of its vagueness. As multifoliaterose implied here, at the moment the task to recognize humans as distinguished beings already seems to me too broad a problem to tackle directly. Talking about implementing CEV indirectly, by derivation from Yudkowsky’s mind, versus specifying the details beforehand, seems to be fun to argue but ultimately ineffective at this point. In other words, an organisation that claims to solve some meta problem by means of CEV is only slightly different from one proclaiming to make use of magic. I’d be much more comfortable to donate to a decision theory workshop for example.
        
        I digress, but I thought I should clarify some of my intention for always getting into discussions involving the SIAI. It is highly interesting, sociological I suppose. On the one hand people take this topic very serious, the most important topic indeed, yet seem to be very relaxed about the only organisation involved in shaping the universe. There is simply no talk about more transparency to prove the effectiveness of the SIAI and its objectives. Further, without transparency you simply cannot conclude that because someone writes a lot of ethical correct articles and papers that that output is reflective of their true goals. Also people don’t seem to be worried very much about all the vagueness involved here, as this post proves once again. Where is the progress that would justify further donations? As I said, I digress. Excuse me but this topic is the most fascinating issue for me on LW.
        
        Back to your comment, it makes sense. Surely if you tell people to also take care of what they want, they’ll be less opposed than if you told them that you’ll just do what you want because you want to make them happy. Yet there will be those who don’t want you to do it, regardless of wanting to make them happy. There will be those who only want you to implement their personal volition. So whenever CEV will be taken serious it will become really hard to implement it, because people will get mad about it, really mad. People already oppose small-impact policies just because it’s the other party that is trying to implement them. What will they do if one person or organisation tries to implement a policy for the whole universe and the rest of infinity?
        paulfchristiano 28 Dec 2010 23:30 UTC
        7 points
        Parent
        
        There is simply no demand for more transparency to prove the effectiveness of the SIAI and its objectives.
        
        Are you sure? I imagine there are many people interested in evaluating the effectiveness of the SIAI. At least I am, and from the small number of real discussions I have had about the SIAI’s project I extrapolate that uncertainty is the main inhibitor of enthusiasm (although of course if the uncertainty was removed this may create more fundamental problems).
        TheOtherDave 28 Dec 2010 23:48 UTC
        6 points
        Parent
        The counterargument I’ve read in earlier (“unreal”) discussions on the subject is, roughly, that people who claim their support for SIAI is contingent on additional facts, analyses, or whatever are simply wrong… that whatever additional data is provided along those lines won’t actually convince them, it will merely cause them to ask for different data.
        Nick_Tarleton 29 Dec 2010 17:40 UTC
        2 points
        Parent
        I assume you’re referring to Is That Your True Rejection?.
        TheOtherDave 29 Dec 2010 20:16 UTC
        0 points
        Parent
        (nods) I think so, yes.
        Vaniver 29 Dec 2010 9:51 UTC
        1 point
        Parent
        This strikes me as a difficult thing to know, and the motives that lead to assuming it are not particularly pleasant.
        TheOtherDave 29 Dec 2010 13:09 UTC
        1 point
        Parent
        While the unpleasant readings are certainly readily available, more neutral readings are available as well.
        
        By way of analogy: it’s a common relationship trope that suitors who insist on proof of my love and fidelity won’t be satisfied with any proofs I can provide. OTOH, it’s also a common trope that suitors who insist that I should trust in their love and fidelity without evidence don’t have them to offer in the first place.
        
        If people who ask me a certain type of question aren’t satisfied with the answer I have, I can either look for different answers or for different people; which strategy I pick depends on the specifics of the situation. If I want to infer something about someone else based on their choice of strategy I similarly have to look into the specifics of the situation. IME there is no royal road to the right answer here.
        Vaniver 29 Dec 2010 14:19 UTC
        −1 points
        Parent
        
        While the unpleasant readings are certainly readily available, more neutral readings are available as well.
        
        It is a shame that understatement is so common it’s hard to be precise quickly; I meant to include neutral readings in “not particularly pleasant.”
        Expand this thread
        TheOtherDave 29 Dec 2010 14:25 UTC
        0 points
        Parent
        Huh. Interesting.
        
        Yes, absolutely, I read your comment as understatement… but if you meant it literally, I’m curious as to the whole context of your comment.
        
        For example, what do you mean to contrast that counterargument with? That is: what’s an example of an argument for which the motives for assuming it are actively pleasant? What follows from their pleasantness?
        Vaniver 29 Dec 2010 15:07 UTC
        1 point
        Parent
        
        That is: what’s an example of an argument for which the motives for assuming it are actively pleasant? What follows from their pleasantness?
        
        A policy like “assume good faith” strikes me as coming from not unpleasant motives. What follows is that you should attribute a higher probability of good faith to someone who assumes good faith. If someone assumes that other people cannot be convinced by evidence, my knowledge of projection suggests that should increase my probability estimate that they cannot be convinced by evidence.
        
        That doesn’t entirely answer your question- since I talked about policies and you’re talking about motives- but it should suggest an answer. Policies and statements represent a distribution of sets of possible motives, and so while the motives themselves unambiguously tell you how to respond the policies just suggest good guesses. But, in general, pleasantness begets pleasantness and unpleasantness begets unpleasantness.
        wedrifid 29 Dec 2010 11:33 UTC
        0 points
        Parent
        
        This strikes me as a difficult thing to know,
        
        It strikes me as a tendency that can either be observed as a trend or noted to be absent.
        
        and the motives that lead to assuming it are not particularly pleasant.
        
        This strikes meas a difficult thing to know. And distastefully ironic.
        Vaniver 29 Dec 2010 14:17 UTC
        −1 points
        Parent
        
        This strikes meas a difficult thing to know.
        
        There are a large number of possible motives that could lead to assuming that the people in question are simply wrong. None of them are particularly pleasant (but not all of them are unpleasant). I don’t need to know which motivates them in order to make the statement I made. However, the statement as paraphrased by TheOtherDave is much more specific; hence the difficulty.
        
        As a more general comment, I strongly approve of people kicking tires, even if they’re mine. When I see someone who doesn’t have similar feelings, I can’t help but wonder why. Like with my earlier comment, not all the reasons are unpleasant. But some are.
        XiXiDu 29 Dec 2010 9:22 UTC
        3 points
        Parent
        Please read this comment. It further explains why I actually believe that transparency is important to prove the effectiveness of the SIAI. I also edited my comment above. I seem to have messed up on correcting some grammatical mistakes. It originally said, there is simply no talk about more transparency....
        XiXiDu 29 Dec 2010 9:15 UTC
        0 points
        Parent
        I didn’t intend to write that. I don’t know what happened there.
        timtyler 28 Dec 2010 22:36 UTC
        1 point
        Parent
        
        On the one hand people take this topic very serious, the most important topic indeed, yet seem to be very relaxed about the only organisation involved in shaping the universe.
        
        “The only organisation involved in shaping the universe”?!? WTF? These folks have precious little in terms of resources. They apparently haven’t even started coding yet. You yourself assign them a miniscule chance of succeeding at their project. How could they possibly be the “the only organisation involved in shaping the universe”?!?
        paulfchristiano 29 Dec 2010 0:49 UTC
        6 points
        Parent
        
        They apparently haven’t even started coding yet.
        
        Really? Even if they were working on a merely difficult problem, you would expect coding to be the very last step of the project. People don’t solve hard algorithmic problems by writing some code and seeing what happens. I wouldn’t expect an organization working optimally on AGI to write any code until after making some remarkable progress on the problem.
        
        How could they possibly be the “the only organisation involved in shaping the universe”?!?
        
        There could easily be no organization at all trying to deliberately control the long-term future of the human race; we’d just get whatever we happened to stumble into. You are certainly correct that there are many, many organizations which are involved in shaping our future; they just rarely think about the really long-term effects (I think this is what XiXiDu meant).
        timtyler 29 Dec 2010 1:05 UTC
        0 points
        Parent
        
        Really? Even if they were working on a merely difficult problem, you would expect coding to be the very last step of the project. People don’t solve hard algorithmic problems by writing some code and seeing what happens. I wouldn’t expect an organization working optimally on AGI to write any code until after making some remarkable progress on the problem.
        
        IMO, there’s a pretty good chance of an existing organisation being involved with getting there first. The main problem with not having any working products is that it is challenging to accumulate resources—which are needed to hire researchers and programmers—which you need to fuel your self-improvement cycle.
        
        Google, hedge funds, and security agencies have their self-improvement cycle already rolling—they are evidently getting better and better as time passes. That results in accumulated resources, which can be used to drive further development.
        
        If you were a search company who aimed directly at a human-level search agent, you are now up against a gorilla with an android army who already has most of the pieces of the puzzle. Waiting until you have done all the relevant R+D is just not how software development works. You get up and running as fast as you can—or else someone else does that first—and eats your lunch.
        timtyler 29 Dec 2010 10:41 UTC
        0 points
        Parent
        
        So whenever CEV will be taken serious it will become really hard to implement it, because people will get mad about it, really mad. People already oppose small-impact policies just because it’s the other party that is trying to implement them. What will they do if one person or organisation tries to implement a policy for the whole universe and the rest of infinity?
        
        Right—but this seems as though it isn’t how things are likely to go down. CEV is a pie-in-the-sky wishlist—not an engineering proposal. Those attempting to directly implement things like it seem practically guaranteed to get to the plate last. For example Ben’s related proposal involved “non-invasive” scanning of the human brain. That just isn’t technology we will get before we have sophisticated machine intelligence, I figure. So: either the proposals will be adjusted so they are more practical en route—or else, the proponents will just fail.
        
        Most likely there will be an extended stage where people tell the machines what to do—much as Asimov suggested. The machines will “extrapolate” in much the same way that Google Instant “extrapolates”—and the human wishes will “cohere”—to the extent that large-scale measures in society encourage cooperation.
        What links here?
        timtyler's comment on Which are the useful areas of AI study? by PeterisP (17 Jan 2011 21:39 UTC; 0 points)
        timtyler 28 Dec 2010 22:45 UTC
        −1 points
        Parent
        
        There is simply no demand for more transparency to prove the effectiveness of the SIAI and its objectives.
        
        FWIW, I mostly gave up on them a while back. As a spectator, I mostly look on, grimacing, while wondering whether there are any salvage opportunities.
        XiXiDu 29 Dec 2010 9:19 UTC
        0 points
        Parent
        Here is the original comment. It wasn’t my intention to say that, it originally said there is simply no talk about more transparency.… I must have messed up on correcting some mistakes.
        timtyler 29 Dec 2010 10:10 UTC
        −1 points
        Parent
        I just copied-and-pasted verbatim. However the current edit does seem to make more sense.
      - timtyler 28 Dec 2010 16:26 UTC
        −2 points
        Parent
        That is more-or-less my own analysis. Notoriously:
        
        Politics is the gentle art of getting votes from the poor and campaign funds from the rich by promising to protect each from the other.
        
        CEV may get some the votes from the poor—but offers precious little to the rich. Since those are the folk who are running the whole show, it is hard to see how they will approve it. They won’t approve it—there isn’t anything in it for them. So, I figure, the plan is probably pretty screwed—the hopeful plan of a bunch of criminal (their machine has no respect for the law!) and terrorist (if they can make it stick!) outlaws—who dream of overthrowing their own government.
        What links here?
        timtyler's comment on Which are the useful areas of AI study? by PeterisP (17 Jan 2011 21:39 UTC; 0 points)
    - cousin_it 24 Dec 2010 9:13 UTC
      3 points
      Parent
      Awesome comment, thanks. I’m going to think wishfully and take that as SIAI’s answer.
    - timtyler 28 Dec 2010 23:38 UTC
      −3 points
      Parent
      
      The sane answer is that it solves a cooperation problem.
      
      Reciprocal altruism sometimes sends a relatively weak signal—it says that you will cooperate so long as the “shadow of the future” is not too ominous.
      
      Invoking “good” and “evil” signals more that you believe in moral absolutes: the forces of good and evil.
      
      On the one hand, that is a stronger signalling technique—it attempts to signal that you won’t defect—no matter what!
      
      On the other hand, it makes you look a bit as though you are crazy, don’t understand rationality or game theory—and this can make your behaviour harder to model.
      
      As with most signalling, it should be costly to be credible. Alas, practically anyone can rattle on about good and evil. I am not convinced it is very effective overall.
  - Kutta 24 Dec 2010 9:39 UTC
    0 points
    Parent
    
    then an AI pointed at your volition will derive and implement CEV
    
    Also, from OP:
    
    Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of “intelligence”, and what other such things are there?
    
    If what you want is to have something pointed at your volition then you first have to design the AI that points to it rather than something else. This whole CEV stuff was an attempt at answering the “design an AI that points to it” question, and the crucial consideration that led to it was that there is no magically intelligent system that would automatically converge to what we’d prefer. Of course, there remains the question of balance between AI structure determined by what I want and AI structure determined by what the AI thinks I want. The realization of FAI is that you cannot eliminate the first item from the balance and get an acceptable result. It is better to ask “How could I best solve the FAI problem using my brain rather than something else?” than to ask “Could I use something else than my brain to solve the FAI problem?”.
    
    If CEV isn’t what you want to implement, then why are you implementing it?
    
    If a CEV isn’t what I want to implement, it is still good to implement CEV because it’ll find out what I want to implement—plus more stuff that I would agree to implementing but not think of in the first place.
  - Tyrrell_McAllister 24 Dec 2010 4:53 UTC
    −3 points
    Parent
    
    ETA: your “don’t be evil” looks like a confusion of levels to me. If you don’t want to be evil, there’s already a term for that in your volition—no need to add any extra precautions.
    
    Eliezer didn’t realize that you meant his own personal CEV, rather than his current incoherent, unextrapolated volition.
- DanArmak 23 Dec 2010 21:47 UTC
  14 points
  Parent
  You have a personal definition for evil, like everyone else. Many people have definitions of good that include things you see as evil; some of your goals are in conflict. Taking that into account, how can you precommit to implementing the CEV of the whole of humanity when you don’t even know for sure what that CEV will evaluate to?
  
  To put this another way: why not extrapolate from you, and maybe from a small group of diverse individuals whom you trust, to get the group’s CEV? Why take the CEV of all humanity? Inasmuch as these two CEVs differ, why would you not prefer your own CEV, since it more closely reflects your personal definitions of good and evil?
  
  I don’t see how this can be consistent unless you start out with “implementing humanity’s CEV” as a toplevel goal, and any divergence from that is slightly evil.
- TheOtherDave 23 Dec 2010 22:32 UTC
  5 points
  Parent
  One thing you could say that might help is if you were clearer about when you consider it evil to ignore the volition of an intelligence, since it’s clear from your writing that sometimes you don’t.
  
  For example, “don’t be evil” clearly isn’t enough of an argument to convince you to build an AI that fulfills Babykiller or Pebblesorter or SHFP volition, for example, should we encounter any… although at least some of those would indisputably be intelligences.
  
  Given that, it might reassure people to explicitly clarify why “don’t be evil” is enough of an argument to convince you to build an AI that fulfills the volition of all humans, rather than (let’s say) the most easily-jointly-satisfied 98% of humanity, or some other threshold for inclusion.
  
  If this has already been explained somewhere, a pointer would be handy. I have not read the whole site, but thus far everything I’ve seen to this effect seems to boil down to assuming that there exists a single volition V such that each individual human would prefer V upon reflection to every other possible option, or at least a volition that approximates that state well enough that we can ignore the dis-satisfied minority.
  
  If that assumption is true, the answer to the question you quote is “Because they’d prefer the results of doing so,” and evil doesn’t enter into it.
  
  If that assumption is false, I’m not sure how “don’t be evil” helps.