Serious question:
Is this addressed to the coherent extrapolated volition of humankind, as expressed by SIAI? I’m under the impression it is not.
As far as I can tell, it’s literally impossible for me to prefer an AI that would implement CEV<humanity> over one that would implement CEV<me>—if what I want is actually CEV<humanity>, then the AI will figure this out while extrapolating my volition and implement that. On the other hand, it’s clearly possible for me to prefer CEV<me> to CEV<humanity>.
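To make that asymmetry concrete, here is a minimal toy sketch. It assumes, purely for illustration, that extrapolation is a black box mapping a set of minds to the utility function those minds would endorse on reflection, and that the AI then optimizes that function exactly; the names `extrapolate`, `outcome`, `WORLD_STATES` and all the numbers are invented, not anything from the CEV document.

```python
# Toy model only: "extrapolation" is treated as a black box mapping a set of
# minds to the utility function those minds would endorse on reflection, and
# the AI is assumed to optimize that function exactly.
from typing import Callable, FrozenSet

Utility = Callable[[str], float]

WORLD_STATES = ["status_quo", "my_paradise", "broad_compromise"]

def extrapolate(minds: FrozenSet[str]) -> Utility:
    """Stand-in for CEV<minds>. If idealized-me would care about everyone
    else, that caring is already folded into the CEV<me> branch."""
    if minds == frozenset({"me"}):
        return lambda w: {"status_quo": 0, "my_paradise": 10, "broad_compromise": 7}[w]
    return lambda w: {"status_quo": 0, "my_paradise": 2, "broad_compromise": 9}[w]

def outcome(target: FrozenSet[str]) -> str:
    """The world the AI produces when pointed at `target`."""
    utility = extrapolate(target)
    return max(WORLD_STATES, key=utility)

# Judge both outcomes by MY extrapolated utility. The AI pointed at CEV<me>
# maximizes exactly that function, so no other target can score higher in
# my own extrapolated terms, while the reverse preference is easy to exhibit.
my_utility = extrapolate(frozenset({"me"}))
print(my_utility(outcome(frozenset({"me"}))))              # 10
print(my_utility(outcome(frozenset({"me", "everyone"}))))  # 7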
How likely do you consider it for CEV<you> to be the first superintelligent AI to be created, compared to CEV<humanity>?
Unless you’re a top AI researcher working solo to create your own AI, you may have to support CEV<humanity> as the best compromise possible under the circumstances. It’ll probably be far closer to CEV<you> than CEV<Putin> or CEV<Ahmadinejad> would be.
However, CEV<$randomAIresearcher> is probably even closer to mine than CEV<humanity> is… CEV<humanity> is likely to be very, very far from the preferences of most decent people...
A far more likely compromise would be CEV<the project’s contributors>.
The people who get to choose the utility function of the first AI have the option of ignoring the desires of the rest of humanity. I think they are likely to do so, because:
They know each other, and so can predict each other’s CEV better than that of the whole of humanity
They can explicitly trade utility with each other and encode compromises into the utility function (so that it won’t be a pure CEV)
The fact they were in this project together indicates a certain commonality of interests and ideas, and may serve to exclude memes that AI-builders would likely consider dangerous (e.g., fundamentalist religion)
They have had the opportunity of excluding people they don’t like from participating in the project to begin with
Also, Putin and Ahmadinejad are much more likely than the average human to influence the first AI’s utility function, simply because they have a lot of money and power.
I disagree with all four of these claims.
I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it’s not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors. Unless we’re talking uploaded personalities, which is a whole different discussion.
So you want hard-coded compromises that oppose and override what these people would collectively prefer to do if they were more intelligent, more competent and more self-aware?
I don’t think that’s a good idea at all.
Do you believe that fundamentalist religion would exist if fundamentalist religionists believed that their religion was false, and were also completely self-aware? Why do you think a CEV (which essentially means what people would want if they were as intelligent as the AI) would support a dangerous meme?
I don’t think that the 9999 first contributors get to vote on whether they’ll accept a donation from the 10,000th one. And unless you believe these 10,000 people can create and defend their own country BEFORE the AI gets created, I’d urge not being vocal about them excluding everyone else, when developments in AI become close enough that the whole world starts paying serious attention.
That’s why CEV<humanity> is far better than CEV<the contributors>.
The programmers want the AI to calculate CEV because they expect CEV to be something they will like. We can’t calculate CEV ourselves, but that doesn’t mean we don’t know any of CEV’s (expected) properties.
However, we might be wrong about what CEV<humanity> will turn out to be like, and we may come to regret pre-committing to it. That’s why I think we should prefer CEV<the contributors>, because we can predict it better.
What I meant was that they might oppose and override some of the input to the CEV from the rest of humanity.
However, it might also be a good idea to override some of your own CEV results, because we don’t know in advance what the CEV will be. We define the desired result as “the best possible extrapolation”, but our implementation may produce something different. It’s very dangerous to precommit the whole future universe to something you don’t yet know at the moment of precommitment (my point number 1). So, you’d want to include overrides about things you’re certain should not be in the CEV.
This is a misleading question.
If you are certain that the CEV will decide against fundamentalist religion, you should not oppose precommitting the AI to oppose fundamentalist religion, because you’re certain this won’t change the outcome. If you don’t want to include this modification to the AI, that means you 1) accept there is a possibility of religion being part of the CEV, and 2) want to precommit to living with that religion if it is part of the CEV.
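A tiny worked version of that argument, with invented numbers and a hypothetical `expected_value` helper; this is my simplification, under the assumption (made in the comment above) that the override itself is free and only matters in worlds where X actually shows up in the CEV.

```python
# Invented numbers; the hypothetical expected_value helper assumes the
# override itself costs nothing and simply replaces any X-containing outcome
# with the non-X outcome.
def expected_value(p_x_in_cev: float, value_if_x: float, value_if_not_x: float,
                   override_against_x: bool) -> float:
    if override_against_x:
        return value_if_not_x
    return p_x_in_cev * value_if_x + (1 - p_x_in_cev) * value_if_not_x

# If you are certain X cannot appear in the CEV, the override is a no-op:
print(expected_value(0.0, -100, 10, override_against_x=False))  # 10.0
print(expected_value(0.0, -100, 10, override_against_x=True))   # 10

# Objecting to the override only matters if you assign X some probability:
print(expected_value(0.05, -100, 10, override_against_x=False))  # 4.5
```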
Maybe intelligent people like dangerous memes. I don’t know, because I’m not yet that intelligent. I do know though that having high intelligence doesn’t imply anything about goals or morals.
Broadly, this question is similar to “why do you think this brilliant AI-genie might misinterpret our request to alleviate world hunger?”
Why not? If they’re controlling the project at that point, they can make that decision.
I’m not being vocal about any actual group I may know of that is working on AI :-)
I might still want to be vocal about my approach, and might want any competing groups to adopt it. I don’t have good probability estimates on this, but it might be the case that I would prefer CEV<a competing group’s contributors> to CEV<humanity>.
Why are you certain of this? At the very least it depends on who the person contributing money is.
“Humanity” includes a huge variety of different people. Depending on how the CEV is specified, it may also include an even wider variety of people who lived in the past and counterfactual people who might live in the future. And the CEV, as far as I know, is vastly underspecified right now—we don’t even have a good conceptual test that would tell us if a given scenario is a probable outcome of CEV, let alone a generative way to calculate that outcome.
Saying that the CEV “will best please everyone” is just handwaving this aside. Precommitting the whole future lightcone to the result of a process we don’t know in advance is very dangerous, and very scary. It might be the best possible compromise between all humans, but it is not the case that all humans have equal input into the behavior of the first AI. I have not seen any good arguments claiming that implementing CEV<humanity> is a better strategy than just trying to build the first AI before anyone else and then making it implement a narrower CEV.
Suppose that the first AI is fully general, and can do anything you ask of it. What reason is there for its builders, whoever they are, to ask it to implement CEV<humanity> rather than CEV<themselves>?
In an idealized form, I agree with you.
That is, if I really take the CEV idea seriously as proposed, there simply is no way I can prefer CEV(me + X) to CEV(me)… if it turns out that I would, if I knew enough and thought about it carefully enough and “grew” enough and etc., care about other people’s preferences (either in and of themselves, as in “I hadn’t thought of that but now that you point it out I want that too”, or by reference to their owners, as in “I don’t care about that but if you do then fine let’s have that too,” for which distinction I bet there’s a philosophical term of art that I don’t know), then the CEV-extraction process will go ahead and optimize for those preferences as well, even if I don’t actually know what they are, or currently care about them; even if I currently think they are a horrible evil bad no-good idea. (I might be horrified by that result, but presumably I should endorse it anyway.)
This works precisely because the CEV-extraction process as defined depends on an enormous amount of currently-unavailable data in the course of working out the target’s “volition” given its current desires, including entirely counterfactual data about what the target would want if exposed to various idealized and underspecified learning/”growing” environments.
That said, the minute we start talking instead about some actual realizable thing in the world, some approximation of CEV-me computable by a not-yet-godlike intelligence, it stops being quite so clear that all of the above is true.
An approximate-CEV extractor might find things in your brain that I would endorse if I knew about them (given sufficient time and opportunity to discuss it with you and “grow” and so forth) but that it wasn’t able to actually compute based on just my brain as a target, in which case pointing it at both of us might be better (in my own terms!) than pointing it at just me.
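A minimal sketch of that point, under assumptions that are mine rather than the commenter’s: suppose each preference component has a per-brain “legibility”, and a bounded extractor only recovers components that are legible in at least one brain it is pointed at. The names (`approximate_cev`, `MY_ENDORSED_WEIGHTS`, `LEGIBILITY`) and the weights below are invented for illustration.

```python
# Invented toy model: each preference component has a per-brain "legibility",
# and a bounded extractor only recovers components legible in at least one
# brain it is pointed at. Components idealized-me would endorse count toward
# my score even if my own brain doesn't make them legible.
MY_ENDORSED_WEIGHTS = {
    "no_suffering": 5.0,
    "exploration": 3.0,
    "your_pet_cause": 2.0,   # I'd endorse this once you explained it to me
}

LEGIBILITY = {
    "no_suffering": {"me"},
    "exploration": {"me"},
    "your_pet_cause": {"you"},   # only computable from your brain
}

def approximate_cev(targets: set) -> dict:
    """Recover only the components legible in at least one target brain."""
    return {k: w for k, w in MY_ENDORSED_WEIGHTS.items() if LEGIBILITY[k] & targets}

def score_for_me(extracted: dict) -> float:
    """How much of idealized-me's utility the extracted function captures."""
    return sum(MY_ENDORSED_WEIGHTS[k] for k in extracted)

print(score_for_me(approximate_cev({"me"})))         # 8.0
print(score_for_me(approximate_cev({"me", "you"})))  # 10.0, better in my own terms
```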
It comes down to a question of how much we trust the seed AI that’s doing the extraction to actually solve the problem.
It’s also perhaps worth asking what happens if I build the CEV-extracting seed AI and point it at my target community and it comes back with “I don’t have enough capability to compute CEV for that community. I will have to increase my capabilities in order to solve that problem.”
Yes. The CEV really could suck. There isn’t a good reason to assume that particular preference system is a good one.
How about CEV<intelligent people>?
Yes, that would be preferable. But only because I assert a correlation between the attributes that produce what we measure as g and personality traits and actual underlying preferences. A superintelligence extrapolating on <intelligent people>’s preferences would, in fact, produce a different outcome than one extrapolating on <humanity>.
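A toy numeric illustration of that correlation claim; the 0.4 coefficient, the g > 1 cutoff, and the use of a group mean as a stand-in for “what extrapolation of that group converges on” are all invented assumptions.

```python
# Invented toy illustration: g correlates with an underlying preference, and
# the group mean stands in for "what extrapolation of that group converges on".
import random
random.seed(0)

population = []
for _ in range(100_000):
    g = random.gauss(0, 1)
    pref = 0.4 * g + random.gauss(0, 1)   # partly correlated, partly independent
    population.append((g, pref))

def aggregate_extrapolated_preference(people):
    return sum(p for _, p in people) / len(people)

high_g = [person for person in population if person[0] > 1.0]  # roughly the top 16%

print(round(aggregate_extrapolated_preference(population), 3))  # close to 0.0
print(round(aggregate_extrapolated_preference(high_g), 3))      # around 0.6, shifted
```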
ArisKatsaris’s accusation that you don’t understand what CEV means misses the mark. You can understand CEV and still not conclude that CEV<humanity> is necessarily a good thing.
And, uh, how do you define that?
Something like g, perhaps?
What would that accomplish? It’s the intelligence of the AI that will be getting used, not the intelligence of the people in question.
I’m getting the impression that some people don’t understand what CEV even means. It’s not about the programmers predicting a course of action, it’s not about the AI using people’s current choice, it’s about the AI using the extrapolated volition—what people would choose if they were as smart and knowledgeable as the AI.
A good one according to which criteria? CEV<humanity> is perfect according to humankind’s criteria if humankind were more intelligent and more sane than it currently is.
Mine. (This is tautological.) Anything else that is kind of similar to mine would be acceptable.
Which is fine if ‘sane’ is defined as ‘more like what I would consider sane’. But that’s because ‘sane’ has all sorts of loaded connotations with respect to actual preferences—and “humanity’s” may very well not qualify as not-insane.