Johnicholas comments on How to get that Friendly Singularity: a minority view

Johnicholas 10 Oct 2009 16:31 UTC
2 points
This is reasonable—but what is odd to me is the world-conquering part. The justifications that I’ve seen for creating a singleton soon (e.g. either we have a singleton or we have unfriendly superintelligence) seem insufficient.

How certain are you that there is no third alternative? Suppose that you created an entity which is superhuman in some respects (a task that has already been done already done many times over) and asked it to find third alternatives. Wouldn’t this be a safer, saner, more moral and more feasible task than conquering the world and installing a singleton?

Note that “entity” isn’t necessarily a pure software agent—it could be a computer/human team, or even an organization consisting only of humans interacting in particular ways—both of these kinds of entity already exist, and are more capable than humans in some respects.
- DanArmak 10 Oct 2009 16:48 UTC
  7 points
  Parent
  The purpose of of installing a singleton is to prevent anyone, anywhere, ever, doing something I disapprove of. (I can give the usual examples of massive simulated torture, but a truly superhuman intelligence could be much more inventively and unexpectedly unpleasant than anything we’re likely to imagine.) Even if an unfriendly superintelligence isn’t a certainty (and there are arguments that it is) - why take the huge risk?
  
  Now, can there be anything other than a singleton which would give me comparable certainty? I would need to predict what all the rest of the universe will do, and to be able to stop anything I didn’t like, and to predict things sufficiently in advance to stop them in time. (As a minimum requirement, if a place is X light seconds away, I need to predict based on X-second-old information, and it takes another X seconds to intervene.)
  
  This includes stopping anyone from gaining any form of power that might possibly defy me in the future. And it must be effective for the whole universe, even if Earth-descended technology of some sort spreads out at nearly the speed of light, because I don’t know what might come back at me from out there.
  
  Suppose there was another way to accomplish all this that wasn’t an outright singleton, i.e. didn’t rewrite the effective laws or physics or replaced the whole universe with a controlled simulation. What possible advantage could it have over a singleton?
  - Johnicholas 10 Oct 2009 20:47 UTC
    2 points
    Parent
    This sounds to me like an irresistible force/immovable object problem—two people who are focused on different (large or intense) aspects of a problem disagree—but the real solution is to reframe the problem as a balance of considerations.
    
    As I understand it, on the one hand, there are the arguments (e.g. Eliezer Yudkowksky’s document “Creating Friendly AI”) that technological progress is mostly not stoppable and enthusiasts and tinkerers are accidentally going to build recursively self-improving entities that probably do not share your values. On the other hand, striving to conquer the world and impose one’s values by force is (I hope we agree) a reprehensible thing to do.
    
    If, for example, there was a natural law that all superintelligences must necessarily have essentially the same ethical system, then that would tip the balance against striving to conquer the world. In this hypothetical world, enthusiasts and tinkerers may succeed, but they wouldn’t do any harm. John C. Wright posits this in his Golden Transcendence books and EY thought this was the case earlier in his life.
    
    If there was a natural law that there’s some sort of upper bound on the rate of recursive self-improvement, and the world as a whole (and the world economy in particular) is already at the maximum rate, then that would also tip the balance against striving to conquer the world. In this hypothetical world, the world as a whole will continue to be more powerful than the tinkerers, the enthusiasts, and you. Robin Hanson might believe some variant of this scenario.
    - DanArmak 10 Oct 2009 20:58 UTC
      4 points
      Parent
      
      On the other hand, striving to conquer the world and impose one’s values by force is (I hope we agree) a reprehensible thing to do.
      
      Not at all. It’s the only truly valuable thing to do. If I thought I had even a tiny chance of succeeding, or if I had any concrete plan, I would definitely try to build a singleton that would conquer the world.
      
      I hope that the values I would impose in such a case are sufficiently similar to yours, and to (almost) every other human’s, that the disadvantage of being ruled by someone else would be balanced for you by the safety from ever being ruled by someone you really wouldn’t like.
      
      A significant part of the past discussion here and in other singularity-related forums has been about verifying that our values are in fact compatible in this way. This is a necessary condition for community efforts.
      
      If, for example, there was a natural law that all superintelligences must necessarily have essentially the same ethical system, then that would tip the balance against striving to conquer the world.
      
      But there isn’t, so why bring it up? Unless you have a reason to think some other condition holds that changes the balance in some way. Saying some condition might hold isn’t enough. And if some such condition does hold, we’ll encounter it anyway while trying to conquer the world, so no harm done :-)
      
      If there was a natural law that there’s some sort of upper bound on the rate of recursive self-improvement, and the world as a whole (and the world economy in particular) is already at the maximum rate, then that would also tip the balance against striving to conquer the world. In this hypothetical world, the world as a whole will continue to be more powerful than the tinkerers, the enthusiasts, and you. Robin Hanson might believe some variant of this scenario.
      
      I’m quite certain that we’re nowhere near such a hypothetical limit. Even if we are, this limit would have to be more or less exponential, and exponential curves with the right coefficients have a way of fooming that tends to surprise people. Where does Robin talk about this?
      - Vladimir_Nesov 10 Oct 2009 21:11 UTC
        1 point
        Parent
        
        A significant part of the past discussion here and in other singularity-related forums has been about verifying that our values are in fact compatible in this way. This is a necessary condition for community efforts.
        
        Not so much. Multiple FAIs of different values (cooperating in one world) are equivalent to one FAI of amalgamated values, so a community effort can be predicated on everyone getting their share (and, of course, that includes altruistic aspects of each person’s preference). See also Bayesians vs. Barbarians for an idea of when it would make sense to do something CEV-ish without an explicitly enforced contract.
        wedrifid 11 Oct 2009 15:24 UTC
        0 points
        Parent
        You describe one form of compatibility.
        Vladimir_Nesov 11 Oct 2009 15:38 UTC
        0 points
        Parent
        How so? I don’t place restrictions on values, more than what’s obvious in normal human interaction.
    - Vladimir_Nesov 10 Oct 2009 21:04 UTC
      3 points
      Parent
      
      On the other hand, striving to conquer the world and impose one’s values by force is (I hope we agree) a reprehensible thing to do.
      
      No it’s not. We are talking about “my values”, and so if I believe it’s improper to impose them using procedure X, then part of “my values” is that procedure X shouldn’t have been performed, and so using procedure X to impose my values is unacceptable (not a valid subgoal of “imposing my values”). Whatever means are used to “impose my values” must be good according to my values. Thus, not implementing the dark aspects of “conquering the world”, such as “by force”, is part of “conquering the world” as instrumental action for achieving one’s goals. You create a singleton that chooses to be nice to the conquered.
      
      There is also a perhaps much more important aspect of protecting from mistakes: even if I was the only person in the world, and not in immediate danger from anything, it still would make sense to create a “singleton” that governs my own actions. Thus the intuition for CEV, where you specify an everyone’s singleton, not particularly preferring given people.
      - Johnicholas 10 Oct 2009 22:39 UTC
        3 points
        Parent
        Possibly you’re using technical jargon here. When non-LessWrong-reading humans talk about one person imposing their values on everyone else, they would generally consider it immoral. Are we in agreement here?
        
        Now, I could understand your starement (“No it’s not”) in either of two ways: Either you believe they’re mistaken about whether the action is immoral, or you are using a different (technical jargon) sense of the words involved. Which is it?
        
        My guess is that you’re using a technical sense of “values”, which includes something like the various clauses enumerated in EY’s description of CEV: “volition is what we wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together, …”.
        
        If by “values” you include those things that you don’t think you value now but you would value if you had more knowledge of them, or would be persuaded to value by a peer if you hadn’t conquered the world and therefore eliminated all of your peers, then perhaps I can see what you’re trying to say.
        
        By talking about “imposing your own values” without all of the additional extrapolated volition clauses, you’re committing an error of moral overconfidence—something which has caused vast amounts of unpleasantness throughout human history.
        
        http://www.overcomingbias.com/2009/01/moral-uncertainty-towards-a-solution.html
        DanArmak 10 Oct 2009 23:01 UTC
        3 points
        Parent
        
        When non-LessWrong-reading humans talk about one person imposing their values on everyone else, they would generally consider it immoral. Are we in agreement here?
        
        Not at all. The morality of imposing my values on you depends entirely on what you were doing, or were going to do, before I forced you to behave nicely.
        Johnicholas 10 Oct 2009 23:47 UTC
        0 points
        Parent
        You may have misread that, and answered a different question, something like “Is it moral?”. The quote actually is asking “Do non-LessWrong-reading humans generally consider it moral?”.
        DanArmak 11 Oct 2009 0:08 UTC
        3 points
        Parent
        I answered the right quote.
        
        Random examples: was the U.S. acting morally when it entered WW2 against the Nazis, and imposed their values across Western Europe and in Japan? Is the average government acting morally when it forcefully collects taxes, enforcing its wealth-redistribution values? Or when it enforces most kinds of laws?
        
        I think most people by far (to answer your question about non-LW-readers) support some value-imposing policies. Very few people are really pure personal-liberty non-interventionists. The morality of the act depends on the behavior being imposed, and on the default behavior that exists without such imposition.
        
        It remains to stipulate that the government has a single person at its head who imposes his or her values on everyone else. Some governments do run this way, some others approximate it.
        
        Edit: What you may have meant to say, is that the average non-LW-reading person, when hearing the phrase “one human imposing their values on everyone else”, will imagine some very evil and undesirable values, and conclude that the action is immoral. I agree with that—it’s all a matter of framing.
        Vladimir_Nesov 10 Oct 2009 22:51 UTC
        1 point
        Parent
        Of course, I’m talking about values as they should be, with moral mistakes filtered out, not as humans realistically enact them, especially when the situation creates systematic distortions, as is the case with granting absolute power.
        
        Posts referring to necessary background for this discussion:
        
        Ends Don’t Justify Means (Among Humans)
        Not Taking Over the World
- loqi 11 Oct 2009 21:44 UTC
  2 points
  Parent
  
  Suppose that you created an entity which is superhuman in some respects (a task that has already been done already done many times over) and asked it to find third alternatives.
  
  As far as software is concerned, this flavor of superhumanity does not remotely resemble anything that has already been done. You’re talking about assembling an “entity” capable of answering complex questions at the intersection of physics, philosophy, and human psychology. This is a far cry from the automation of relatively simple, isolated tasks like playing chess or decoding speech—I seriously doubt that any sub-AGI would be up to the task.
  
  The non-software alternatives you mention are even less predictable/controllable than AI, so I don’t see how pursuing those strategies could be any safer than a strictly FAI-based approach. Granted sufficient superhumanity (we can’t precisely anticipate how much we’re granting them), the human components of your “team” would face an enormous temptation to use their power to acquire more power. This risk would need to be weighed against its benefits, but the original aim was just the prevention of a sub-optimal singleton! So all we’ve done is step closer to the endgame without knowably improving our board position.
  - Johnicholas 12 Oct 2009 3:27 UTC
    2 points
    Parent
    Human teams, or human/software amalgams (like the LessWrong moderation system that we’re part of right now) are routinely superhuman in many ways. LessWrong, considered as a single entity, has superhumanly broad knowledge. It has a fairly short turn-around time for getting somewhat thoughtful answers—possibly more consistently short turn-around time than any of us could manage alone.
    
    An entity such as this one might be highly capable in some narrowly focused ways (indeed, it could be purpose-built for one goal—the goal of reducing the risk of the earth being paperclipped) while being utterly incapable in many other ways, and posing almost no threat to the earth or wider society.
    
    Building a purportedly-Friendly general-purpose recursive self-improving process, on the other hand, means a risk that you’ve created something that will diverge on the Nth self-improvement cycle and become unfriendly. By explicitly going for general-purpose, no-human-dependencies, and indefinitely self-improvable, you’re building in exactly the same elements that you suspect are dangerous.
    - loqi 12 Oct 2009 6:55 UTC
      3 points
      Parent
      
      By explicitly going for general-purpose, no-human-dependencies, and indefinitely self-improvable, you’re building in exactly the same elements that you suspect are dangerous.
      
      This is a fairly obvious point that becomes more complicated in a larger scope. Being charitable, you seem to be implying
      
      P(Fail | FAI-attempt) > P(Fail | ~FAI-attempt)
      where FAI-attempt means “we built and deployed an AGI that we thought was Friendly”. If our FAI efforts are the only thing that causally affects Fail, then your implication might be correct. But if we take into account all other AGI research, we need more detail. Assume FAI researchers have no inherent advantage over AGI researchers (questionable). Then we basically have
      
      P(Fail | ~FAI-attempt) = P(AGI | ~FAI)*P(Fail | AGI & ~FAI) P(Fail | FAI-attempt) = P(FAI-wins)*P(Fail | FAI) + P(~FAI-wins)*P(Fail | ~FAI-attempt)
      So in these terms, what would it mean for an FAI attempt to be riskier?
      
      P(Fail | FAI-attempt) > P(Fail | ~FAI-attempt) # definitions P(FAI-wins)*P(Fail | FAI) + P(~FAI-wins)*P(AGI | ~FAI)*P(Fail | AGI & ~FAI) > P(AGI | ~FAI)*P(Fail | AGI & ~FAI) # algebra P(FAI-wins)*P(Fail | FAI) > P(AGI | ~FAI)*P(Fail | AGI & ~FAI) - P(~FAI-wins)*P(AGI | ~FAI)*P(Fail | AGI & ~FAI) P(FAI-wins)*P(Fail | FAI) > P(AGI | ~FAI)*P(Fail | AGI & ~FAI)*(1-P(~FAI-wins)) P(FAI-wins)*P(Fail | FAI) > P(AGI | ~FAI)*P(Fail | AGI & ~FAI)*P(FAI-wins) # assume P(FAI-wins) > 0 # since P(Fail | ~FAI-attempt) == P(Fail | FAI-attempt) is the "who cares" case P(Fail | FAI) > P(AGI | ~FAI)*P(Fail | AGI & ~FAI)
      Eliezer has argued at considerable length that P(Fail | AGI & ~FAI) is very close to 1. So under these assumptions, the odds of a FAI failure must be higher than the odds of non-FAI AGI being created in order to successfully argue that FAI is more dangerous than the alternative. Do you have any objection to my assumptions and derivation, or do you believe that P(Fail | FAI) > P(AGI | ~FAI)?
      - Johnicholas 12 Oct 2009 14:42 UTC
        2 points
        Parent
        Can you send me a model? I think my objection is to the binariness of the possible strategies node, but I’m not sure how to express that best in your model.
        
        Suppose there are N rojects in the world each of which might almost-succeed and so each of which is an existential risk.
        
        The variable that I can counterfactually control is my actions. The variable that we can counterfactually control are our actions. Since we’re conversing in persuasive dialog, it is reasonable to discuss what strategies we might take to best reduce existential risk.
        
        Suppose that we distinguish between “safety strategies” and “singleton strategies”.
        
        Singleton strategies are explicitly going for fast, general-purpose power and capability, with as many stacks of iterated exponential growth in capability as the recursive self-improvement engineers can manage. It seems obvious to me that if we embarked on a singleton strategy, even with the best of intentions, there are now N+1 AGI projects, each increasing existential risk, and our best intentions might not outweigh that increase.
        
        Safety strategies would involve attempting to create entities (e.g. human teams, human/software amalgams, special-purpose software) which are explicitly limited and very unlikely to be generally powerful compared to the world at large. They would try to decrease existential risk both directly (e.g. build tools for the AGI projects that reduce the chance of the AGI projects going wrong) and indirectly, by not contributing to the problem.
        loqi 13 Oct 2009 7:44 UTC
        1 point
        Parent
        
        Can you send me a model?
        
        No, sorry, the above comment was just my attempt to explain my objection as unambiguously as possible.
        
        It seems obvious to me that if we embarked on a singleton strategy, even with the best of intentions, there are now N+1 AGI projects, each increasing existential risk, and our best intentions might not outweigh that increase.
        
        Yes, but your “N+1” hides some important detail: Our effective contribution to existential risk diminishes as N grows, while our contribution to safer outcomes stays constant or even grows (in the case that our work has a positive impact on someone else’s “winning” project).
        
        I think my objection is to the binariness of the possible strategies node, but I’m not sure how to express that best in your model. [...] They would try to decrease existential risk both directly (e.g. build tools for the AGI projects that reduce the chance of the AGI projects going wrong) and indirectly, by not contributing to the problem.
        
        Since you were making the point that attempting to build Friendly AGI contributes to existential risk, I thought it fair to factor out other actions. The two strategies you outline above are entirely independent, so they should be evaluated separately. I read you as promoting the latter strategy independently when you say:
        
        By explicitly going for general-purpose, no-human-dependencies, and indefinitely self-improvable, you’re building in exactly the same elements that you suspect are dangerous.
        
        The choice under consideration is binary: Attempt a singleton or don’t. Safety strategies may also be worthwhile, but I need a better reason than “they’re working toward the same goal” to view them as relevant to the singleton question.
- wedrifid 10 Oct 2009 18:23 UTC
  2 points
  Parent
  
  This is reasonable—but what is odd to me is the world-conquering part. The justifications that I’ve seen for creating a singleton soon (e.g. either we have a singleton or we have unfriendly superintelligence) seem insufficient.
  
  If a superintelligence is able to find a way to reliably prevent either the emergence of a rival, preventable existential risk or actions sufficiently undesirable then by all means it can do that instead.
- Vladimir_Nesov 11 Oct 2009 21:49 UTC
  1 point
  Parent
  
  Suppose that you created an entity which is superhuman in some respects (a task that has already been done already done many times over) and asked it to find third alternatives.
  
  By the way, the critical distinction is that with AGI, you are automating the whole decision-making cycle, while other kinds of tools only improve on some portion of the cycle under human control or anyway with humans somewhere in the algorithm.