Ruby comments on Towards more cooperative AI safety strategies

Ruby 18 Jul 2024 22:48 UTC
17 points
17
I think the plan implies having the capability that if you wanted to, you could take over the world, but having the power to do something and actually doing it are quite different. When you say “MIRI wanted to take over the world”, the central meanings that come to mind for me are “take over all the governments, be in charge of all the laws and decision-making, be world dictator, take possession of all the resources” and probably also “steer humanity’s future in a very active way”. Which is very very not their intention, and if someone goes around saying MIRI’s plan was to take over the world without any clarification, leaving the reader to think the above, then I think they’re being very darn misleading.
- Eli Tyre 21 Jul 2024 4:12 UTC
  8 points
  0
  Parent
  When you read the Sequences, was your visualization of a Friendly AI going to let the governments of North Korea or Saudi Arabia persist? Would it allow parents to abuse their children in ways that are currently allowed by the law (and indeed enshrined by the law, in that the law give parents authority over their children)? Does it allow the factory farms to continue to run? How about the (then contemporaneous) US occupations of Iraq and Afghanistan?
  
  (This is a non-rhetorical question. I wonder if we were visualizing different things.)
  - Eli Tyre 21 Jul 2024 4:14 UTC
    4 points
    −4
    Parent
    Speaking for myself, I would say:
    
    It’s a superintelligence, and so it can probably figure out effective peaceful ways to accomplish its goals. But among its goals will be the dismantling of many and likely all of the world’s major governments, not to mention a bunch of other existing power structures. A government being dismantled by a superhuman persuader is, in many but not all ways, as unsettling as it being destroyed by military force.
    
    Perhaps humanity as a whole, and every individual human, would be made better off by a CEV-aligned friendly singleton, but I think the US government, as an entity, would be rightly threatened.
    - Said Achmiz 21 Jul 2024 4:38 UTC
      2 points
      0
      Parent
      Doesn’t this very answer show that an AI such as you describe would not be reasonably describable as “Friendly”, and that consequently any AI worthy of the term “Friendly” would not do any of the things you describe? (This is certainly my answer to your question!)
      - Eli Tyre 21 Jul 2024 4:41 UTC
        2 points
        0
        Parent
        No. “Friendly” was a semi-technical term of art, at the time. It may turn out that a Friendly AI (in the technical sense) is not, or even can’t be, “friendly” in a more conventional sense.
        Said Achmiz 21 Jul 2024 6:21 UTC
        4 points
        0
        Parent
        Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.)
        
        Please consider the grandparent comment in light of the above.
        Eli Tyre 21 Jul 2024 6:42 UTC
        2 points
        −2
        Parent
        In that case, I answer flatly “no”. I don’t expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
        Said Achmiz 21 Jul 2024 6:45 UTC
        1 point
        0
        Parent
        You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
        Eli Tyre 22 Jul 2024 18:25 UTC
        3 points
        0
        Parent
        Whether most existing humans would be opposed is not a criterion of Friendliness.
        
        I think if you described what was going to happen, many and maybe most humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of “what’s going to happen” you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as 👍 while curing death and dismantling governments are typically (though not universally) regarded as 👎.
        
        I don’t think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly, by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won’t be able to distinguish that from malicious superhuman manipulation.)
        
        But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
        Said Achmiz 22 Jul 2024 20:04 UTC
        4 points
        0
        Parent
        And… you claim that the CEV of existing humans will want those things?
        Raemon 22 Jul 2024 20:38 UTC
        9 points
        10
        Parent
        Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It’d be surprising to me if CEV-existing-humanity didn’t turn out to want some things that many current humans are opposed to.
        Said Achmiz 22 Jul 2024 22:45 UTC
        6 points
        1
        Parent
        Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
        
        So Eli Tyre’s claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity’s EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
        
        Right?
        Eli Tyre 23 Jul 2024 2:06 UTC
        6 points
        0
        Parent
        I’m much more doubtful than most people around here about whether CEV coheres: I guess that the CEV of some humans wireheads themselves and the CEV of other humans doesn’t, for instance.
        
        But I’m bracketing that concern for this discussion. Assuming CEV coheres, then yes I predict that it will have radical (in the sense of a political radical whose beliefs are extremely outside of the Overton window, such that they are disturbing to the median voter) views about all of those things.
        
        But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I’m wrong about this specific set.
        
        CEV asks what would we want if we knew everything the AI knows. There are dozens of things that I think that I know, that if the average person knew to be true, would invalidate a lot of their ideology. Basic
        If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science, whole new fields, as foundational to one’s worldview as economics and probability theory are to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context that was conducive to that, with (potentially) large increases in their intelligence, I expect their views are basically unrecognizable after a process like that.
        
        As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say “no”, because it seems horrifying to them.
        
        Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
        
        That is, most LessWrongers think they know something which most people don’t know, but which, if they did know it, would radically alter their preferences and behavior.
        
        I think a mature AGI knows at least thousands of things like that.
        So among the things about CEV that I’m most confident about (again, granting that it coheres at all), is that CEV has extremely radical views, conclusions which are horrifying to most people, including probably myself.
        quila 23 Jul 2024 13:59 UTC
        5 points
        8
        Parent
        If by ‘cohere’ you mean ‘the CEVs of all individual humans match’, then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans^[1], and that this overlap would include disvaluing two of the three^[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
        (This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you’re welcome to pinpoint where you disagree.)
        ^[1] For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
        ^[2] For the part of your question about whether their CEVs would endorse dismantling governments: note that ‘governments’ is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think these statements capture most of my belief about how most humans’ CEVs would regard things in this broad category.
        Most human CEVs would be permissive of those who terminally-wish^[3] to live in contexts that have some form of harmless government structure.
        The category of ‘government’ also includes, e.g., dystopias that create suffering minds and don’t let them leave; most human CEVs would seek to prevent this kind of government from existing.
        (None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I’m imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
        ^[3] (wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it’s their vision for the context in which they want to live)
        Eli Tyre 23 Jul 2024 1:35 UTC
        2 points
        0
        Parent
        I do.
        
        I mean, it depends on the exact CEV procedure. But yes.
        quetzal_rainbow 21 Jul 2024 7:55 UTC
        2 points
        −6
        Parent
        I think a majority of nations would support dismantling their governments in favor of a benevolent superintelligence, especially given the correct framework. And an ASI can simply solve the problem of meat by growing brainless bodies.
        quila 23 Jul 2024 13:15 UTC
        1 point
        0
        Parent
        Edit: Whoever mega-downvoted this, I’m interested to see you explain why.
        Meta: You may wish to know^[1] that seeing these terms replaced with the ones you used can induce stress/dissociation in the relevant groups (people disturbed by factory farming and child abuse survivors). I am both and this was my experience. I don’t know how common it would be among LW readers of those demographics specifically, though.
        The one you responded to:
        Would it allow parents to abuse their children in ways that are currently allowed by the law [...]? Does it allow the factory farms to continue to run?
        Your response:
        You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
        ^[1] I’m framing this as sharing info you (or a generalized altruistic person placed in your position) may care about rather than as arguing for a further conclusion.
  - Ruby 21 Jul 2024 4:29 UTC
    3 points
    0
    Parent
    I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
    - Eli Tyre 21 Jul 2024 4:47 UTC
      2 points
      0
      Parent
      I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
      Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
      
      I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
      
      It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very nearby to that). Which seems germane to the discussion at hand to me.
      - Ruby 21 Jul 2024 5:05 UTC
        2 points
        0
        Parent
        I would be surprised if a Friendly AI resulted in those things being left untouched.
        
        I think that is germane but maybe needed some bridging/connecting work since this thread so far was about MIRI-as-having-pivotal-act-goal. Whereas I was less sure about whether MIRI itself would enact a pivotal act if they could than Habryka, my understanding was they had no plan to create a sovereign for most of their history (like after 2004) and so doesn’t seem like that’s a candidate for them having a plan to take over the world.
        Eli Tyre 21 Jul 2024 5:19 UTC
        3 points
        1
        Parent
        my understanding was they had no plan to create a sovereign for most of their history (like after 2004)
        Yeah, I think that’s false.
        
        The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
        
        But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
        Ruby 21 Jul 2024 5:33 UTC
        4 points
        2
        Parent
        I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall sometime later, after DL was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (like no compute, not the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs ability.
        Ruby 21 Jul 2024 5:42 UTC
        2 points
        0
        Parent
        My feeling of the plan pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that due to psychological unity of mankind, anyone building an aligned [with them] AGI was a good outcome compared to someone building unaligned. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI positions vs fragments that I combined in my own head that came from assorted places and were never policy.
        Eli Tyre 21 Jul 2024 18:39 UTC
        3 points
        0
        Parent
        Well, I can tell you that they definitely planned to build the Friendly AI, after figuring out how.
        
        See this other comment.
        Ruby 21 Jul 2024 19:17 UTC
        2 points
        0
        Parent
        Pretty solid evidence.
- interstice 19 Jul 2024 2:23 UTC
  7 points
  −4
  Parent
  “Taking over” something does not imply that you are going to use your authority in a tyrannical fashion. People can obtain control over organizations and places and govern with a light or even barely-existent touch, it happens all the time.
  
  Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary? Like sure, “they want to take over the world” as a gist of that does have a bit of an editorial slant, but not that much of one. I think that my original comment would be perceived as much less misleading by the majority of the world’s population than “they just want to do some helpful math uwu” in the event that these plans actually succeeded. I also think it’s obvious that these plans indicate a far higher degree of power-seeking (in aim at least) than virtually all other charitable organizations.
  
  (...and to reiterate, I’m not taking a strong stance on the advisability of these plans. In a way, had they succeeded, that would have provided a strong justification for their necessity. I just think it’s absurd to say that the organization making them is less power-seeking than the ADL or whatever)
  - Ruby 19 Jul 2024 3:33 UTC
    8 points
    6
    Parent
    
    Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary?
    
    No. Because I don’t think that was specified or is necessary for a pivotal act. You could leave all existing government structures intact and simply create an invincible system that causes any GPU farm larger than a certain size to melt. Or something akin to that that doesn’t require replacing existing governments, but is a quite narrow intervention.
    - interstice 19 Jul 2024 3:48 UTC
      5 points
      −2
      Parent
      It wasn’t specified but I think they strongly implied it would be that or something equivalently coercive. The “melting GPUs” plan was explicitly not a pivotal act but rather something with the required level of difficulty, and it was implied that the actual pivotal act would be something further outside the political Overton window. When you consider the ways “melting GPUs” would be insufficient, a plan like this is the natural conclusion.
      
      doesn’t require replacing existing governments
      
      I don’t think you would need to replace existing governments. Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy. Get existing governments to help you, or at least not interfere, via some mix of coercion and trade. Sort of a feudal arrangement with a minimalist central power.
      - Ruby 19 Jul 2024 4:31 UTC
        13 points
        10
        Parent
        
        Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy.
        
        That to me is a very very non-central case of “take over the world”, if it is one at all.
        
        This is about “what would people think when they hear that description” and I could be wrong, but I expect “the plan is to take over the world” summary would lead people to expect “replace governments” level of interference, not “coerce/trade to ensure this specific policy”—and there’s a really really big difference between the two.
        Richard_Ngo 19 Jul 2024 18:37 UTC
        25 points
        9
        Parent
        I think this whole debate is missing the point I was trying to make. My claim was that it’s often useful to classify actions which tend to lead you to having a lot of power as “structural power-seeking” regardless of what your motivations for those actions are. Because it’s very hard to credibly signal that you’re accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
        In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
        (Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
        habryka 19 Jul 2024 18:43 UTC
        7 points
        −14
        Parent
        I don’t super buy this. I don’t think MIRI was trying to accumulate a lot of power. In my model of the world they were trying to design a blueprint for some institution or project that would mostly have highly conditional power, that they would personally not wield.
        In the metaphor of classical governance, I think what MIRI was doing was much more “design a blueprint for a governance agency” not “put themselves in charge of a governance agency”. Designing a blueprint is not a particularly power-seeking move, especially if you expect other people to implement it.
        Ruby 19 Jul 2024 19:54 UTC
        5 points
        3
        Parent
        I got your point and think it’s valid and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they weren’t, I’m not knowledgeable on that front).
        
        I still think it’s important to push back against a false claim that someone had the goal of taking over the world.