Buck comments on Towards more cooperative AI safety strategies

Buck 19 Jul 2024 20:43 UTC
56 points
45
Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards: in order to do this, you’ll have to take actions that start out as terrorism and then might quickly need to evolve into insurrection (given that the government will surely try to coerce you into handing over control over the AI-destroying systems); this goes against normal standards for what types of actions private citizens are allowed to take.
I agree that “obtain enough hard power that you can enforce your will against all governments in the world including your own” is a bit short of “try to take over the world”, but I think that it’s pretty world-takeover-adjacent.
- habryka 19 Jul 2024 21:23 UTC
  22 points
  14
  I mean, it really matters whether you are suggesting that someone else take that action or whether you are planning to take that action yourself. Asking the U.S. government to use AI to prevent anyone from building more powerful and more dangerous AI is not in any way a power-grabbing action, because it does not in any meaningful way make you more powerful (like, yes, you are part of the U.S., so I guess you end up with a bit more power as the U.S. ends up with more power, but that effect is pretty negligible). Even asking random AI capabilities companies to do that is not a power-grabbing action, because you yourself do not end up in charge of those companies as part of that.
  Yes, unilaterally deploying such a system yourself would be, but I have no idea what people are referring to when they say that MIRI was planning on doing that (maybe they were, but all I’ve seen them do is openly discuss what, ideally, someone with access to a frontier model should do, in a way that really did not sound like it would end up with MIRI meaningfully in charge).
  - interstice 20 Jul 2024 18:08 UTC
    10 points
    12
    I think they talked explicitly about planning to deploy the AI themselves back in the early days (2004-ish), then gradually transitioned to talking generally about what someone with a powerful AI could do.
    
    But I strongly suspect that, in the event that they were the first to obtain powerful AI, they would deploy it themselves or perhaps give it to handpicked successors. Given Eliezer’s worldview, I don’t think it would make much sense for them to give the AI to the US government (considered incompetent) or AI labs (negligently reckless).
    - habryka 20 Jul 2024 20:08 UTC
      7 points
      5
      I think they talked explicitly about planning to deploy the AI themselves back in the early days (2004-ish), then gradually transitioned to talking generally about what someone with a powerful AI could do.
      I agree that very old MIRI (explicitly disavowed by present MIRI and mostly modeled as “one guy in a basement somewhere”) looked a bit more like this, but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president. I don’t think it has zero value in forecasting the future, but going and reading someone’s high-school political science essay, and inferring they would endorse that position in the modern day, is extremely dubious.
      My model of them is that they would definitely think very hard about the signaling and coordination problems that come with people trying to build an AGI themselves, and then act on those. I think Eliezer’s worldview here would totally output actions that include very legible precommitments about what the AI system would be used for, and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it. Eliezer has written a lot about this stuff and clearly takes considerations like that extremely seriously.
      - interstice 21 Jul 2024 1:25 UTC
        14 points
        3
        
        I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president
        
        Yeah, but it’s not just the old MIRI views, but those in combination with their statements about what one might do with powerful AI, the telegraphed omissions in those statements, and other public parts of their worldview e.g. regarding the competence of the rest of the world. I get the pretty strong impression that “a small group of people with overwhelming hard power” was the ideal goal, and that this would ideally be controlled by MIRI or by a small group of people handpicked by them.
        habryka 21 Jul 2024 2:00 UTC
        8 points
        0
        Some things that feel incongruent with this:
        Eliezer talks a lot in the Arbital article on CEV about how useful it is to have a visibly neutral alignment target
        Right now Eliezer is pursuing a strategy which does not meaningfully empower him at all (just halting AGI progress)
        Eliezer complains a lot about various people using AI alignment under the guise of mostly just achieving their personal objectives (in particular, the standard AI censorship stuff being thrown into the same bucket)
        Lots of conversations I’ve had with MIRI employees
        I would be happy to take bets here about what people would say.
        interstice 21 Jul 2024 3:02 UTC
        2 points
        0
        
        I would be happy to take bets here about what people would say.
        
        Sure, I DM’d you.
      - Eli Tyre 21 Jul 2024 5:00 UTC
        4 points
        −4
        but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president.
        This seems too strong to me. There looks to me to be a clear continuity in MIRI’s strategic outlook from the days when their explicit plan was to build a singleton and “optimize” the universe, through to today. In between there was a series of updates regarding how difficult various intermediate targets would be. But the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function.
        
        If I remember correctly, back in the AI foom debate, Robin Hanson characterized the Singularity Institute’s plan (to be the first to a winner-take-all technology, and then use that advantage to optimize the cosmos) as declaring total war on the world. Eliezer disputed that characterization.
        
        (Note that I spent 10 minutes trying to find the relevant comments and didn’t find anything quite like what I was remembering, which does decrease my credence that I’m remembering correctly.)
        Steven Byrnes 21 Jul 2024 13:57 UTC
        9 points
        6
        the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function
        I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.
        Eli Tyre 21 Jul 2024 19:01 UTC
        12 points
        8
        All of that sounds right to me. But this pivot with regard to means isn’t much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
        
        I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
        
        Insofar as that’s true, I think Oliver’s statement above...
        and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
        ...is inaccurate.
        
        MIRI has never said, to my knowledge,
        We used to think that if a small team could build a verifiably-aligned CEV AI, that they should unilaterally turn it on, knowing that that will likely result in the relative disempowerment of many human institutions and existing human leaders. We once planned to do that ourselves.
        
        We now think that was a mistake, not just because building a verifiably-aligned CEV AI is unworkably hard, but because unilaterally seizing a hard power advantage, even in the service of CEV, is an act of war (or something).
        The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to “optimize” the whole world.
        Eliezer’s writing includes lots of points at which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
        Famously, Harry says “World Domination is such an ugly phrase. I prefer world optimization.” (We made t-shirts of this phrase!)
        The Sword of Good ends with the line
        “‘I don’t trust you either,’ Hirou whispered, ‘but I don’t expect there’s anyone better,’ and he closed his eyes until the end of the world.” He’s concluded that all the evil in the world must be opposed, that it’s right for someone to cast the “spell of ultimate power” to do that.
        (This is made a bit murky, because Eliezer’s writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human evil. But triumph over human evils is definitely included, e.g., the moral importance and urgency of destroying Azkaban in HP:MoR.)
        
        From all that, I think it is reasonable to conclude that MIRI would be in favor of taking over the world, if they could get the power to do it!
        So it seems disingenuous, to me, to say,
        I think Eliezer’s worldview here...would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
        I agree that
        MIRI’s leadership doesn’t care who implements a CEV AI, as long as they do it correctly.
        (Though this is not as clearly non-power-seeking if you rephrase it as “MIRI leadership doesn’t care who implements the massively powerful AI, as long as they correctly align it to the values that MIRI leadership endorses.”

        For an outsider who doesn’t already trust the CEV process, this is about as reassuring as a communist group saying “we don’t care who implements the AI, as long as they properly align it to Marxist doctrine.” I understand how CEV is more meta than that, how it is explicitly avoiding coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as, indeed, virtually every existing group should).)
        CEV as an optimization target is itself selected to be cosmopolitan and egalitarian. It’s a good-faith attempt to optimize for the good of all. It does seem to me that the plan of “give a hard power advantage to this process, which we expect to implement the Good, itself” is a step down in power-seeking from “give a hard power advantage to me, and I’ll do Good stuff.”
        But it still seems to me that MIRI’s culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.
        
        habryka 21 Jul 2024 5:07 UTC
        7 points
        6
        I mean, I also think there is continuity between the beliefs I held in my high-school essays and my present beliefs, but it’s also enough time and distance that if you straightforwardly attribute to me claims that I made in my high-school essays, that I have explicitly disavowed, and that I have told you I do not believe, then I will be very annoyed with you and will model you as not actually trying to understand what I believe.
        Eli Tyre 21 Jul 2024 5:15 UTC
        2 points
        0
        Absolutely, if you have specifically disavowed any claims, that takes precedence over anything else. And if I insist you still think x because you said x ten years ago, but you say you now think something else, I’m just being obstinate.
        
        In contrast, if you said x ten years ago, and in the intervening time you’ve shared a bunch of highly detailed models that are consistent with x, I think I should assume you still think x.
        
        I’m not aware of any specific disavowals of anything after 2004? What are you thinking of here?
    - Eli Tyre 21 Jul 2024 3:34 UTC
      5 points
      0
      Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
      [Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
      
      He states explicitly:
      As a research fellow of the Singularity Institute, I’m supposed to first figure out how to build a friendly AI, and then once I’ve done that go and actually build one.
      And later in the video he says:
      The Singularity Institute was founded on the theory that in order to get a friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a friendly AI’. That’s us. There’s like various other things that we’re also concerned with, like trying to get more eyes and more attention focused on the problem, trying to encourage people to do work in this area. But at the core, the reasoning is: “Someone has to do it. ‘Someone’ is us.”
- Ruby 19 Jul 2024 21:03 UTC
  2 points
  −1
  “Pretty world-takeover-adjacent” feels like a fair description to me.
- sunwillrise 19 Jul 2024 20:57 UTC
  1 point
  −2
  Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards
  I don’t think this is the important point of disagreement. Habryka’s point throughout this thread seems to be that, yes, doing that is power-grabbing, but it is not what MIRI planned to do. Instead, MIRI planned to (intellectually) empower anyone else willing to do (and capable of doing) a pivotal act with a blueprint for how to do so.
  So MIRI wasn’t seeking to take power, but rather to allow someone else^[1] to do so. It’s the difference between using a weapon and designing a weapon for someone else’s use. An important part is that this “someone else” could very well disagree with MIRI about a large number of things, so there need not be any natural allyship or community or agreement between them.
  If you are a blacksmith working in a forge and someone comes into your shop and says “build me a sword so I can use it to kill the king and take control of the realm,” and you agree to do so but do not expect to get anything out-of-the-ordinary in return (in terms of increased power, status, etc), it seems weird and non-central to call your actions power-seeking. You are simply empowering another, different power-seeker. You are not seeking any power of your own.
  1. ^
    Who was both in a position of power at an AI lab capable of designing a general intelligence and sufficiently clear-headed about the dangers of powerful AI to understand the need for such a strategy.