I think they talked explicitly about planning to deploy the AI themselves back in the early days (2004-ish), then gradually transitioned to talking generally about what someone with a powerful AI could do.
But I strongly suspect that in the event that they were the first to obtain powerful AI, they would deploy it themselves or perhaps give it to handpicked successors. Given Eliezer’s worldview, I don’t think it would make much sense for them to give the AI to the US government (considered incompetent) or AI labs (negligently reckless).
I think they talked explicitly about planning to deploy the AI themselves back in the early days (2004-ish), then gradually transitioned to talking generally about what someone with a powerful AI could do.
I agree that very old MIRI (explicitly disavowed by present MIRI and mostly modeled as “one guy in a basement somewhere”) looked a bit more like this, but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president. I don’t think it has zero value in forecasting the future, but going and reading someone’s high-school political science essay, and inferring they would endorse that position in the modern day, is extremely dubious.
My model of them would definitely think very hard about the signaling and coordination problems that come with people trying to build an AGI themselves, and then act on those. I think Eliezer’s worldview here would totally output actions that include very legible precommitments about what the AI system would be used for, and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it. Eliezer has written a lot about this stuff and clearly takes considerations like that extremely seriously.
I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president
Yeah, but it’s not just the old MIRI views, but those in combination with their statements about what one might do with powerful AI, the telegraphed omissions in those statements, and other public parts of their worldview, e.g. regarding the competence of the rest of the world. I get the pretty strong impression that “a small group of people with overwhelming hard power” was the ideal goal, and that this would ideally be controlled by MIRI or by a small group of people handpicked by them.
Some things that feel incongruent with this:

Eliezer talks a lot in the Arbital article on CEV about how useful it is to have a visibly neutral alignment target
Right now Eliezer is pursuing a strategy which does not meaningfully empower him at all (just halting AGI progress)
Eliezer complains a lot about various people using AI alignment as cover for mostly just achieving their personal objectives (in particular, the standard AI censorship stuff being thrown into the same bucket)
Lots of conversations I’ve had with MIRI employees
I would be happy to take bets here about what people would say.

Sure, I DM’d you.
but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president.
This seems too strong to me. There looks to me like a clear continuity of MIRI’s strategic outlook from the days when their explicit plan was to build a singleton and “optimize” the universe, through to today. In between there was a series of updates regarding how difficult various intermediate targets would be. But the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function.
If I remember correctly, back in the AI foom debate, Robin Hanson characterized the Singularity Institute’s plan (to be the first to a winner-take-all technology, and then use that advantage to optimize the cosmos) as declaring total war on the world. Eliezer disputed that characterization.
(Note that I spent 10 minutes trying to find the relevant comments, and didn’t find anything quite like what I was remembering, which does decrease my credence that I’m remembering correctly.)
the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function
I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.
All of that sounds right to me. But this pivot with regards to means isn’t much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
Insofar as that’s true, I think Oliver’s statement above...
and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
...is inaccurate.
MIRI has never said, to my knowledge,
We used to think that if a small team could build a verifiably-aligned CEV AI, that they should unilaterally turn it on, knowing that that will likely result in the relative disempowerment of many human institutions and existing human leaders. We once planned to do that ourselves.
We now think that was a mistake, not just because building a verifiably-aligned CEV AI is unworkably hard, but because unilaterally seizing a hard power advantage, even in the service of CEV, is an act of war (or something).
The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to “optimize” the whole world.
Eliezer’s writing includes lots of points at which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
Famously, Harry says “World Domination is such an ugly phrase. I prefer world optimization.” (We made t-shirts of this phrase!)
The Sword of Good ends with the line
“‘I don’t trust you either,’ Hirou whispered, ‘but I don’t expect there’s anyone better,’ and he closed his eyes until the end of the world.” He’s concluded that all the evil in the world must be opposed, that it’s right for someone to cast the “spell of ultimate power” to do that.
(This is made a bit murky, because Eliezer’s writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human-evil. But triumph over the human-evils is definitely included, e.g. the moral importance and urgency of destroying Azkaban in HP:MoR.)
From all that, I think it is reasonable to think that MIRI is in favor of taking over the world, if they could get the power to do it!
So it seems disingenuous, to me, to say,
I think Eliezer’s worldview here...would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
I agree that
MIRI’s leadership doesn’t care who implements a CEV AI, as long as they do it correctly.
(Though this is not as clearly non-powerseeking, if you rephrase it as “MIRI leadership doesn’t care who implements the massively powerful AI, as long as they correctly align it to the values that MIRI leadership endorses.”

For an outsider who doesn’t already trust the CEV process, this is about as reassuring as a communist group saying “we don’t care who implements the AI, as long as they properly align it to Marxist doctrine.” I understand how CEV is more meta than that, how it is explicitly avoiding coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as indeed, virtually every existing group should.))
CEV as an optimization target is itself selected to be cosmopolitan and egalitarian. It’s a good-faith attempt to optimize for the good of all. It does seem to me that the plan of “give a hard power advantage to this process, which we expect to implement the Good, itself” is a step down in power-seeking from “give a hard power advantage to me, and I’ll do Good stuff.”
But it still seems to me that MIRI’s culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.
I mean, I also think there is continuity from the beliefs I held in my high-school essays to my present beliefs, but it’s also enough time and distance that if you straightforwardly attribute claims to me that I made in my high-school essays, which I have explicitly disavowed and told you I do not believe, then I will be very annoyed with you and will model you as not actually trying to understand what I believe.
Absolutely, if you have specifically disavowed any claims, that takes precedence over anything else. And if I insist you still think x, because you said x ten years ago, but you say you now think something else, I’m just being obstinate.
In contrast, if you said x ten years ago, and in the intervening time you’ve shared a bunch of highly detailed models that are consistent with x, I think I should think you still think x.
I’m not aware of any specific disavowals of anything after 2004? What are you thinking of here?
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
As a research fellow of the Singularity Institute, I’m supposed to first figure out how to build a friendly AI, and then once I’ve done that, go and actually build one.
And later in the video he says:
The Singularity Institute was founded on the theory that in order to get a friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a friendly AI’. That’s us. There’s like various other things that we’re also concerned with, like trying to get more eyes and more attention focused on the problem, trying to encourage people to do work in this area. But at the core, the reasoning is: “Someone has to do it. ‘Someone’ is us.”