I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very close to that). Which seems germane to the discussion at hand.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but it maybe needed some bridging/connecting work, since this thread so far was about MIRI-as-having-a-pivotal-act-goal. While I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, my understanding was that they had no plan to create a sovereign for most of their history (like after 2004), so that doesn’t seem like a candidate for them having a plan to take over the world.
my understanding was they had no plan to create a sovereign for most of their history (like after 2004)
Yeah, I think that’s false.
The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall, sometime later after DL was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (no compute, nor the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs ability.
My feeling of the plan pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building one to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that, due to the psychological unity of mankind, anyone building an aligned[ with them] AGI was a good outcome compared to someone building an unaligned one. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI’s positions vs fragments from assorted places that I combined in my own head and that were never policy.
Well, I can tell you that they definitely planned to build the Friendly AI, after figuring out how.
See this other comment.
Pretty solid evidence.