My understanding was that there were 4 phases in which the Singularity Institute / MIRI had 4 different plans.
~2000 to ~2004:
Plan:
Build a recursively self-improving seed AI as quickly as possible →
That seed AI Fooms →
It figures out the Good and does it.
[Note: Eliezer has explicitly disendorsed everything that he believed in this period, unless otherwise noted.]
~2004 to ~2016:
Update: “Wait. Because of the Orthogonality thesis, not all seed AIs will converge to values that we consider good, even though they’re much smarter than us. The supermajority of seed AIs don’t. We have to build in humane values directly, or building a recursively self-improving AGI will destroy both the world and everything of value in the world.”
New plan:
Figure out the math of motivationally-stable self-improvement and the deep math of cognition →
Use both to build a seed AI, initialized to implement Coherent Extrapolated Volition →
Let that seed AI recursively self-improve into a singleton / sovereign with a decisive strategic advantage →
That singleton now uses its decisive strategic advantage to optimize the universe.
(“World domination is such an ugly phrase. I prefer world optimization”).
~2016 to ~2021:
Update: “It turns out deep learning is general enough that it is possible to build AGI with relatively brute-force methods, without having much deep insight into the nature of cognition. AI timelines are shorter than we thought. Fuck. There isn’t time to figure out how to do alignment deeply and completely, at the level that would be required to trust an AI to be a sovereign and optimize the whole universe.”
New plan:
Figure out enough of alignment to build the minimal AGI system that can perform a pivotal act, under tightly controlled circumstances, with lots of hacky guardrails and speed-bumps →
Build such a limited AGI →
Deploy that AGI to do a pivotal act to prevent any competitor projects from building a more dangerous unbounded AGI.
~2021 to present:
Update: “We can’t figure out even that much of the science of alignment in time. The above plan is basically doomed. We think the world is doomed. Given that, we might as well try outreach:”
New plan:
Do outreach →
Get all the Great Powers in the world to join and enforce a treaty that maintains a world-wide ban on large training runs →
Do biotech projects that can produce humans who are smart enough to have the security mindset, not out of a special personal disposition, but simply because they’re smart enough to see its obviousness by default →
Those superhumans solve alignment and (presumably?) implement more or less the pre-2016 MIRI plan.
I think interstice’s summary is basically an accurate representation of the ~2001 to ~2016 plan. They’re only mistaken in that MIRI didn’t switch away from that plan until recently.
Nice overview, I agree, but I think the 2016-2021 plan could still arguably be described as “obtain god-like AI and use it to take over the world” (admittedly with some rhetorical exaggeration, but, like, not that much).
I think it’s pretty important that the 2016 to 2021 plan was explicitly aiming to avoid unleashing godlike power. “The minimal amount of power to do a thing which is otherwise impossible”, not “as much omnipotence as is allowed by physics”.
And similarly, the 2016 to 2021 plan did not entail optimizing the world except with regard to what is necessary to prevent dangerous AGIs.
These are both in contrast to the earlier 2004 to 2016 plan. So the rhetorical exaggeration confuses things.
MIRI actually did have a plan that, in my view, is well characterized as (eventually) taking over the world, without exaggeration. That distinction is apt to get lost if we describe a “toned down” plan as “taking over the world” just because it involves taking powerful, potentially aggressive, action.
This discussion is a nice illustration of why x-riskers are definitely more power-seeking than the average activist group. Just like Eskimos proverbially have 50 words for snow, AI-risk-reducers need at least 50 terms for “taking over the world” to demarcate the range of possible scenarios. ;)