Richard_Ngo comments on AGI safety from first principles: Introduction

Richard_Ngo 11 Oct 2020 9:55 UTC
LW: 3 AF: 1
AF
Yeah, this seems like a reasonable point. But I’m not that much of a fan of the alternatives you suggest. What do you think about “AGI safety”?
- Ben Pace 12 Oct 2020 2:49 UTC
  LW: 7 AF: 4
  AF Parent
  Oli suggests that there are no fields with three-word-names, and so “AI Existential Risk” is not a choice. I think “AI Alignment” is the currently most accurate name for the field that encompasses work like Paul’s and Vanessa’s and Scott/Abram’s and so on. I think “AI Alignment From First Principles” is probably a good name for the sequence.
- Ben Pace 11 Oct 2020 21:14 UTC
  LW: 3 AF: 1
  AF Parent
  It seems a definite improvement on the axis of specificity, I do prefer it over the status quo for that reason.
  But it doesn’t address the problem of scope-sensitivity. I don’t think this sequence is about preventing medium-sized failures from AGI. It’s about preventing extinction-level risks to our future.
  “A First-Principles Explanation of the Extinction-Level Threat of AGI: Introduction”
  “The AGI Extinction Threat from First Principles: Introduction”
  “AGI Extinction From First Principles: Introduction”
  - Richard_Ngo 16 Oct 2020 1:15 UTC
    LW: 4 AF: 2
    AF Parent
    Yeah, I agree that’s a problem. Bur I don’t think it’s a big problem, because who’s talking about medium-size risks from AGI?
    
    In particular, the flag I want to plant is something like: “when you’re talking about AGI, it’s going to be So Big that existential safety is the default type of safety to be concerned with.”
    
    Also I think having the big EXTINCTION in the title costs weirdness points, because even within the field people don’t use that word very much. So I’m leaning towards AGI safety.
    - Kaj_Sotala 10 Nov 2020 14:41 UTC
      LW: 4 AF: 1
      AF Parent
      because who’s talking about medium-size risks from AGI?
      Well, I have talked about them… :-)
      The capability claim is often formulated as the possibility of an AI achieving a decisive strategic advantage (DSA). While the notion of a DSA has been implicit in many previous works, the concept was first explicitly defined by Bostrom (2014, p. 78) as “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination.”
      However, assuming that an AI will achieve a DSA seems like an unnecessarily strong form of the capability claim, as an AI could cause a catastrophe regardless. For instance, consider a scenario where an AI launches an attack calculated to destroy human civilization. If the AI was successful in destroying humanity or large parts of it, but the AI itself was also destroyed in the process, this would not count as a DSA as originally defined. Yet, it seems hard to deny that this outcome should nonetheless count as a catastrophe.
      Because of this, this chapter focuses on situations where an AI achieves (at least) a major strategic advantage (MSA), which we will define as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society.” A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause 10 million or more fatalities (Bostrom & Ćirković 2008).
      What links here?
      Investigating AI Takeover Scenarios by Sammy Martin (17 Sep 2021 18:47 UTC; 27 points)
      Sammy Martin's comment on Distinguishing AI takeover scenarios by Sam Clarke (8 Sep 2021 18:18 UTC; 17 points)
    - Raemon 11 Jan 2022 2:49 UTC
      LW: 2 AF: 1
      AF Parent
      A year later, as we consider this for the 2020 Review, I think figuring out a better name is worth another look.
      Another option is “AI Catastrophe from First Principles”