Promoted to curated: I really enjoyed reading through this sequence. I have some disagreements with it, but overall it’s one of the best plain language introductions to AI safety that I’ve seen, and I expect I will link to this as a good introduction many times in the future. I was also particularly happy with how the sequence bridged and synthesized a number of different perspectives that usually feel in conflict with each other.
Critch recently made the argument (and wrote it in his ARCHES paper, summarized by Rohin here) that “AI safety” is a straightforwardly misleading name because “safety” is a broader category than is being talked about in (for example) this sequence – it includes things like not making self-driving cars crash. (To quote directly: “the term ‘AI safety’ should encompass research on any safety issue arising from the use of AI systems, whether the application or its impact is small or large in scope”.) I wanted to raise the idea here and ask Richard what he thinks about renaming it to something like “AI existential safety from first principles” or “AI as an extinction risk from first principles” or “AI alignment from first principles”.
Yeah, this seems like a reasonable point. But I’m not that much of a fan of the alternatives you suggest. What do you think about “AGI safety”?
Oli suggests that there are no fields with three-word names, and so “AI Existential Risk” is not an option. I think “AI Alignment” is currently the most accurate name for the field that encompasses work like Paul’s and Vanessa’s and Scott/Abram’s and so on. I think “AI Alignment From First Principles” is probably a good name for the sequence.
It seems a definite improvement on the axis of specificity, I do prefer it over the status quo for that reason.
But it doesn’t address the problem of scope-sensitivity. I don’t think this sequence is about preventing medium-sized failures from AGI. It’s about preventing extinction-level risks to our future.
“A First-Principles Explanation of the Extinction-Level Threat of AGI: Introduction”
“The AGI Extinction Threat from First Principles: Introduction”
“AGI Extinction From First Principles: Introduction”
Yeah, I agree that’s a problem. But I don’t think it’s a big problem, because who’s talking about medium-sized risks from AGI?
In particular, the flag I want to plant is something like: “when you’re talking about AGI, it’s going to be So Big that existential safety is the default type of safety to be concerned with.”
Also I think having the big EXTINCTION in the title costs weirdness points, because even within the field people don’t use that word very much. So I’m leaning towards AGI safety.
Well, I have talked about them… :-)

The capability claim is often formulated as the possibility of an AI achieving a decisive strategic advantage (DSA). While the notion of a DSA has been implicit in many previous works, the concept was first explicitly defined by Bostrom (2014, p. 78) as “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination.”

However, assuming that an AI will achieve a DSA seems like an unnecessarily strong form of the capability claim, as an AI could cause a catastrophe regardless. For instance, consider a scenario where an AI launches an attack calculated to destroy human civilization. If the AI was successful in destroying humanity or large parts of it, but the AI itself was also destroyed in the process, this would not count as a DSA as originally defined. Yet, it seems hard to deny that this outcome should nonetheless count as a catastrophe.

Because of this, this chapter focuses on situations where an AI achieves (at least) a major strategic advantage (MSA), which we will define as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society.” A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause 10 million or more fatalities (Bostrom & Ćirković 2008).
A year later, as we consider this for the 2020 Review, I think figuring out a better name is worth another look.
Another option is “AI Catastrophe from First Principles”
Very good point. Safety in engineering is often summarized as “nothing bad happens”, without anthropomorphic nuance and without “intent”: an engineered system can simply go wrong. It seems “AI safety” often glosses over or ignores such facets. Is it that “AI safety” is cast as looking into the creation of “reasonable A(G)I”?