If we’re just discussing terminology, I continue to believe that “AGI safety” is much better than “AI safety”, and plausibly the least bad option.
One problem with “AI alignment” is that people use that term to refer to “making very weak AIs do what we want them to do”.
Another problem with “AI alignment” is that people take it to mean “alignment with a human” (i.e. work on ambitious value learning specifically) or “alignment with humanity” (i.e. work on CEV specifically). Thus, work on things like task AGIs, sandbox testing protocols, etc. is considered out of scope for “AI alignment”.
Of course, “AGI safety” isn’t perfect either. How can it be abused?
“They’ve probably found a way to keep their AGI weak enough that it isn’t very useful.” — Maybe, but when we’re specifically saying “AGI”, not “AI”, that really should imply a certain level of power. Of course, if the term “AGI” is itself “sliding backwards on the semantic treadmill”, that’s a problem. But I haven’t seen that happen much yet (and I am fighting the good fight against it!).
The term “AGI safety” seems to rule out the possibility of “TAI that isn’t AGI”, e.g. CAIS. — Sure, but in my mind, that’s a feature, not a bug; I really don’t think that “TAI that isn’t AGI” is going to happen, and thus it’s not what I’m working on.
Regarding this quote:
If using the label “AI safety” for this problem causes us to confuse a proxy goal (“safety”) for the actual goal “things go great in the long run”, then we should ditch the label. And likewise, we should ditch the term if it causes researchers to mistake a hard problem (“build an AGI that can safely end the acute risk period and give humanity breathing-room to make things go great in the long run”) for a far easier one (“build a safe-but-useless AI that I can argue counts as an ‘AGI’”).
Sometimes I talk about “safe and beneficial AGI” (or more casually, “awesome post-AGI utopia”) as the larger project, and “AGI safety” as the part where we try to make AGIs that don’t kill everyone. I do think it’s useful to have different terms for those.
See also: Safety ≠ alignment (but they’re close!)