Rob Bensinger comments on AGI Ruin: A List of Lethalities

Rob Bensinger 7 Jun 2022 4:25 UTC
6 points
a hidden assumption there is a fundamental discontinuity between some categories of systems (e.g. weak, won’t kill you, won’t help you with alignment | strong, would help you with alignment, but will kill you by default) and there isn’t anything interesting/helpful in between (e.g. “moderately strong” systems). I don’t think this is true or inevitable.

- I’ll probably try to write and post a longer, top-level post about this (working title: Hope is in continuity).
I think discontinuity is true, but it’s not actually required for EY’s argument. Thus, asserting continuity isn’t sufficient as a response.
You specifically need it to be the case that you get useful capabilities earlier than dangerous ones. If the curves are continuous and danger comes at a different time than pivotalness, but danger comes before pivotalness, then you’re plausibly in a worse situation rather than a better one.
So there needs to be some pivotal act that is pre-dangerous but also post-useful. I think the best way to argue for this is just to name one or more examples. Not necessarily examples where you have an ironclad proof that the curves will work out correctly; just examples that you do in fact believe are reasonably likely to work out. Then we can talk about whether there’s a disagreement about the example’s usefulness, or about its dangerousness, or both.
(Elaborating on “I think discontinuity is true”: I don’t think AGI is just GPT-7 or Bigger AlphaGo; I don’t think the cognitive machinery involved in modeling physical environments, generating and testing scientific hypotheses to build an edifice of theory, etc. is a proper or improper subset of the machinery current systems exhibit; and I don’t think the missing skills are a huge grab bag of unrelated local heuristics such that accumulating them will be gradual and non-lumpy.)
- Jan_Kulveit 13 Jun 2022 22:00 UTC
  2 points
  The actual post is now here—as expected, it’s more post-length than a comment.