One datapoint:
- Overall I don’t think the structure of the text makes it easy to express larger disagreements. Many points state obviously true observations; many others express the same problem in different words; some points are false; and sometimes whether a point actually bites depends on highly speculative assumptions.
- For example, if this counts as a disagreement: in my view, what makes multiple of these points “lethal” is a hidden assumption that there is a fundamental discontinuity between some categories of systems (e.g. weak: won’t kill you, won’t help you with alignment | strong: would help you with alignment, but will kill you by default) and that there isn’t anything interesting/helpful in between (e.g. “moderately strong” systems). I don’t think this is true or inevitable.
- I’ll probably try to write and post a longer, top-level post about this (working title: Hope is in continuity).
- I think an attempt to discuss this in comments would be largely pointless. A short comment would run into the problem of being misunderstood, while a long comment would be too long.
I think discontinuity is true, but it’s not actually required for EY’s argument. Thus, asserting continuity isn’t sufficient as a response.
You specifically need it to be the case that you get useful capabilities earlier than dangerous ones. If the curves are continuous and danger and pivotalness arrive at different times, but danger arrives first, then you’re plausibly in a worse situation rather than a better one.
So there needs to be some pivotal act that is pre-dangerous but also post-useful. I think the best way to argue for this is just to name one or more examples. Not necessarily examples where you have an ironclad proof that the curves will work out correctly; just examples that you do in fact believe are reasonably likely to work out. Then we can talk about whether there’s a disagreement about the example’s usefulness, or about its dangerousness, or both.
(Elaborating on “I think discontinuity is true”: I don’t think AGI is just GPT-7 or Bigger AlphaGo; I don’t think the cognitive machinery involved in modeling physical environments, generating and testing scientific hypotheses to build an edifice of theory, etc. is a proper or improper subset of the machinery current systems exhibit; and I don’t think the missing skills are a huge grab bag of unrelated local heuristics such that accumulating them will be gradual and non-lumpy.)
The actual post is now here—as expected, it’s more post-length than a comment.