Ivan Vendrov comments on How likely is deceptive alignment?

Ivan Vendrov 31 Aug 2022 16:49 UTC
LW: 3 AF: 2
0
AF
Thank you for putting numbers on it!
~60%: there will be an existential catastrophe due to deceptive alignment specifically.
Is this an unconditional prediction of 60% chance of existential catastrophe due to deceptive alignment alone? In contrast to the commonly used 10% chance of existential catastrophe due to all AI sources this century. Or do you mean that, conditional on there being an existential catastrophe due to AI, 60% chance it will be caused by deceptive alignment, and 40% by other problems like misuse or outer alignment?
- paulfchristiano 31 Aug 2022 19:33 UTC
  LW: 27 AF: 11
  4
  AF Parent
  In contrast to the commonly used 10% chance of existential catastrophe due to all AI sources this century
  Amongst the LW crowd I’m relatively optimistic, but I’m not that optimistic. I would give maybe 20% total risk of misalignment this century. (I’m generally expecting singularity this century with >75% chance such that most alignment risk ever will be this century.)
  The number is lower if you consider “how much alignment risk before AI systems are in the driver’s seat,” which I think is very often the more relevant question, but I’d still put it at 10-20%. At various points in the past my point estimates have ranged from 5% up to 25%.
  And then on top of that there are significant other risks from the transition to AI. Maybe a total of more like 40% total existential risk from AI this century? With extinction risk more like half of that, and more uncertain since I’ve thought less about it.
  I still find 60% risk from deceptive alignment quite implausible, but wanted to clarify that 10% total risk is not in line with my view and I suspect it is not a typical view on LW or the alignment forum.
  - Jeffrey Ladish 31 Aug 2022 20:14 UTC
    4 points
    0
    AF Parent
    And then on top of that there are significant other risks from the transition to AI. Maybe a total of more like 40% total existential risk from AI this century? With extinction risk more like half of that, and more uncertain since I’ve thought less about it.
    40% total existential risk, and extinction risk half of that? Does that mean the other half is some kind of existential catastrophe / bad values lock-in but where humans do survive?
    - evhub 31 Aug 2022 20:55 UTC
      LW: 6 AF: 4
      4
      AF Parent
      Fwiw, I would put non-extinction existential risk at ~80% of all existential risk from AI. So maybe my extinction numbers are actually not too different than Paul’s (seems like we’re both ~20% on extinction specifically).
      - iamthouthouarti 1 Sep 2022 7:29 UTC
        8 points
        1
        Parent
        And then there’s me who was so certain until now that any time people talk about x-risk they mean it to be synonymous with extinction. It does make me curious though, what kind of scenarios are you imagining in which misalignment doesn’t kill everyone? Do more people place a higher credence on s-risk than I originally suspected?
- evhub 31 Aug 2022 19:14 UTC
  LW: 26 AF: 12
  4
  AF Parent
  Unconditional. I’m rather more pessimistic than an overall 10% chance. I usually give ~80% chance of existential risk from AI.
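
The figures quoted in this thread can be cross-checked with a little arithmetic. The sketch below is only an illustration: it treats each "~" estimate as an exact point value, and it assumes that deceptive-alignment catastrophes are a subset of all AI existential catastrophes, so that the unconditional number divided by the total gives the conditional share. The function and variable names are made up for this example and do not come from any commenter.

```python
# Point estimates as stated in the thread, read as probabilities.
# Treating "~" values as exact is an assumption made for illustration.
EVHUB_TOTAL_XRISK = 0.80           # "~80% chance of existential risk from AI"
EVHUB_DECEPTIVE_UNCOND = 0.60      # "~60%", confirmed as unconditional
EVHUB_NON_EXTINCTION_SHARE = 0.80  # "~80% of all existential risk from AI"

PAUL_TOTAL_XRISK = 0.40            # "maybe ... 40% total existential risk"
PAUL_EXTINCTION_SHARE = 0.50       # "extinction risk more like half of that"

COMMON_TOTAL_XRISK = 0.10          # the "commonly used 10%" from the question


def conditional_share(p_specific: float, p_total: float) -> float:
    """P(specific cause | catastrophe), assuming the specific cause
    is a subset of the total (an assumption, not stated in the thread)."""
    return p_specific / p_total


def unconditional(p_total: float, share: float) -> float:
    """P(catastrophe) * P(outcome | catastrophe)."""
    return p_total * share


# Reading 1 (the one evhub confirms): 60% is unconditional.
evhub_share = conditional_share(EVHUB_DECEPTIVE_UNCOND, EVHUB_TOTAL_XRISK)

# Reading 2 (the alternative raised in the question): 60% is conditional
# on an AI catastrophe that itself has a 10% chance.
alt_uncond = unconditional(COMMON_TOTAL_XRISK, 0.60)

# Extinction-specific numbers for the two commenters.
evhub_extinction = unconditional(EVHUB_TOTAL_XRISK, 1 - EVHUB_NON_EXTINCTION_SHARE)
paul_extinction = unconditional(PAUL_TOTAL_XRISK, PAUL_EXTINCTION_SHARE)

print(f"evhub: deceptive alignment share of AI x-risk = {evhub_share:.2f}")
print(f"alternative reading: unconditional risk = {alt_uncond:.2f}")
print(f"evhub extinction risk = {evhub_extinction:.2f}")
print(f"Paul extinction risk  = {paul_extinction:.2f}")
```

Under these assumptions the two readings in the original question differ by a factor of ten, and the two extinction estimates land close together even though the total existential-risk estimates differ by a factor of two.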