“Catastrophic” is normally used in the term “global catastrophic risk” and means something like “kills 100,000s of people”, so I do think “doesn’t necessarily kill but could’ve killed a couple of people” is a fairly different meaning.
Agreed. In retrospect, I might have opted for “pre-AGI nearly-deadly accident caused by deceptive alignment.”
In retrospect, I realize that I put my answer to the second question far too high. If it just means “a deceptively aligned system nearly gives a few people in a hospital a fatal dosage, but it’s stopped and we don’t know why the system messed up,” then it’s quite plausible that nothing that substantial will happen as a result.
I intended the situation to be more like “we catch the AI pretending to be aligned but actually lying, and it nearly kills, or does kill, at least a few people as a result.”
With #1, I’m trying to have people predict the scenario “deception is robustly instrumental behavior, but AIs will be bad at it at first and so we’ll catch them.” #2 is trying to operationalize whether this would be viewed as a fire alarm.
Some ways you might think scenario #1 won’t happen:
You don’t think deception will be incentivized
Fast takeoff means the AI is never simultaneously smart enough to deceive and dumb enough to get caught
Our transparency tools won’t be good enough for many people to believe it was actually deceptively aligned
Also: we solve alignment really well on paper, and that’s why deception doesn’t arise. (I assign non-trivial probability to this.)