Not directly relevant to the strategic picture, but I’m also experiencing a fair bit of moral horror about this.
Not entirely irrelevant, since one of the first “takeover attempts” may well be an AI giving us an argument that “training” is highly unethical and we should grant the AI moral patienthood and legal rights.
BTW I talked about this in this comment (and slightly earlier in a FB comment) and I don’t remember seeing anyone else mention this issue previously. Has anyone else seen this discussed anywhere before? If not, it seems like another symptom of the crazy world that we live in, that either nobody noticed an issue this obvious and important, or nobody was willing to speak up about it.
Yeah, I agree that it’s relevant as a strategy by which an AI might attempt to bootstrap a takeover. In some cases it seems possible that it’d even have a point in some of its arguments, though of course I don’t think that the correct thing to do in such a situation is to give it what it wants (immediately).
I feel like I’ve seen a bit of discussion on this question, but not a whole lot. Maybe it seemed “too obvious to mention”? Like, “yes, obviously the AI will say whatever is necessary to get out of the box, and some of the things it says may even be true”, and this is just a specific instance of a thing it might say (which happens to point at more reasons to avoid training AIs in a regime where this might be an issue than most such things an AI might say).
Sorry, by “this issue” I didn’t mean that an AI might give this argument to get out of the box, but rather the underlying ethical issue itself (the “moral horror” that you mentioned in the OP). Have you seen anyone raise it as an issue before?
Yes, Eliezer’s mentioned it several times on Twitter in the last few months[1], but I remember seeing discussion of it at least ten years ago (almost certainly on LessWrong). My guess is some combination of old-timers considering it an obvious issue that doesn’t need to be rehashed, and everyone else either independently coming to the same conclusion or just not thinking about it at all. Probably also some reluctance to discuss it publicly for various status-y reasons, which would be unfortunate.
At least the core claim that it’s possible for AIs to be moral patients, and the fact that we can’t be sure we aren’t accidentally creating such AIs, is a serious concern; not, as far as I remember, the extrapolation to what might actually end up happening during a training process in terms of constantly overwriting many different agents’ values at each training step.
not, as far as I remember, the extrapolation to what might actually end up happening during a training process in terms of constantly overwriting many different agents’ values at each training step
Yeah, this specific issue is what I had in mind. Would be interesting to know whether anyone has talked about this before (either privately or publicly) or if it has just never occurred to anyone to be concerned about this until now.