Sorry, by “this issue” I didn’t mean that an AI might give this argument to get out of the box, but rather the underlying ethical issue itself (the “moral horror” that you mentioned in the OP). Have you seen anyone raise it as an issue before?
Yes, Eliezer’s mentioned it several times on Twitter in the last few months[1], but I remember seeing discussion of it at least ten years ago (almost certainly on LessWrong). My guess is some combination of old-timers considering it an obvious issue that doesn’t need to be rehashed, and everyone else either independently coming to the same conclusion or just not thinking about it at all. Probably also some reluctance to discuss it publicly for various status-y reasons, which would be unfortunate.
At least the core claim (that it’s possible for AIs to be moral patients, and that we can’t be sure we aren’t accidentally creating them) has been raised as a serious concern; not, as far as I remember, the extrapolation to what might actually end up happening during a training process, in terms of constantly overwriting many different agents’ values at each training step.
not, as far as I remember, the extrapolation to what might actually end up happening during a training process, in terms of constantly overwriting many different agents’ values at each training step
Yeah, this specific issue is what I had in mind. Would be interesting to know whether anyone has talked about it before (either privately or publicly), or whether it just never occurred to anyone to be concerned about it until now.