On the one hand, I do think people around here say a lot of stuff that feels really silly to me, some of which definitely comes from analogies to humans, so I can sympathize with where Sam is coming from.
On the other hand, I think this response mischaracterizes the misalignment concern and is generally dismissive and annoying. Implying that “if you think an AI might behave badly, that really shows that it is you who would behave badly” is kind of rhetorically effective (and it is a non-zero signal), but it’s a tiny consideration and either misunderstands the issues or is deliberately obtuse to score rhetorical points. It would be really worrying if people doubled down on this kind of rhetorical strategy (which I think is plausible) or if it was generally absorbed as part of the culture of OpenAI. Unfortunately, some others at OpenAI have made similarly worrying statements.
I agree that it’s not obvious what is right. I think there is maybe a 50% chance that the alignment concerns are totally overblown and either emerge way too late to be relevant or are extremely easily dealt with. I hope that it will be possible to make measurements to resolve this dispute well before something catastrophic happens, and I do think there are plausible angles for doing so. In the meantime I personally just feel pretty annoyed at people on both sides who seem so confident and dismissive. I’m more frustrated at Eliezer because he is in some sense “on my side” of this issue, but I’m more worried about Sam since erring in the other direction would irreversibly disempower humanity.
That said, I agree with Sam that in the short term more of the harm comes from misuse than misalignment. I just think the “short term” could be quite short, and normal people are not so myopic that the costs of misuse are comparable to, say, a 3% risk of death in 10 years. I also think “misuse” vs. “misalignment” can be blurry in a way that makes both positions more defensible, e.g. a scenario where OpenAI trains a model which is stolen and then deployed recklessly can involve both. Misalignment is what makes that event catastrophic for humanity, but from OpenAI’s perspective any event where someone steals their model and applies it recklessly might be described as misuse.