-- it’s not like the AI wakes up and decides to be evil. I think all of the traditional AI safety thinkers reveal a lot more about themselves than they mean to when they talk about what they think the AGI is going to be like.
I think Sam Altman is “inventing a guy to be mad at” here. Who anthropomorphizes models?
And the bad case—and I think this is important to say—is like lights out for all of us. (...) But I can see the accidental misuse case clearly and that’s super bad. So I think it’s like impossible to overstate the importance of AI safety and alignment work. I would like to see much much more happening.
This reinforces my position that the fundamental dispute between the opposing segments of the AI safety landscape is based mainly on how hard it is to prevent extreme accidents, rather than on irreconcilable value differences. Of course, I can’t judge who is right, and there might be quite a lot of uncertainty until shortly before very transformative events are possible.
On the one hand, I do think people around here say a lot of stuff that feels really silly to me, some of which definitely comes from analogies to humans, so I can sympathize with where Sam is coming from.
On the other hand, I think this response mischaracterizes the misalignment concern and is generally dismissive and annoying. Implying that “if you think an AI might behave badly, that really shows that it is you who would behave badly” is kind of rhetorically effective (and it is a non-zero signal), but it’s a tiny consideration and either misunderstands the issues or is deliberately obtuse to score rhetorical points. It would be really worrying if people doubled down on this kind of rhetorical strategy (which I think is plausible) or if it was generally absorbed as part of the culture of OpenAI. Unfortunately some others at OpenAI have made similarly worrying statements.
I agree that it’s not obvious what is right. I think there is maybe a 50% chance that the alignment concerns are totally overblown and either emerge way too late to be relevant or are extremely easily dealt with. I hope that it will be possible to make measurements to resolve this dispute well before something catastrophic happens, and I do think there are plausible angles for doing so. In the meantime I personally just feel pretty annoyed at people on both sides who seem so confident and dismissive. I’m more frustrated at Eliezer because he is in some sense “on my side” of this issue, but I’m more worried about Sam since erring in the other direction would irreversibly disempower humanity.
That said, I agree with Sam that in the short term more of the harm comes from misuse than misalignment. I just think the “short term” could be quite short, and normal people are not so myopic that the costs of misuse would be comparable to, say, a 3% risk of death within 10 years. I also think “misuse” vs “misalignment” can be blurry in a way that makes both positions more defensible, e.g. a scenario where OpenAI trains a model which is stolen and then deployed recklessly can involve both. Misalignment is what makes that event catastrophic for humanity, but from OpenAI’s perspective any event where someone steals their model and applies it recklessly might be described as misuse.