Here’s one factor that might push against the value of Steinhardt’s post as something to send to ML researchers: perhaps it is not arguing for anything controversial, and so is easier to defend convincingly. Steinhardt doesn’t explicitly make any claim about the possibility of existential risk, and barely mentions alignment. Gates, by contrast, spends the entire talk on alignment and existential risk, and may avoid coming across as too speculative because the talk is built around a survey of essentially the same population of ML researchers as the audience, and so can engage with their most important concerns, counterarguments, etc.
I’d guess that the typical ML researcher who reads Steinhardt’s blog posts will basically go on with their career unaffected, whereas one who watches the Gates video will now know that alignment is a real subfield of AI research, plus the basic arguments for catastrophic failure modes. Maybe they’ll even be on the lookout for impressive research from the alignment field, or for empirical demonstrations of alignment problems.
Caveats: I’m far from a NeurIPS author and spent 10 minutes skipping through the video, so maybe all of this is wrong. Would love to see evidence one way or the other.
+1 to this. I feel like an important question to ask is “how much did this change your mind?”. I would probably swap the agree/disagree question for this one.
I think the qualitative comments bear this out as well:

dislike of a focus on existential risks or an emphasis on fears, a desire to be “realistic” and not “speculative”

It seems like people like AGI Safety arguments that don’t really cover AGI Safety concerns! I.e., the problem researchers have isn’t so much with the presentation as with the content itself.
(Just a comment on some of the above, not all)
Agreed, and thanks for pointing out that each of these resources differs in content, not just presentation, in addition to being aimed at a different audience. This seems important and isn’t highlighted in the post.
We then get into what we want to do about that. One of the major tricky things is the ongoing debate over how much researchers need to be thinking in the frame of x-risk to make useful progress in alignment, which seems like a pretty important crux. Another is what ML researchers actually think after consuming different kinds of content: Thomas has some hypotheses in the paragraph beginning “I’d guess...”, but we don’t actually have data on this, and I can think of alternate hypotheses, which also seems quite cruxy.
On the other hand, there’s something to be said for introducing an argument in ways that are as uncontroversial as possible, so that it fits smoothly into a person’s existing views but starts to imply things that the person hasn’t considered yet. If something like the Steinhardt posts gets researchers thinking about related topics by themselves, then that might get them to a place where they’re more receptive to the x-risk arguments a few months or a year later—or they might even end up reinventing those arguments themselves.
I once saw a comment along the lines of “you can’t choose what conclusions people reach, but you can influence which topics they spend their time thinking about”. It might be more useful to get people thinking about alignment topics in general than to immediately sell them on x-risk specifically. (Edited to add: not to mention that trying to get people thinking about a topic is better epistemics than trying to get them to accept your conclusion directly.)
I feel pretty scared by the tone and implication of this comment. I’m extremely worried about selecting our arguments here for convincingness instead of for truth, and mentioning a type of propaganda and then talking about how we should use it to make people listen to our arguments feels incredibly symmetric. If the strength of our arguments for why AI risk is real does not hinge on whether or not those arguments are centrally true, we should burn them with fire.
I get the concern and did wonder for a bit whether to include the second paragraph. But I also never suggested saying anything untrue, nor would I endorse saying anything that we couldn’t fully stand behind.
Also, if someone in the “AI is not an x-risk” camp were considering how best to convince me, I would endorse them using a similar technique: first introducing arguments that made maximal sense to me, and letting me think about their implications for a while before introducing arguments whose conclusions I might otherwise reject without giving them fair consideration. If everyone did that, then I would expect the most truthful arguments to win out.
On grounds of truth, I would be more concerned about attempts to directly get people to reach a particular conclusion than about ones that just shape their attention toward specific topics. Suggesting to people what they might want to think about leaves open the possibility that you might be mistaken, and that they might see this and reject your arguments. I think this is a more ethical stance than one that starts out from “how do we get them from where they are to accepting x-risk in one leap”. (But I agree that the mention of propaganda gives the wrong impression—I’ll edit that part out.)
Cool, I feel a lot more comfortable with your elaboration; thank you!
Yeah, I agree with Kaj here. We do need to avoid the risk of using misleading or dishonest communication. However, it also seems fine and important to optimise relevant communication variables (e.g., tone, topic, timing, concision, relevance, etc.) to maximise positive impact.
Truthfully, I understand. And if we had longer timelines, or if alignment would happen by default, then I’d agree. Unfortunately, the chance of these assumptions not holding is high enough that we should probably forsake the deontological rule and make the most convincing arguments. Remember, we need to convince people who already have biases against certain types of reasoning common on LW, and there’s no better option. I agree with Kaj Sotala’s advice.
Deontological rules like truth-telling are too constraining to work given short timelines or the possibility that alignment doesn’t happen by default.
I disagree with this, to be clear. I don’t think we should sacrifice truth, and the criticism I was responding to wasn’t that Steinhardt’s posts would be untrue.
Yeah, this is basically the thing I’m terrified about. If someone has been convinced of AI risk by arguments which do not track truth, then I find it incredibly hard to believe that they’d ever be able to contribute useful alignment research. Not to mention the general fact that if you recruit using techniques that select for people with bad epistemics, you will end up with a community with shitty epistemics and wonder what went wrong.
“we must sacrifice the very thing we intend to create, alignment, in order to create it”
A nice rebuttal to my unpopular previous comment.
Or in other words, we can’t get them to accept conclusions we favor, but we can frame alignment in such a way that it just seems natural.