On the other hand, there’s something to be said for introducing an argument in ways that are as uncontroversial as possible, so that it fits smoothly into a person’s existing views but starts to imply things that the person hasn’t considered yet. If something like the Steinhardt posts gets researchers thinking about related topics by themselves, then that might get them to a place where they’re more receptive to the x-risk arguments a few months or a year later, or even lead them to reinvent those arguments themselves.
I once saw a comment along the lines of “you can’t choose what conclusions people reach, but you can influence which topics they spend their time thinking about”. It might be more useful to get people thinking about alignment topics in general than to immediately sell them on x-risk specifically. (Edited to add: not to mention that trying to get people thinking about a topic is better epistemics than trying to get them to accept your conclusion directly.)
I feel pretty scared by the tone and implication of this comment. I’m extremely worried about selecting our arguments here for convincingness instead of for truth, and mentioning a type of propaganda and then talking about how we should use it to make people listen to our arguments feels incredibly symmetric. If the strength of our arguments for why AI risk is real does not hinge on whether or not those arguments are centrally true, we should burn them with fire.
I get the concern and did wonder for a bit whether to include the second paragraph. But I also never suggested saying anything untrue, nor would I endorse saying anything that we couldn’t fully stand behind.
Also, if someone in the “AI is not an x-risk” camp were considering how best to convince me, I would endorse them using a similar technique: first introducing arguments that made maximal sense to me, and letting me think about their implications for a while before introducing arguments whose conclusions I might otherwise reject without giving them fair consideration. If everyone did that, then I would expect the most truthful arguments to win out.
On grounds of truth, I would be more concerned about attempts to directly get people to reach a particular conclusion than about ones that just shape their attention toward specific topics. Suggesting to people what they might want to think about leaves open the possibility that you might be mistaken and that they might see this and reject your arguments. I think this is a more ethical stance than one that starts out from “how do we get them from where they are to accepting x-risk in one leap”. (But I agree that the mention of propaganda gives the wrong impression; I’ll edit that part out.)
Cool, I feel a lot more comfortable with your elaboration; thank you!
Yeah, I agree with Kaj here. We do need to avoid the risk of using misleading or dishonest communication. However, it also seems fine and important to optimise relevant communication variables (e.g., tone, topic, timing, concision, relevance, etc.) to maximise positive impact.
Truthfully, I understand. And if we have longer timelines, or if alignment will happen by default, then I’d agree. Unfortunately, the chance of these assumptions not holding is high enough that we should probably forsake the deontological rule and make the most convincing arguments. Remember, we need to convince people who already have biases against certain types of reasoning common on LW, and there’s no better option. I agree with Kaj Sotala’s advice.
Deontological rules like truth-telling are too constraining to work under short timelines or the possibility that alignment does not happen by default.
I disagree with this, to be clear. I don’t think we should sacrifice truth, and the criticism I was responding to wasn’t that Steinhardt’s posts would be untrue.
Yeah, this is basically the thing I’m terrified about. If someone has been convinced of AI risk by arguments which do not track truth, then I find it incredibly hard to believe that they’d ever be able to contribute useful alignment research, not to mention the general fact that if you recruit using techniques that select for people with bad epistemics, you will end up with a community with shitty epistemics and wonder what went wrong.
“we must sacrifice the very thing we intend to create, alignment, in order to create it”
A nice rebuttal to my unpopular previous comment.
Or in other words, we can’t get them to accept conclusions we favor, but we can frame alignment in such a way that it just seems natural.