Steelmanning is not the Ideological Turing Test.
ITT = simulating the other side
SM = can I extract something useful for myself from the other side’s arguments
With ITT, my goal is to have a good model of the other side. That can be instrumentally useful for predicting their behavior, or for winning more debates against them (because they are less likely to surprise me). That is, ITT is a socially motivated activity. If I knew that the other side would disappear tomorrow and no one would want to talk about them, ITT would be a waste of time.
With SM, my goal is to improve my model of the world. The other side is unimportant, except as a potential source of true information that may currently be in my blind spot. That is, SM is a selfishly motivated activity. Whether the other side approves of my steelman of them is irrelevant; my activity is not aimed at them.
SM is trying to find a diamond in a heap of dung. ITT is learning to simulate someone who enjoys the dung.
I would further say that steelmanning is something you don’t usually do in an active debate, and certainly not an in-person one; any time you start to say “I think your argument would be better stated as [...]”, that should immediately be checked with your interlocutor. If it’s more like a series of letters exchanged, or articles published, then that time delay is some justification for doing interpretive work on your side. But at the very least, I think you should call out explicitly every time you ignore one argument in favor of one you think is better, both for legibility and to give your interlocutor the opportunity to say “No, that argument is a worse one.” Steelmanning is at its best when it’s you versus a flood of statistically bogus articles by partisan hacks and you’re trying to keep yourself grounded.
Peter Boghossian has been running an exercise where, having found two people on opposing sides of an issue, he asks both of them to write down their best reason for their side, then asks each person to guess what the other wrote down, and asks the other person, “Is that the reason you wrote?”, and, if not (which it usually is not), “Is that better than the reason you wrote?” (which I don’t remember ever being answered yes). It’s an interesting approach. Example.
By the way, separately from beliefs, another issue is values. Suppose you favor policy X because it helps with value A, and your opponent favors policy Y because it helps with value B; suppose you care a lot about A and little about B, and your opponent is the reverse. Then, when you “steelman” the argument for Y, you might say, “Well, someone who favored Y might think that it helps with A, and I guess a naive look would indeed support that, but if we dig into the details we conclude strongly that X is much better for A. Case closed.”
Trouble is that even checking the steelman with the other person does not avoid the failure modes I am talking about. In fact, a few moments ago, I made slight changes to the post to include a bit where the interlocutor presents a proposed steelman and you reject it. I included this because many redditors objected that this is by definition part of steelmanning (though none of the cited definitions actually included this criterion), and I wanted to show that it makes no difference at all to my argument whether the interlocutor asks for confirmation of the steelman or you become aware of it by some other mechanism. What’s relevant is only that you somehow learn of the steelman attempt, reject it as inadequate, and try to redirect your interlocutor back to the actual argument you made. The precise social forms by which this happens (the ideal being something like “would the following be an acceptable steelman [...]”) are only dressing, not substance.
I have in fact had a very long email conversation spanning several months with another LessWronger who kept constructing would-be steelmen of my argument that I kept having to correct.
As it was a private conversation, I cannot give too many details, but I can try to summarize the general gist.
This user and I are part of a shared IRL social network, which I have been feeling increasingly alienated from, but which I cannot simply leave without severe consequences. The trouble is that this social network generally treats me with extreme condescension, disdain, patronisation, etc., and that I am constrained in my ability to fight back in my usual manner. I am not so concerned about the underlying contempt, except for its part in creating the objectionable behaviour. It seems to me that they must subconsciously hold me in extreme contempt, but since I do not respect their judgement of me, my self-esteem is not harmed by this knowledge. The real problem is that situations where I am treated with contempt and cannot defend myself, but must remain polite and simply take it, provide a kind of evidence to my autonomous, unconscious status-tracking processes (what JBP claims to be the function of the serotonergic system, though I don’t know whether that is true), and this evidence is not as easily overridden by my own contempt for their poor judgement as my conscious reasoning about their disdain for me is.
I repeatedly explained to this LessWrong user that the issue is that these situations provide evidence of contempt for me, and that since I am constrained in my ability to talk back, they also provide systematically false evidence about my level of self-respect and about how I deserve to be treated. Speaking somewhat metaphorically, you could say that this social network is inadvertently using black magic against me and that I want them to stop. It might seem that this position could be easily explained, and indeed that was how it seemed to me too at the outset of the conversation, but it was complicated by the need to demonstrate that I was in fact being treated contemptuously, and that I was in fact being constrained in my ability to defend myself against it. It was not enough to give specific examples of the treatment, because that led my interlocutor to overly narrow abstractions, so I had to point out that the specific instances of contemptuous treatment demonstrated the existence of an underlying contempt, and that this underlying contempt should a priori be expected to generate a large variety of contemptuous behaviour. This in turn led to a very tedious argument over whether that underlying contempt exists at all, where it would have come from, etc.
Anyway, I eventually approached another member of this social network and tried to explain my predicament. It was tricky, because I had to accuse him of an underlying contempt giving rise to a pattern of disrespectful behaviour, but also explain that it was the behaviour itself I was objecting to and not the underlying contempt, all without telling him explicitly that I do not respect his judgement. Astonishingly, I actually made a lot of progress anyway.
Well, that didn’t last long, because the LW user in question took it upon himself to attempt to fix the schism, and told this man that if I am objecting to a pattern of disrespectful behaviour, then it is unreasonable to assume that I am objecting to the evidence of disrespect rather than the underlying disrespect itself. You will notice that this is the exact 180-degree opposite of my actual position. It also had the effect of cutting off my chance of making any further progress with the man in question, since it is now, to my eyes, impossible to explain what I actually object to without telling him outright that I have no respect for his judgement.
I am sure he thought he was being reasonable. After all, absent the context, it would seem like a perfectly reasonable observation. But as there were other problems with his behaviour that made it seem smug and self-righteous to me, and as the whole conversation up to that point had already been so maddening and led to so much disaster (it seems in fact to have played a major part in causing extreme mental harm to someone who was quite close to me), I decided to cut my losses and not pursue it any further, except for scolding him for what seemed to me like the breach of an oath he had given earlier.
Anyway, the point is not to generalise too much from this example. What I described in the post was actually inspired by other scenarios. The point of telling you this story is simply that even if you are presented with the interlocutor’s proposed steelman and given a chance to reject it, this does not save you, and the conversation can still go on for literally months without getting out of the trap I described. I have had other examples of this trap being highly persistent, even with people who were more consistent about explicitly asking for confirmation of each proposed steelman. What was special about this case was that it was the only one that lasted for months with hundreds of emails, that my interlocutor started out with a stated intent to see the conversation through to the end, and that my interlocutor was a fairly prolific LessWrong commenter and poster, whom I would rate as being at least in the top 5%, and probably the top 1%, of smartest LessWrongers.
I should mention for transparency that the LessWrong user in question did not state outright that he was steelmanning me, but having been around in this community for a long time, I think I am able to tell which behaviours are born of an attempt to steelman, or more broadly, which behaviours spring from the general culture of steelmanning and of being habituated to a steelman-esque mode of discourse. As my post indicated, I think steelmanning is a reasonable way to get to a more expedient resolution between people who, broadly speaking, “share base realities”, but as someone with views that are highly heterodox relative to the dominant worldviews on LessWrong, I can say that my own experience of steelmanning has been that it is one of the nastiest forms of argumentation I know of.
I focused on the practice of steelmanning as emblematic of a whole approach to thinking about good faith that I believe is wrongheaded more generally, not only as it pertains to steelmanning. In hindsight, I should have stated this. I considered doing so, but decided to make it the subject of a subsequent post, and I failed to notice that writing a more in-depth post about the abstract pattern would not preclude a brief mention in this post that steelmanning is only one instance of a more general pattern I am trying to critique.
The pattern is simply to focus excessively on specific behaviours and arguments as being in bad faith, while paying insufficient attention to the emotional drivers of bad faith, which also tend to make people go into denial about their own bad faith.
Indeed, that was the purpose of steelmanning in its original form, as it was pioneered on Slate Star Codex.
Interestingly, when I posted it on r/slatestarcodex, a lot of people basically started screaming at me that I was strawmanning the concept of steelmanning, because a steelman by definition requires that the person you’re steelmanning accept the proposed steelman as accurate. Hence, your comment provides me with some fresh relief and assures me that there is still a vestige left of the rationalist community I used to know.
I wrote my article mostly about how I see the word used colloquially today. I intended it as one of several posts demonstrating a general pattern of bad-faith argumentation that disguises itself as exceptionally good faith.
But setting all that aside, I think my critique still substantially applies to the concept in its original form. It is still the case, for example, that superficial mistakes will tend to be corrected automatically just from the general circulation of ideas within a community, and that the really persistent errors have to do with deeper distortions in the underlying worldview.
Worldviews, however, are basically analogous to scientific paradigms as described by Thomas Kuhn. People do not adopt a complicated worldview without it seeming vividly correct from at least some angle, however parochial that angle might be. Hence, the only correct way to resolve a deep conflict between worldviews is through the acquisition of a broader perspective that subsumes both. Of course, either worldview, or both, may be a mixture of real patterns coupled with a bunch of propaganda, but in that case, the subsuming worldview should ideally be able to explain why the propaganda was created and why it seems vividly believable to its adherents.
At first glance, this might not seem to pose much of a problem for the practice of steelmanning in its original form, because in many cases it will seem like you can completely subsume the “grain of truth” from the other perspective into your own without any substantial conflict. But that would basically classify it as a “superficial improvement”, the kind that is bound to happen automatically just from the general circulation of ideas, and therefore less important than the less inevitable improvements. But if an improvement of this sort is not inevitable, that indicates that your current social network cannot generate the improvement on its own, but can only generate it through confrontations with conflicting worldviews from outside your main social network. That, in turn, means that your existing worldview cannot properly explain the grain of truth in the opposing view, since it could not predict it in advance, which means there is more to learn from this outside perspective than can be learned by straightforwardly integrating its apparent grain of truth.
This is basically the same pattern I am describing in the post, but just removed from the context of conversations between individuals, and instead applied to confrontations between different social networks with low-ish overlap. The argument is substantially the same, only less concrete.