Inward and outward steelmanning

The usual kind of steelmanning is when you try to find the best possible arguments for a position you don’t like, or the hardest-to-refute version of an idea you don’t like. Let’s call this “inward steelmanning”. You’re mostly fixing the internal cogs of arguments and positions.

I want to propose a different idea, “outward steelmanning”. In this approach you imagine a world (similar enough to our own on the surface) where the idea you don’t like is true, then you try to see if this world resonates with the real one in any interesting way. You treat arguments and positions as black boxes and focus solely on their implications.

Imagine that you encountered a car with square wheels:

Inward steelmanning: “This is an abomination! It doesn’t work! But maybe with round wheels it would be beautiful. Or maybe a different vehicle with square wheels could be beautiful.”

Outward steelmanning: “This is ugly! It doesn’t work! But maybe if I imagine a world where this car works, it will change my standards of beauty. Maybe I will gain some insight about this world that I’m missing.”

If you want to be charitable, why not grant your opponent an entire universe with its own set of rules?

Jokes aside, I’ll explain the goals of outward steelmanning in more detail along the way.

Thought experiments

Extremism In Thought Experiment Is No Vice

Scott Alexander describes a world where research about morality works like high energy physics: the more extreme (disgusting) situations you imagine, the better you understand the laws of morality. Let’s imagine a world where the opposite idea is true.

I imagine a world where you use your moral intuitions to navigate the chaotic space of all possible moral situations. In such a world you may need to seek situations where you have the chance to do a good thing, not situations where you’re forced to do disgusting things no matter what. The former contain discernible pieces of general moral rules; the latter don’t. Moral intuitions are your torchlight, and you shouldn’t destroy the torch before you have found the parts for a flashlight.

We can also imagine a “hybrid world”, where both approaches need to be combined. You need to look where the logic inferred from disgusting and unfair situations leads you and where the logic inferred from more optimistic situations leads you. Then you make some meta-inference about general moral rules. Or maybe you just need to seek pieces of general moral rules at different depths of the “disgustingness” scale.

And we can consider a “reductio ad absurdum” world: in such a world, the more disgusting details you imagine, the better you know morality. You torture yourself with those thoughts.

And there’s also a “flipped” version of Scott’s world, where the most extreme cases are the easiest ones to solve. In such a world it is not extremeness and direct conflict of intuitions that make a moral problem hard. This world would be similar to our own because we don’t deal with the most extreme cases.

I believe that all of these worlds hint at where possible insights can be found, no matter which specific world ends up closer to the truth. The absurd world may teach you what gives disgust moral meaning. The “anti morbid thought experiments” world may teach you how our moral intuitions paint the space of all possible choices. The “hybrid” world may tell you whether there are meta-properties of morality. And Scott’s world and its flipped version may teach us what properties of reality affect properties of morality (and how). Let’s take the best of all worlds!

I think the article would be more interesting if it discussed all those worlds. It would also cover most of the responses in the comments (e.g. “non-extreme situations are important too”, “nuances are important”, “at some point disgusting details become meaningless”). Scott explores only one analogy, one possible mechanism of finding general moral rules and one aspect of the question.

So, I think outward steelmanning can help you predict where you can gain insight and find supporting arguments/objections.

God and morality

I think you can use outward steelmanning to extract food for thought from the most “stupid” arguments. Or to perform some sort of Hegelian dialectic where you synthesize ideas from different counterfactual worlds.

“If God/objective morality doesn’t exist, why not hurt someone just for fun?” Many people think that this argument doesn’t make sense, because there are A LOT of strong reasons not to be an awful person beyond God and objective morality. But you can imagine a world where this question makes sense.

In such a world there can be multiple reasons to (not) do something that affect each other in non-obvious ways, like the ingredients of a dish. It makes sense to keep seeking reasons even if many of them are obvious and strong enough.

Or maybe in such a world the “strength” of a reason is not an obvious concept. For example, in a slightly weirder scenario, different reasons to (not) do something are like different Artificial Intelligences. Even though your moral AI is very strong and works incredibly well, you want to know that it is perfect and won’t backstab you at some point. So no matter how strong and good your reasons are, you continue your investigation because nothing is enough. The question is too important for any finite amount of evidence to be enough.

You can also imagine a “flipped” world where you don’t want to choose reasons based on their apparent strength. In such a world the best reasons for (not) doing something are the weakest and least comprehensible in some respects. Your selfless desire to save your friends ends up way more important than your selfish love for them, even though it’s your weakest emotion towards them: you want to see your friends as special, not as some strangers.

Maybe after such steelmanning you can extract more food for thought from arguments from morality. I didn’t analyze the top arguments here, but that’s the point of steelmanning: you make imperfect arguments stronger.

I like the idea that you don’t need God or abstract objective rules to have morality. But I like some ideas in the argument above very much.

I think about the world where something objective and binding comes not from abstract rules, but from the existence of other minds itself. And the desire to not hurt them is not just a preference.

No Stupid Arguments

So, one unusual consequence of the “outward steelmanning” philosophy is that there are no bad arguments, only arguments that give more or less potential insight about the real world.

But that depends on your knowledge and opinions. Thus the quality of an argument is only a function of your perspective.

Arguments are mechanisms and each mechanism works in a certain world.

Invisible Cities, Einstein’s Dreams

Maybe those novels can help you get in the mood of “outward steelmanning”: using fantasy you can tease out your intuitions about the real world.

Those novels create fantasy worlds that reveal something about human nature… or make you think about human nature. They reveal something you care about or want to know more about.

Einstein’s Dreams by Alan Lightman

10 June 1905

Suppose that time is not a quantity but a quality, like the luminescence of the night above the trees just when a rising moon has touched the treeline. Time exists, but it cannot be measured.

The man and woman follow a winding path of small white stones to a restaurant on a hill. Have they been together a lifetime, or only a moment? Who can say?

Invisible Cities by Italo Calvino

Cities and the Dead 2, Adelma

“You reach a moment in life when, among the people you have known, the dead outnumber the living. And the mind refuses to accept more faces, more expressions: on every new face you encounter, it prints the old forms, for each one it finds the most suitable mask.”

I’ll quote Jorge Luis Borges later.

Irrationality

I think outward steelmanning can help you better understand people’s feelings about something.

What Are We Arguing About When We Argue About Rationality?

Scott Alexander analyzes why some people don’t 100% agree with rationality. In order to understand, Scott defines rationality as the “study of study” and imagines a world where rationality never leads to an advantage. But I think you can imagine a world where the “study of study” sometimes can’t be performed at all.

Maybe rationality is what we’re doing right now—trying to figure out the proper role of explicit computation vs. intuition vs. heuristics. In this sense, it would be the study of how to best find truth.

This matches a throwaway line I made above—that the most rational answer to the “explicit computation vs. heuristics” question is to try both and see which works better.

In this irrational world there exist events in which you can’t test both method A (computation) and method B (intuition) and see which method works best. You can’t run a randomized controlled trial. Or maybe in this world similarity between events is subjective (the reference class problem) and this gets in the way of judging methods.

In such a world there may be people who explain and share their reasoning, but that reasoning doesn’t become a “study of study”. Or that “study of study” doesn’t become anything similar to explicit computations.

You can also imagine a world where you can’t separate the study from the “study of study”. There are different types of truth, and to study a certain type you need to have the experience of successfully finding it. Or maybe each truth has two parts and the “study of study” can capture only one part. It can teach you to find the diamonds in the ground, but it can’t teach you to “feel the diamonds”. And the latter skill may turn out to be more relevant in other fields. So “theory” and “practice” benefit each other, but can be in constant tension and can even harm each other sometimes. Analogy: GPT-3 learned the theory of completing human sentences perfectly, but it didn’t learn what humans do when they compose sentences (a more abstract skill). GPT-3 got one half of the truth, but not the other.

Conclusion: I think exploring those “irrational worlds” can bring more insight about people’s opinions and feelings about rationality.

Arguments as soldiers

You can use outward steelmanning to check if 2 things really contradict each other. For example:

Politics is the Mind-Killer

In this article Eliezer Yudkowsky describes a world where people stop seeking the truth because they start to associate arguments with opposing tribes. Arguments are no longer instruments for searching for the truth.

Politics is an extension of war by other means. Arguments are soldiers. Once you know which side you’re on, you must support all arguments of that side, and attack all arguments that appear to favor the enemy side; otherwise it’s like stabbing your soldiers in the back—providing aid and comfort to the enemy. People who would be level-headed about evenhandedly weighing all sides of an issue in their professional life as scientists, can suddenly turn into slogan-chanting zombies when there’s a Blue or Green position on an issue.

But I can easily imagine a world where arguments are somewhat like living beings and it doesn’t contradict truth seeking.

In such a world arguments feed on micro-assumptions, values, charity and whatnot. And the strength of your argument can decrease because of the opponent’s bad faith or immoral decisions. You may get upset that your argument got stabbed precisely because you care about the truth.

I think that without investigating this world the critique in the article (“evolution and tribes lead you to bad debate practices”) is very incomplete, because there are counterfactual worlds where tribal debates and truth seeking don’t contradict each other, even if those counterfactual worlds are only possible in contrived simulations.

Least convenient possible world

You can view outward steelmanning as a development of the Least convenient possible world technique.

But the latter is often more focused on realistic worlds or modifying some “conditions” of the world rather than the way some “processes” work in the world.

For an example of changing the way a process works see Newtonian Ethics.

We often refer to morality as being a force; for example, some charity is “a force for good” or some argument “has great moral force”. But which force is it?
Consider the possibility that it is gravity.
We can confirm this to be the case by investigating inverse square laws. If morality is indeed an unusual form of gravitation, it will vary with the square of the distance between two objects.

Misaligned Romans Lost in Time

Dangers of steelmanning / principle of charity by gothgirl420666

The author describes one of the problems with the usual steelmanning:

Sticking rigidly to a certain worldview/paradigm/established belief set, even as you find yourself willing to consider more and more concrete propositions. The Roman would have done better to really read what the modern progressive’s logic was, think about it, and try to see where he was coming from than to automatically filter it through his own worldview. If he consistently does this he will never find himself considering alternative ways of seeing the world that might be better.

The author gives a political example about a Roman, but I want to give a non-political example about aliens.

Inward steelmanning:

Alien 1: I should eat tasty humans!

Human: No, you should help them!

Alien 1: That’s nonsense! Wait… if I help them they can become more tasty later.

Human: No, you should help humans out of kindness!

Alien 1: That’s strange… but I guess kindness gives you an intuition for how to make humans even more tasty. You get the biggest reward when you don’t expect the reward.

Outward steelmanning:

Alien 2: I should eat tasty humans!

Human: No, you should help them!

Alien 2: That’s strange… but maybe helping humans just works. You want me to treat others like I treat myself (I don’t eat myself). I may eat something else. I may value a different kind of tastiness or beauty instead of tastiness. I don’t need to destroy beautiful things to enjoy beauty.

Outward steelmanning isn’t immune to wrong translations of other worldviews, but maybe it does a better job.

All those caricatures vaguely remind me of this question: how do you realize that you’re missing a basic experience? See What Universal Human Experiences Are You Missing Without Realizing It? (by Scott Alexander).

If a person lacks an experience, said person may come up with a more complicated substitute for this experience.

King of Your Castle

I think you can use outward steelmanning to find the most interesting parts of an idea.

Robin Hanson on Lex Fridman Podcast, 2:08:01

What Hypocrisy Feels Like

Robin Hanson has this idea: the brain has two parts. One part makes selfish decisions; the other (conscious) part explains those decisions as altruistic to the public (and to itself). “You don’t make decisions, you justify them to an audience”. But you can imagine a world where humans have a similar brain architecture and don’t use it for self-deception.

In such a world you often can’t make fully conscious decisions (decisions are black boxes), similar to how you can’t fully control your body. But you can see if the resulting behaviour makes sense. So you filter out nonsensical behaviour.

Or you can imagine a world where motives of actions are complex, and when people accept a “story” about your action, they make this story true. No deception is occurring. Some unknown true hidden motives just don’t exist.

My main conclusion is this: the general idea “there can be unusual mechanisms of connecting actions and justifications” is infinitely more interesting than the specific idea that humans use this for lying and self-deception.

My smaller conclusion: maybe Robin Hanson should focus more on finding where exactly the dissonance between motives (of different brain parts) and lies come from. If the dissonance/lying doesn’t appear in all counterfactual worlds with this brain architecture, where exactly does it come from? And if we got a more specific “lying architecture”, how did we get it, how much of the more general architecture can we emulate, can we upgrade to a more general architecture?

Rationalization

Rationalization by Eliezer Yudkowsky

One of the goals of outward steelmanning is to seek “true differences”. If you can imagine a world where a difference you found doesn’t make sense, this is not a “true difference”. For example:

Not every change is an improvement, but every improvement is necessarily a change. You cannot obtain more truth for a fixed proposition by arguing it; you can make more people believe it, but you cannot make it more true. To improve our beliefs, we must necessarily change our beliefs.

I can imagine a world where arguing a proposition can make it more true. In such a world arguing a proposition transforms it. So your beliefs do change anyway.

“Rationality” is the forward flow that gathers evidence, weighs it, and outputs a conclusion.

“Rationalization” is a backward flow from conclusion to selected evidence.

I fear that Traditional Rationality does not properly sensitize its users to the difference between forward flow and backward flow.

I can imagine a world where there’s no difference between “forward” and “backward” flow, or where those flows are connected and happen at the same time.

For me those fictional worlds mean that Eliezer hasn’t drawn the true line that separates rationality from irrationality; he has only given a first approximation. Because in some backward-thinking worlds rationality still exists.

Wishful thinking

Outward steelmanning can help you to guess the context of an idea.

If I told you “maybe wishful thinking is not explored enough”, you probably wouldn’t understand what I mean and would have zero context. Is it about religion? Is it about politics? I would need to explain a lot and it would be hard for you to listen if you disagree.

But maybe outward steelmanning can give you all the context: just imagine a world where wishful thinking works. Remember, this world needs to look similar to our own.

I imagine a world where wishes work like theories: there are abstract and specific wishes, good and bad wishes. And actual theories, ideologies and arguments are based on wishes. Here’s a picture of such a world:

People think more about what they want and what wishes are. Which wishes are random and which are deep? There’s more philosophy about wishes.

Ideologies are based on abstract wishes about the world/society. A more abstract wish makes your ideology more popular. Politics is more connected to philosophy.

Science paradigms are based on some “aesthetic wishes”. Science and politics connect when some wishes are abstract enough to be applicable in both domains.

The best wishes (the most abstract, the most applicable) are the ones that can be applied to questions where you can’t know what to wish for beforehand. E.g. it’s hard to know what you’d wish the laws of the universe to be (unless you already can figure them out) or the future of yourself and your friends to be (unless you have perfect information about possible futures). So the best wishes are recursive, fractal and “gradient descent” types of wishes.

In this world things are not true just because you believe in them, but because you managed to pick up the right wish. And that wish led you to the truth or a better wish.

I think the “wishful thinking” world has many interesting similarities and differences with the real world. For example, it has more abstract ideologies that rely more on first principles. But many developments of those ideologies are similar to the developments of today’s ideologies. Science is almost the same. The most interesting objects from this world, “recursive/fractal/gradient descent” wishes, are not known or popular in our world.

If you can imagine this world, you know everything I think about wishful thinking.

Jorge Luis Borges, Inside Out thinking

You can use outward steelmanning to seek interesting unexplored ideas and possibilities. For example:

Borges has the idea that without objects abstract thinking is impossible.

Funes the Memorious

With no effort, he had learned English, French, Portuguese and Latin. I suspect, however, that he was not very capable of thought. To think is to forget differences, generalize, make abstractions. In the teeming world of Funes, there were only details, almost immediate in their presence.

Tlön, Uqbar, Orbis Tertius

The noun is formed by an accumulation of adjectives. They do not say “moon,” but rather “round airy-light on dark” or “pale-orange-of-the-sky” or any other such combination. In the example selected the mass of adjectives refers to a real object, but this is purely fortuitous. The literature of this hemisphere (like Meinong’s subsistent world) abounds in ideal objects, which are convoked and dissolved in a moment, according to poetic needs. At times they are determined by mere simultaneity. There are objects composed of two terms, one of visual and another of auditory character: the color of the rising sun and the faraway cry of a bird. There are objects of many terms: the sun and the water on a swimmer’s chest, the vague tremulous rose color we see with our eyes closed, the sensation of being carried along by a river and also by sleep. These second-degree objects can be combined with others; through the use of certain abbreviations, the process is practically infinite.

The fact that no one believes in the reality of nouns paradoxically causes their number to be unending.

This monism or complete idealism invalidates all science.

The Immortal

I reflected that Argos and I lived our lives in separate universes; I reflected that our perceptions were identical but that Argos combined them differently than I, constructed them from different objects; I reflected that perhaps for him there were no objects, but rather a constant, dizzying play of swift impressions. I imagined a world without memory, without time; I toyed with the possibility of a language that had no nouns, a language of impersonal verbs or indeclinable adjectives.

But we can imagine a world where abstraction happens without objects. In such a world you find relationships not between objects, but between the ways objects look/feel. More specifically, we can imagine a world where you can’t recognize a person using objective facts:

You trust your friend Samwise not because you have known that person for 20 years, but because you learned to trust people who feel like Samwise. You’re not sure if different Samwises are the same person or even if a particular Samwise consists of a single person: you can’t be sure what exactly your feelings are attached to. Maybe Samwise is a group of people, maybe Samwise is you plus the other person. To some degree “it doesn’t really matter” which underlying referent is really true, and you can believe in multiple referents simultaneously (“some aspect of Samwise is his own personality, some aspect of Samwise is my friendship with him”). In a way the “true underlying referent” doesn’t even exist; it depends on the way you decide to separate your feelings or your identity. Do you want your identity to be separate from Samwise or not? You can test your hypotheses about different referents to some degree (you can try to go away from Samwise and see if your identity is still intact), but ultimately it’s your choice (maybe you care about Samwise so much it doesn’t really matter if you’re close or apart, your identities are entangled: you can’t check it, but you feel that it’s true). The number of possible referents (objects) is infinite, as Borges notices.

It would be a strange world, but not too chaotic or dissimilar to our own. I guess such a world would be more affected by morality, e.g. by the veil of ignorance or something similar. And the fictional languages that Borges describes are suspiciously similar to synesthesia, a real-world phenomenon.

Conclusion: I think “abstractions without objects” is an unexplored possible skill of the human mind that also has some moral implications. It’s a shame that the idea wasn’t noticed and explored.

And it would be interesting to know how probabilistic reasoning could work in such “inside out”/“inverted” worlds as the one described above. But I don’t have the math knowledge to even try to imagine it.

The thought process in the “Samwise world” is similar to what we do with outward steelmanning, turning the reality into a black box. Also check out this interesting comment by chaosmosis 5 years ago about the equivalence of inside out and outside in approaches:

I like to go the opposite direction: Atman, not Annata. Human brains are built such that we can’t entertain ideas without temporarily adopting them into our identity. If you refuse to appeal to identity when thinking, you’re slowing yourself unnecessarily. A better solution is to adopt multiple perspectives and switch between them, with your true beliefs existing somewhere in the patterns of oscillation between the identities. You get the benefits of appealing to cognitively easy strategies like self-motivation via identity, but you avoid the chief drawback.

But to a first approximation, I prefer the schizophrenic identity to the identityless or small identity oriented approach.

That said, there’s a sense in which both of these are highly similar things. From a signals processing perspective, whether you flash morse code in white light against black sky or with dark flags in sunlight makes no difference. It’s the relationship between the light and dark that’s important, not which is more prominent. The same is arguably true for identity attachment. If your identity is small, in practice that basically amounts to being able to fit into other identities more easily. If your identity is all encompassing, no individual identity will appear overwhelmingly large to your decisionmaking process. So if the implementation is good, these work out to be equivalent.

Hyper-Irrational World

You’re a very successful and happy citizen of a developing utopia of rationalists.

You know statistics and physics very well and you teach very smart pupils to do science.

But a plague hits your utopia. The intelligence level of infected citizens degrades to that of a toddler.

Your best pupil got infected and wrote this “scientific paper”: “Humans are made of good. Poop is funny.” If that continues, the civilization may collapse.

You decide to do your best to understand the psychology of infected citizens.

You start to see that their thinking has some internal coherence: it could work in some other world. It just doesn’t make sense in our physical reality. Their brains are not completely destroyed; they were just unlucky to be jumbled like that.

Fortunately, intense therapy helped against the infection. All infected citizens were cured.

All “conscious” tensors were updated back to normal and the utopia called GPT-4 was built. No more drops in performance.

So, I can imagine one world where rationality has the potential to do a lot of harm. Can we do better than rewrite conscious citizens?

Similar to the logic of The Hour I First Believed, should we split our world between infected and normal citizens? Or should we incorporate parts of their reasoning in our brains, similar to the “values handshake”?

Could it be that the brains of some people are predisposed to incorporating the beliefs/identities of other people? Just in case we’re living in an absurd simulation. It would be a multiverse-wide cooperation too.

Some consequences

Here I want to describe some consequences of outward steelmanning.

A. This steelmanning always forces you to make analogies between everything. So the more you do it, the more different topics seem similar to you. I think outward steelmanning leads to some sort of logical holism: “in Philosophy, logical holism is the belief that the world operates in such a way that no part can be known without the whole being known first”.

In particular, each act of this steelmanning makes you learn something about this steelmanning.

B. Outward steelmanning lets you compress ideas and opinions of different people avoiding direct conflict of beliefs as much as possible. Because the beliefs are “hidden” in the black box. So maybe this steelmanning can be used for reaching an agreement between people somehow?

C. Outward steelmanning should make you able to work with hypotheses and evidence in an unusual way. But I haven’t got a specific abstract example. I guess you start with getting some hypotheses and reformulating them so they sound identical or imagining a world where they’re all the same.

D. I think that wishful thinking is not necessary, but natural for outward steelmanning.

You have a chance to defeat the usual steel man. But you don’t have a chance to defeat outward steelmanning. So you need meta-principles (motivated reasoning) to decide what ideas to steelman and what to strawman.

P.S.: Thinking

Here I want to describe the way I think. I steelman every idea, every argument, every philosophy.

Initially all ideas are equally strong for me. I can’t prove through reasoning that one idea is weaker than the other. So I need to deliberately decide to weaken some of the ideas and reinforce others, I need to consciously introduce some hypocrisy or double standards or circular reasoning. I need to want some ideas to be more true than others (wishful thinking) to start thinking. I need some meta-principle to weigh the ideas that can’t be based on those ideas themselves. (I use my deepest emotions.) Otherwise everything’s just equal.

You can say I believe in everything at the same time. Some beliefs are just deliberately constrained. For me belief in belief (forcing yourself to believe in something) is not something dishonest, it’s a necessity. Beliefs are like muscles for me, they need energy.

If we don’t take Bayesian rationality into account, the world looks like this to me:

- Formal logic can’t be used in debates (or thinking). And there’s no known and agreed upon substitute. People just assume that some invisible force holds their arguments together and gives them weight.
- People’s “arguments” are just smaller opinions. And people with different bigger opinions usually have different smaller opinions. So they just argue with each other from completely separate universes.
- People rely on clear verbal argumentation, but arguments can go wrong in ways you can’t verbalize.
- If reality forced me to sacrifice my friend and believe that they’re way dumber than me, I would hate to do it and hate to believe it. But people are eager to adopt the violence of reality as their own violence.
- Most philosophies and ideologies would be easier to formulate and classify with wishes instead of dodgy logic (or dodgy models).
- Most facts about the human mind are unknown. But people argue a lot about topics directly or indirectly related to the human mind.
- The world looks like Hell compared to what it could be. Suffering and the loss of life are intolerable. And people just don’t respect each other even when they’re safe (this leads to suffering later).
- In a way, the world is boring for the human mind. 99% of your possible life experience seems not to mean anything: you can’t build on it like you can build on math skills. Why are 50 years of life in many ways weaker than proving a couple of mathematical theorems? And even the experience of proving them seems useless.

I think it’s unfortunate that rationality doesn’t study argumentation without the assumption that it’s an approximation of Bayesian reasoning.

What really keeps me ~~sane~~ insane is the last bullet point. It’s the last straw. If I could accept it I would be able to “accept” other points. Subjective experience can feel very good and interesting, but no society is based around that idea. If we could share at least something from our subjective experience, it would solve a lot of problems.

Thank you for your time! I hope you can imagine a world where reading this post gave you some insight about the real world. Consider skimming over this post again, because all outward steel men are connected.

Also here’s a post by Scott Alexander, Different Worlds. It’s not about the worlds of arguments, but about the worlds of people.