Gears Level & Policy Level
Inside view vs outside view has been a fairly useful intuition-pump for rationality. However, the dichotomy has a lot of shortcomings. We’ve just gotten a whole sequence about failures of a cluster of practices called modest epistemology, which largely overlaps with what people call outside view. I’m not ready to stop championing what I think of as the outside view. However, I am ready for a name change. The term outside view doesn’t exactly have a clear definition; or, to the extent that it does have one, it’s “reference class forecasting”, which is not what I want to point at. Reference class forecasting has its uses, but many problems have been noted.
I propose gears level & policy level. But, before I discuss why these are appropriate replacements, let’s look at my motives for finding better terms.
Issues with Inside vs Outside
Problems with the concept of outside view as it currently exists:
Reference class forecasting tends to imply stopping at base-rate reasoning, rather than starting at base-rate reasoning. I want a concept of outside view which helps overcome base-rate neglect, but which more obviously connotes combining an outside view with an inside view (by analogy to combining a prior probability with a likelihood function to get a posterior probability; see the numeric sketch after this list).
Reference class forecasting lends itself to reference class tennis, IE, a game of choosing the reference class which best makes your point for you. (That’s a link to the same article as the previous bullet point, since it originated the term, but this Stuart Armstrong article also discusses it. Paul Christiano discusses rules and etiquette of reference class tennis, because of course he does.) Reference class tennis is both a pretty bad conversation to have, which makes reference class forecasting a poor choice for productive discussion, and a potentially big source of bias if you do it to yourself. It’s closely related to the worst argument in the world.
Reference class forecasting is specified at the object level: you find a class fitting the prediction you want to make, and you check the statistics for things in that class to make your prediction. However, central examples of the usefulness of the outside view occur at the meta level. In examples of planning-fallacy correction, you don’t just note how close you usually get to the deadline before finishing something. You compare it to how close to the deadline you usually expect to get. Why would you do that? To correct your inside view! As I mentioned before, the type of the outside view should be such that it begs combination with the inside view, rather than standing on its own.
Outside view has the connotation of stepping back and ignoring some details. However, we’d like to be able to use all the information at our disposal—so long as we can use it in the right way. Taking base rates into account can look like ignoring information: walking by the proverbial hundred-dollar bill on the ground in Times Square, or preparing for a large flood despite there being none in living memory. However, while accounting for base rates does indeed tend to smooth out behavior and make it depend less on evidence, that’s because we’re working with more information, not less. A concept of outside view which connotes bringing in more information, rather than less, would be an improvement.
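To make the “prior times likelihood” analogy from the first bullet concrete, here is a minimal numeric sketch (all the numbers are invented for illustration; the point is only that the base rate and the inside-view evidence get combined, rather than one replacing the other):

```python
# Toy illustration of combining the outside view with the inside view,
# by analogy to Bayes' rule. All numbers are made up.

# Outside view: base rate at which projects like this finish on time.
prior_on_time = 0.3

# Inside view: how likely my current impression ("the plan looks solid")
# would be if the project really were on track, versus if it weren't.
likelihood_given_on_time = 0.8
likelihood_given_late = 0.5

# Neither view stands alone; the posterior combines them.
joint_on_time = prior_on_time * likelihood_given_on_time
joint_late = (1 - prior_on_time) * likelihood_given_late
posterior_on_time = joint_on_time / (joint_on_time + joint_late)

print(f"P(on time | plan looks solid) = {posterior_on_time:.2f}")  # ~0.41
```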
The existing notion of inside view is also problematic:
The inside-view vs outside-view distinction does double duty as a descriptive dichotomy and a prescriptive technique. This is especially harmful in the case of inside view, which gets belittled as the naive thing you do before you learn to move to outside view. (We could similarly malign the outside view as what you have before you have a true inside-view understanding of a thing.) On the contrary, there are significant skills in forming a high-quality inside view. I primarily want to point at those, rather than the descriptive cluster.
The Gears Level and the Policy Level
Gears-level understanding is a term from CFAR, so you can’t blame me for it. Well, I’m endorsing it, so I suppose you can blame me a little. In any case, I like the term, and I think it fits my purposes. Some features of gears-level reasoning:
Dishing out probability mass precisely, so as to have the virtue of precision.
Having the properties of a good explanation, along the lines of David Deutsch: being pinned down on all sides by the evidence, and providing understanding, not only predictive accuracy. (Contrast this concept with a big neural-net model which classifies images extremely well but is difficult to analyse.)
Reasoning from first principles, rather than analogy.
Making a prediction with a well-defined model, such that anyone who understood your model could calculate the same prediction independently.
The policy level is not a CFAR concept. It is similar to the CFAR concept of the strategic level, which I suspect is based on Nate Soares’ Staring Into Regrets. In any case, here are some things which point in the right direction:
Accounting for knock-on effects, including consistency effects. Choosing an action really is a lot like setting your future policy.
What game theorists mean by policy: a function from observations to actions, which is (ideally) in equilibrium with the policies of all other agents. A good policy lets you coordinate successfully with yourself and with others. Choosing a policy illustrates the idea of choosing at the meta level: you aren’t selecting an action, but rather, a function from situations to actions (see the toy sketch after this list).
Timeless decision theory / updateless decision theory / functional decision theory. Roughly, choosing a policy from behind a Rawlsian veil of ignorance. As I mentioned with accounting for base rates, it might seem from one perspective like this kind of reasoning is throwing information away; but actually, it is much more powerful. It allows you to set up arbitrary functions from information states to strategies. You are not actually throwing information away; you always have the option of responding to it as usual. You are gaining the option of ignoring it, or reacting to it in a different way, based on larger considerations.
Cognitive reductions, in Jessica Taylor’s sense (points five and six here). Taking the outside view should not entail giving up on having a gears-level model. The virtues of good models at the gears level are still virtues at the policy level. Rather, the policy level asks you to make a gears-level model of your own cognitive process. When you go to the policy level, you take your normal way of thinking and doing as an object. You think about the causes and effects of your normal ways of being.
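As a toy sketch of what “choosing a function from situations to actions” buys you, here is a counterfactual-mugging-style example (the setup and payoffs are invented for illustration, not a faithful rendering of any of the decision theories above): an agent that picks the best action after each observation does worse than an agent that picks the best policy as a whole.

```python
# Toy sketch: choose a policy (a function from observations to actions)
# rather than choosing each action after the observation arrives.
# Setup (made up, counterfactual-mugging style): a fair coin is flipped.
# On heads you are asked to pay 100; on tails you receive 10,000, but
# only if your policy would have paid on heads.
from itertools import product

observations = ["heads", "tails"]
actions = ["pay", "refuse"]

def expected_value(policy):
    """Expected payoff of a whole policy, averaged over the coin flip."""
    heads_payoff = -100 if policy["heads"] == "pay" else 0
    tails_payoff = 10_000 if policy["heads"] == "pay" else 0
    return 0.5 * heads_payoff + 0.5 * tails_payoff

# Action-level reasoning: after seeing heads, paying looks locally worse,
# so the per-observation chooser refuses, forfeiting the tails branch.
greedy_policy = {"heads": "refuse", "tails": "refuse"}

# Policy-level reasoning: enumerate every function from observations to
# actions and pick the best one as a whole.
all_policies = [dict(zip(observations, acts))
                for acts in product(actions, repeat=len(observations))]
best_policy = max(all_policies, key=expected_value)

print("greedy:", greedy_policy, expected_value(greedy_policy))  # 0.0
print("best:  ", best_policy, expected_value(best_policy))      # 4950.0
```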
Most of the existing ideas I can point to are about actions: game theory, decision theory, the planning fallacy. That’s probably the worst problem with the terminology choice. Policy-level thinking has a very instrumental character, because it is about process. However, at its core, it is epistemic. Gears-level thinking is the practice of good map-making. The output is a high-quality map. Policy-level thinking, on the other hand, is the theory of map-making. The output is a refined strategy for making maps.
The standard example with the planning fallacy illustrates this: although the goal is to improve planning, which sounds instrumental, the key is noticing the miscalibration of time estimates. The same trick works for any kind of mental miscalibration: if you know about it, you can adjust for it.
This is not just reference class forecasting, though. You don’t adjust your time estimates for projects upward and stop there. The fact that you normally underestimate how long things will take makes you think about your model. “Hm, that’s interesting. My plans almost never come out as stated, but I always believe in them when I’m making them.” You shouldn’t be satisfied with this state of affairs! You can slap on a correction factor and keep planning like you always have, but this is a sort of paradoxical mental state to maintain. If you do manage to keep the disparity between your past predictions and actual events actively in mind, I think it’s more natural to start considering which parts of your plans are most likely to go wrong.
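A minimal sketch of the difference, with a made-up task history: the correction factor is one number you can compute and stop at, but the same records also tell you where the overruns concentrate, which is the question that pushes you back toward the gears.

```python
# Made-up history of (task, estimated_days, actual_days).
history = [
    ("write draft", 3, 5),
    ("review cycle", 2, 6),
    ("ship release", 4, 5),
]

# Reference-class-style fix: one global multiplier, then stop.
correction = sum(act for _, _, act in history) / sum(est for _, est, _ in history)
print(f"global correction factor: {correction:.2f}")  # ~1.78

# Policy-level nudge: look at where the overruns concentrate, which invites
# a gears-level question ("why do review cycles blow up?").
for task, est, act in sorted(history, key=lambda r: r[2] / r[1], reverse=True):
    print(f"{task}: {act / est:.1f}x over estimate")
```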
If I had to spell it out in steps:
Notice that a thing is happening. In particular, notice that a thing is happening to you, or that you’re doing a thing. This step is skipped in experiments on the planning fallacy; experimenters frame the situation. In some respects, though, it’s the most important part; naming the situation as a situation is what lets you jump outside of it. This is what lets you go off-script, or be anti-sphexish.
Make a model of the input-output relations involved. Why did you say what you just said? Why did you think what you just thought? Why did you do what you just did? What are the typical effects of these thoughts, words, actions? This step is most similar to reference class forecasting. Figuring out the input-output relation is a combination of refining the reference class to be the most relevant one, and thinking of the base-rates of outcomes in the reference class.
Adjust your policy. Is there a systematic bias in what you’re currently doing? Is there a risk you weren’t accounting for? Is there an extra variable you could use to differentiate between two cases you were treating as the same? Chesterton-fencing your old strategy is important here. Be gentle with policy changes—you don’t want to make a bucket error or fall into a hufflepuff trap. If you notice resistance in yourself, be sure to leave a line of retreat by visualizing possible worlds. (Yes, I think all those links are actually relevant. No, you don’t have to read them to get the point.)
I don’t know quite what I can say here to convey the importance of this. There is a skill here; a very important skill, which can be done in a split second. It is the skill of going meta.
Gears-Level and Policy-Level Are Not Opposites
The second-most confusing thing about my proposed terms is probably that they are not opposites of each other. They’d be snappier if they were; “inside view vs outside view” had a nice sound to it. On the other hand, I don’t want the concepts to be opposed. I don’t want a dichotomy that serves as a descriptive clustering of ways of thinking; I want to point at skills of thinking. As I mentioned, the virtuous features of gears-level thinking are still present when thinking at the policy level; unlike in reference class forecasting, the ideal is still to get a good causal model of what’s going on (IE, a good causal model of what is producing systematic bias in your way of thinking).
The opposite of gears-level thinking is un-gears-like thinking: reasoning by analogy, loose verbal arguments, rules of thumb. Policy-level thinking will often be like this when you seek to make simple corrections for biases. But, remember, these are error models in the errors-vs-bugs dichotomy; real skill improvement relies on bug models (as studies in deliberate practice suggest).
The opposite of policy-level thinking? Stimulus-response; reinforcement learning; habit; scripted, sphexish behavior. This, too, has its place.
Still, like inside and outside view, gears and policy thinking are made to work together. Learning the principles of strong gears-level thinking helps you fill in the intricate structure of the universe. It allows you to get past social reasoning about who said what and what you were taught and what you’re supposed to think and believe, and instead, get at what’s true. Policy-level thinking, on the other hand, helps you to not get lost in the details. It provides the rudder which can keep you moving in the right direction. It’s better at cooperating with others, maintaining sanity before you figure out how it all adds up to normality, and optimizing your daily life.
Gears and policies both constitute moment-to-moment ways of looking at the world which can change the way you think. There’s no simple place to go to learn the skillsets behind each of them, but if you’ve been around LessWrong long enough, I suspect you know what I’m gesturing at.
Agile programming notices that we’re bad at forecasting, but concludes that we’re systematically bad. The approach it takes is to ask the programmers to try for consistency in their forecasting of individual tasks, and puts the learning in the next phase, which is planning. So as individual members of a development team, we’re supposed to simultaneously believe that we can make consistent forecasts of particular tasks, and that our ability to make estimates is consistently off, and that applying a correction factor will make it work.
Partly this is like learning to throw darts, and partly it’s a theory about aggregating biased estimates. What I mean by the dart-throwing example is that beginning dart throwers are taught first to aim for consistency. Once you get all your darts to end up close to the same spot, you can adjust small things to move the spot around. The first requirement for being a decent dart thrower is being able to throw the same way each toss. Once you can do that, you can turn slightly, or learn other tricks to adjust how your aim point relates to the place the darts land.
The aggregation theory says that the problem in forecasting is not with the individual estimates, it’s with random and unforeseen factors that are easier to correct for in the aggregate. The problem with the individual forecasts might be overhead tasks that reliably steal time away, it might be that bugs are unpredictable, or it might be about redesign that only becomes apparent as you make progress. These are easier to account for in the aggregate planning than when thinking about individual tasks. If you pad all your tasks for their worst case, you end up with too much padding.
Over the long term, the expansion factor from individual tasks to weeks or months of work can be fairly consistent.
Providing Slack at the project level instead of the task level is a really good idea, and has worked well in many fields outside of programming. It is analogous to the concept of insurance: the RoI on Slack is higher when you aggregate many events with at least partially uncorrelated errors.
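A quick simulation sketch of that aggregation point (the task-duration distribution and all numbers here are made up): padding every task to its own 95th percentile costs noticeably more than holding one pool of slack sized to the 95th percentile of the project total, because the overruns are only partially correlated.

```python
# Made-up comparison: per-task padding vs. one pooled slack budget.
import random
import statistics

random.seed(0)
N_TASKS, N_SIMS = 8, 10_000

def task_duration():
    # Right-skewed task time in weeks: usually around 3, occasionally much more.
    return random.lognormvariate(1.0, 0.5)

projects = [[task_duration() for _ in range(N_TASKS)] for _ in range(N_SIMS)]

def p95(xs):
    return statistics.quantiles(xs, n=20)[-1]  # 95th percentile

# Per-task padding: budget each task at its own 95th percentile.
per_task_budget = N_TASKS * p95([t for proj in projects for t in proj])

# Pooled slack: budget the project total at its 95th percentile.
pooled_budget = p95([sum(proj) for proj in projects])

print(f"per-task padding budget: {per_task_budget:.1f} weeks")
print(f"pooled slack budget:     {pooled_budget:.1f} weeks")
# The pooled budget comes out substantially smaller, since the individual
# overruns are not perfectly correlated with one another.
```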
One major problem with trying to fix estimates at the task level is that there are strong incentives not to finish a task too early. For example, if you estimated 6 weeks, and are almost done after 3, and something moderately urgent comes up, you’re more likely to switch and fix that urgent thing since you have time. On the other hand, if you estimated 4 weeks, you’re more likely to delay the other task (or ask someone else to do it).
As a result, I’ve found that teams are literally more likely to finish projects faster with higher quality if you estimate the project as, say, 8 3-week tasks with 24 weeks of overall slack (so 48 weeks total) than if you estimate the project as 8 6-week tasks.
This is somewhat counterintuitive but really easy to apply in practice if you have a bit of social capital.
We’re not systematically bad forecasters. We’re subject to widespread rewards for overconfidence.
Ah, interesting evidence, thanks! And I like the dart-throwing analogy.
There’s a bunch of things in the content of this post I really like:
Pushing against ignoring details (i.e. throwing away info available to you when using the outside view)
Making more detailed claims about how the gears level of reasoning works—one of the issues with an epistemology based on taking the outside view is that it doesn’t explain how to actually figure new things out—e.g. explaining how your gears and policies work together to create a better set of gears, making your epistemology work together to solve mistakes you’ve made
I don’t feel like I fully understand (i.e. could easily replicate) how your notion of the policy level interacts with the gears—I might try to work through some examples in a separate comment later, and if I can do that I’ll probably start using these terms myself.
Also, something about this post felt really well structured. Anyhow, for all these reasons, I’ve Featured this.
One thing I’d add to that list is that the post focuses on refining existing concepts, which is quite valuable and generally doesn’t get enough attention.
Doesn’t this basically move the reference class tennis to the meta-level?
“Oh, in general I’m terrible at planning, but not in cases involving X, Y and Z!”
It seems reasonable that this is harder to do on a meta-level, but do any of the other points you mention actually “solve” this problem?
Valid question. I think the way I framed it may over-sell the solution—there’ll still be some problems of reference-class choice. But, you should move toward gears to try and resolve that.
The way in which policy-level thinking improves the reference-class tennis is that you’re not stopping at reference-class reasoning. You’re trying to map the input-output relations involved.
If you think you’re terrible at planning, but not in cases X, Y, and Z, what mechanisms do you think lie behind that? You don’t have to know in order to use the reference classes, but if you are feeling uncertain about the validity of the reference classes, digging down into causal details is likely a pretty good way to disambiguate.
For example, maybe you find that you’re tempted to say something very critical of someone (accuse them of lying), but you notice that you’re in a position where you are socially incentivised to do so (everyone is criticising this person and you feel like joining in). However, you also value honesty, and don’t want to accuse them of lying unfairly. You don’t think you’re being dishonest with yourself about it, but you can remember other situations where you’ve joined in group criticism and later realized you were unfair due to the social momentum.
I think the reference-class reaction is to just downgrade your certainty, and maybe have a policy of not speaking up in those kinds of situations. This isn’t a bad reaction, but it can be seen as a sort of epistemic learned helplessness. “I’ve been irrational in this sort of situation, therefore I’m incapable of being rational in this sort of situation.” You might end up generally uncomfortable with this sort of social situation and feeling like you don’t know how to handle it well.
So, another reaction, which might be better in the long-term, would be to take a look at your thinking process. “What makes me think they’re a liar? Wow, I’m not working on much evidence here. There are all kinds of alternative explanations, I just gravitated to that one...”
It’s a somewhat subtle distinction there; maybe not the clearest example.
Another example which is pretty clear: someone who you don’t like provides a critique of your idea. You are tempted to reject it out of hand, but outside view tells you you’re likely to reject that person’s comments regardless of truth. A naive adjustment might be to upgrade your credence in their criticism. This seems like something you only want to do if your reasoning in the domain is too messy to assess objectively. Policy-level thinking might say that the solution is to judge their critique on its merits. Your “inside view” (in the sense of how-reality-naively-seems-to-you) is that it’s a bad critique; but this obviously isn’t a gears-level assessment.
Maybe your policy-level analysis is that you’ll be unable to judge a critique objectively in such cases, even if you pause to try and imagine what you’d think of it if you came up with it yourself. In that case, maybe you decide that what you’ll do in such cases is write it down and think it through in more detail later (and say as much to the critic).
Or, maybe your best option really is to downgrade your own belief without assessing the critique at the gears level. (Perhaps this issue isn’t that important to you, and you don’t want to spend time evaluating arguments in detail.) But the point is that you can go into more detail.