So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so it might be a little incoherent.)
There are two distinct tasks: 1) generating new useful hypotheses/tools, and 2) selecting between existing hypotheses/filtering out bad hypotheses.
There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
I don’t believe this (anti-correlated given g) very confidently, but I do think it is good to track your own and others’ skill in the two tasks separately, because it is possible to have very different scores (and because judging generators on reliability can have the side effect of making them less generative, out of fear of being wrong, and vice versa).
I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeply faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different world views, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things into something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view, the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
For me, there is a handful of people who I think of as having very different views from me on AI safety, but who are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much the people who are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)
However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from me and from each other. I should be not that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view). But learning from disagreements more than that is just really hard, and I don’t know how to do it, and I don’t think spending more time with people who disagree with me fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else (not just MIRI) is also failing to learn (new ontologies, rather than just noticing mistakes) from disagreements, which makes me think it is actually pretty hard.)