dxu comments on My experience at and around MIRI and CFAR (inspired by Zoe Curzi’s writeup of experiences at Leverage)

dxu 21 Oct 2021 4:35 UTC
32 points

I agree that MIRI has a strong (statistical) bias towards things that were invented internally. It is currently not clear to me how much of this statistical bias is also a mistake vs the correct reaction to how much internally invented things seem to fit our needs, and how hard it is to find the good stuff that exists externally when it exists. (I think there are a lot of great ideas out there that I really wish I had, but I don’t have a great method for filtering for them in the sea of irrelevant stuff.)

Strong-upvoted for this paragraph in particular, for pointing out that the strategy of “seeking out disagreement in order to learn” (which obviously isn’t how hg00 actually worded it, but seems to me descriptive of their general suggested attitude/approach) has real costs, which can sometimes be prohibitively high.

I often see this strategy contrasted with a group’s default behavior, and when this happens it is often presented as [something like] a Pareto improvement over said default behavior, with little treatment (or even acknowledgement) given to the tradeoffs involved. I think this occurs because the strategy in question is viewed as inherently virtuous (which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant, leaking past the limits of any particular domain and seeping into a general attitude towards anything considered sufficiently “hard” [read: controversial]), and attributing “virtuousness” to something often has the effect of obscuring the real costs and benefits thereof.
- hg00 21 Oct 2021 8:31 UTC
  8 points
  Parent
  
  which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant
  
  Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
  
  Suppose there are two execs at an org on the verge of building AGI. One says “MIRI seems wrong for many reasons, but we should try and talk to them anyways to see what we learn.” The other says “Nah, that’s epistemic learned helplessness, and the costs are prohibitive. Turn this baby on.” Which exec do you agree with?
  
  This isn’t exactly hypothetical: I know someone at a top AGI org (I believe they “take seriously the idea that they are a computation/algorithm”) who reached out to MIRI and was basically ignored. It seems plausible to me that MIRI is alienating a lot of people this way, in fact. I really don’t get the impression they are spending excessive resources engaging people with different worldviews.
  
  Anyway, one way to think about it is that talking to people who disagree is just a much more efficient way to increase the accuracy of your beliefs. Suppose the population as a whole is ⁵⁰⁄₅₀ pro-Skub and anti-Skub. Suppose you learn that someone is pro-Skub. This should cause you to update in the direction that they’ve been exposed to more evidence for the pro-Skub position than the anti-Skub position. If they’re trying to learn facts about the world as quickly as possible, their time is much better spent reading an anti-Skub book than a pro-Skub book, since the pro-Skub book will have more facts they already know. An anti-Skub book also has more decision-relevant info. If they read a pro-Skub book, they’ll probably still be pro-Skub afterwards. If they read an anti-Skub book, they might change their position and therefore change their actions.
  
  Talking to an informed anti-Skub person face to face is even more efficient, since the anti-Skub person can present the most relevant and persuasive evidence, which is the evidence most likely to change their actions.
  
  Applying this thinking to yourself, if you’ve got a particular position you hold, that’s evidence you’ve been disproportionately exposed to facts that favor that position. If you want to get accurate beliefs quickly you should look for the strongest disconfirming evidence you can find.
  
  None of this discussion even accounts for confirmation bias, groupthink, or information cascades! I’m getting a scary “because we read a website that’s nominally about biases, we’re pretty much immune to bias” vibe from your comment. Knowing about a bias and having implemented an effective, evidence-based debiasing intervention for it are very different.
  
  BTW this is probably the comment that updated me the most in the direction that LW will become / already is a cult.
  - Scott Garrabrant 21 Oct 2021 11:43 UTC
    41 points
    Parent
    So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so might be a little incoherent.)
    There are two distinct tasks: 1) Generating new useful hypotheses/tools, and 2) Selecting between existing hypotheses/filtering out bad hypotheses.
    There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
    I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
    I don’t think this (anti-correlated given g) very confidently, but I do think it is good to track your own and others’ skill in the two tasks separately, because it is possible to have very different scores (and because judging generators on reliability might have the side effect of making them less generative, out of fear of being wrong, and similarly vice versa).
    I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeply faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
    I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement, so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, and not on really filtering based on which view is more true. I also sometimes do things like double-crux with people with fairly different world views, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things to something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
    On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view, the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
    However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
    For me, there is a handful of people who I think of as having very different views from me on AI safety, but are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much the people that are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
    If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)
    However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
    Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from me and from each other. I should be not that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view.) But learning from disagreements more than that is just really hard, and I don’t know how to do it, and I don’t think spending more time with them fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else (not just MIRI) is also failing to learn (new ontologies, rather than just noticing mistakes) from disagreements, which makes me think it is actually pretty hard.)
  - Scott Garrabrant 21 Oct 2021 9:54 UTC
    7 points
    Parent
    Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
    I believe they are saying that cheering for seeking out disagreement is learned helplessness as opposed to doing a cost-benefit analysis about seeking out disagreement. I am not sure I get that part either.
    I was also confused reading the comment, thinking that maybe they copied the wrong paragraph, and meant the 2nd paragraph.
    I am interested in the fact that you find the comment so cult-y though, because I didn’t pick that up.
    - hg00 21 Oct 2021 10:53 UTC
      1 point
      Parent
      
      I am interested in the fact that you find the comment so cult-y though, because I didn’t pick that up.
      
      It’s a fairly incoherent comment which argues that we shouldn’t work to overcome our biases or engage with people outside our group, with strawmanning that seems really flimsy… and it has a bunch of upvotes. Seems like curiosity, argument, and humility are out, and hubris is in.
- Ben Pace 21 Oct 2021 4:36 UTC
  4 points
  Parent
  +1
- TekhneMakre 21 Oct 2021 7:27 UTC
  1 point
  Parent
  which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant
  I don’t understand this, but for some reason I’m interested. Could you say a couple sentences more? How does rampant learned helplessness about having correct beliefs make it more appealing to seek new information by seeking disagreement? Are you saying that there’s learned helplessness about a different strategy for relating to potential sources of information?
  - dxu 21 Oct 2021 16:56 UTC
    27 points
    Parent
    So, my model is that “epistemic learned helplessness” essentially stems from an inability to achieve high confidence in one’s own (gears-level) models. Specifically, by “high confidence” here I mean a level of confidence substantially higher than one would attribute to an ambient hypothesis in a particular space—if you’re not strongly confident that your model [in some domain] is better than the average competing model [in that domain], then obviously you’d prefer to adopt an exploration-based strategy (that is to say: one in which you seek out disagreeing hypotheses in order to increase the variance of your information intake) with respect to that domain.
    
    I think this is correct, so far as it goes, as long as we are in fact restricting our focus to some domain or set of domains. That is to say: as humans, naturally it’s impossible to explore every domain in sufficient depth that we can form and hold high confidence in a gears-level model for said domain, which in turn means there will obviously be some domains in which “epistemic learned helplessness” is simply the correct attitude to take. (And indeed, the original blog post in which Scott introduced the concept of “epistemic learned helplessness” does in fact contextualize it using history books as an example.)
    
    Where I think this goes wrong, however, is when the proponent of “epistemic learned helplessness” fails to realize that this attitude’s appropriateness is actually a function of one’s confidence in some particular domain, and instead allows the attitude to seep into every domain. Once that happens, “inability to achieve confidence in one’s own models” ceases to be a rational reaction to a lack of knowledge, and instead turns into an omnipresent fog clouding over everything you think and do. (And the exploration-based strategy I outlined above ceases to be a rational reaction to a lack of confidence, and instead turns into a strategy that’s always correct and virtuous.)
    
    This is the sense in which I characterized the result as
    
    a consequence of epistemic learned helplessness run rampant, leaking past the limits of any particular domain and seeping into a general attitude towards anything considered sufficiently “hard”
    
    (Note the importance of the disclaimer “hard”. For example, I’ve yet to encounter anyone whose “epistemic learned helplessness” is so extreme that they stop to question e.g. whether they are in fact capable of driving a car. But that in itself is not particularly reassuring, especially when domains we care about include stuff labeled “hard”.)
    
    Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
    
    I think this is very bad. And since I think it’s very bad, naturally I will strongly oppose attempts to increase pressure in that particular direction—especially since I think pressure to think this way in this particular community is already ALARMINGLY HIGH. I think “epistemic learned helplessness” (which sometimes goes by more innocuous names as well, like fox epistemology or modest epistemology) is epistemically corrosive once it has breached quarantine, and by and large I think it has breached quarantine for a dismayingly large number of people (though thankfully my impression is that this has largely not occurred at MIRI).
    - hg00 22 Oct 2021 10:10 UTC
      3 points
      Parent
      It seems like you wanted me to respond to this comment, so I’ll write a quick reply.
      
      Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
      
      This doesn’t seem true for me. I think through details of exotic hypotheticals all the time.
      
      Maybe others are different. But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception. And if self-deception is really necessary, let’s make it a temporary suspension of belief sort of thing, as opposed to a life belief that leads you to not talk to those with other views.
      
      It’s been a while since I read Inadequate Equilibria. But I remember the message of the book being fairly nuanced. For example, it seems pretty likely to me that there’s no specific passage which contradicts the statement “hedgehogs make better predictions on average than foxes”.
      
      I support people trying to figure things out for themselves, and I apologize if I unintentionally discouraged anyone from doing that—it wasn’t my intention. I also think people consider learning from disagreement to be virtuous for a good reason, not just due to “epistemic learned helplessness”. Also, learning from disagreement seems importantly different from generic deference—especially if you took the time to learn about their views and found yourself unpersuaded. Basically, I think people should account for both known unknowns (in the form of people who disagree whose views you don’t understand) and unknown unknowns, but it seems OK to not defer to the masses / defer to authorities if you have a solid grasp of how they came to their conclusion (this is my attempt to restate the thesis of Inadequate Equilibria as I remember it).
      
      I don’t deny that learning from disagreement has costs. Probably some people do it too much. I encouraged MIRI to do it more on the margin, but it could be that my guess about their current margin is incorrect, who knows.
      - dxu 22 Oct 2021 15:26 UTC
        22 points
        Parent
        Thanks for the reply.
        
        But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception.
        
        I want to clarify that this is not my proposal, and to the extent that it had been someone’s proposal, I would be approximately as wary about it as you are. I think self-deception is quite bad on average, and even on occasions when it’s good, that fact isn’t predictable in advance, making choosing to self-deceive pretty much always a negative expected-value action.
        
        The reason I suspect you interpreted this as my proposal is that you’re speaking from a frame where “confidence in one’s model” basically doesn’t happen by default, so to get there people need to self-deceive, i.e. there’s no way for someone [in a sufficiently “hard” domain] to have a model and be confident in that model without doing [something like] artificially inflating their confidence higher than it actually is.
        
        I think this is basically false. I claim that having (real, not artificial) confidence in a given model (even of something “hard”) is entirely possible, and moreover happens naturally, as part of the process of constructing a gears-level model to begin with. If your gears-level model actually captures some relevant fraction of the problem domain, I claim it will be obviously the case that it does so—and therefore a researcher holding that model would be very much justified in placing high confidence in [that part of] their model.
        
        How much should such a researcher be swayed by the mere knowledge that other researchers disagree? I claim the ideal answer is “not at all”, for the same reason that argument screens off authority. And I agree that, from the perspective of somebody on the outside (who only has access to the information that two similarly-credentialed researchers disagree, without access to the gears in question), this can look basically like self-deception. But (I claim) from the inside the difference is very obvious, and not at all reminiscent of self-deception.
        
        (Some fields do not admit good gears-level models at all, and therefore it’s basically impossible to achieve the epistemic state described above. People in such fields might plausibly imagine that all fields have this property. But this isn’t the case—and in fact, I would argue that the division of the sciences into “harder” and “softer” is actually pointing at precisely this distinction: the “hardness” attributed to a field is in fact a measure of how possible it is to form a strong gears-level model.)
        
        Does this mean “learning from disagreement” is useless? Not necessarily; gears-level models can also be wrong and/or incomplete, and one entirely plausible (and sometimes quite useful) mechanism by which to patch up incomplete models is to exchange gears with someone else, who may not be working with quite the same toolbox as you. But (I claim) for this process to actually help, it should be done in a targeted way: ideally, you’re going into the conversation already with some idea of what you hope to get out of it, having picked your partner beforehand for their likeliness to have gears you personally are missing. If you’re merely “seeking out disagreement” for the purpose of fulfilling a quota, that (I claim) is unlikely to lead anywhere productive. (And I view your exhortations for MIRI to “seek out more disagreement on the margin” as proposing essentially just such a quota.)
        
        (Standard disclaimer: I am not affiliated with MIRI, and my views do not necessarily reflect their views, etc.)