My experience at and around MIRI and CFAR (inspired by Zoe Curzi’s writeup of experiences at Leverage)

jessicata, 16 Oct 2021 21:28 UTC

Center for Applied Rationality (CFAR), Machine Intelligence Research Institute (MIRI), Drama, Leverage Research

I appreciate Zoe Curzi’s revelations of her experience with Leverage. I know how hard it is to speak up when no or few others do, and when people are trying to keep things under wraps.

I haven’t posted much publicly about my experiences working as a researcher at MIRI (2015-2017) or around CFAR events, to a large degree because I’ve been afraid. Now that Zoe has posted about her experience, I find it easier to do so, especially after the post was generally well-received by LessWrong.

I felt moved to write this, not just because of Zoe’s post, but also because of Aella’s commentary:

I’ve found established rationalist communities to have excellent norms that prevent stuff like what happened at Leverage. The times where it gets weird is typically when you mix in a strong leader + splintered, isolated subgroup + new norms. (this is not the first time)

This seemed to me to be definitely false, upon reading it. Most of what was considered bad about the events at Leverage Research also happened around MIRI/CFAR, around the same time period (2017-2019).

I don’t want to concentrate on the question of which is “worse”; it is hard to even start thinking about that without discussing facts on the ground and general social models that would apply to both cases. I also caution against blame in general, in situations like these, where many people (including me!) contributed to the problem, and have kept quiet for various reasons. With good reason, it is standard for truth and reconciliation events to focus on restorative rather than retributive justice, and include the possibility of forgiveness for past crimes.

As a roadmap for the rest of the post, I’ll start by describing some background, describe some trauma symptoms and mental health issues I and others have experienced, and describe the actual situations that these mental events were influenced by and “about” to a significant extent.

Background: choosing a career

After I finished my CS/AI Master’s degree at Stanford, I faced a choice of what to do next. I had a job offer at Google for machine learning research and a job offer at MIRI for AI alignment research. I had also previously considered pursuing a PhD at Stanford or Berkeley; I’d already done undergrad research at CoCoLab, so this could have easily been a natural transition.

I’d decided against a PhD on the basis that research in industry was a better opportunity to work on important problems that impact the world; since then I’ve gotten more information from insiders that academia is a “trash fire” (not my quote!), so I don’t regret this decision.

I was faced with a decision between Google and MIRI. I knew that at MIRI I’d be taking a pay cut. On the other hand, I’d be working on AI alignment, an important problem for the future of the world, probably significantly more important than whatever I’d be working on at Google. And I’d get an opportunity to work with smart, ambitious people, who were structuring their communication protocols and life decisions around the content of the LessWrong Sequences.

These Sequences contained many ideas that I had developed or discovered independently, such as functionalist theory of mind, the idea that Solomonoff Induction was a formalization of inductive epistemology, and the idea that one-boxing in Newcomb’s problem is more rational than two-boxing. The scene attracted thoughtful people who cared about getting the right answer on abstract problems like this, making for very interesting conversations.

Research at MIRI was an extension of such interesting conversations to rigorous mathematical formalism, making it very fun (at least for a time). Some of the best research I’ve done was at MIRI (reflective oracles, logical induction, others). I met many of my current friends through LessWrong, MIRI, and the broader LessWrong Berkeley community.

When I began at MIRI (in 2015), there were ambient concerns that it was a “cult”; this was a set of people with a non-mainstream ideology that claimed that the future of the world depended on a small set of people that included many of them. These concerns didn’t seem especially important to me at the time. So what if the ideology is non-mainstream as long as it’s reasonable? And if the most reasonable set of ideas implies high impact from a rare form of research, so be it; that’s been the case at times in history.

(Most of the rest of this post will be negative-valenced, like Zoe’s post; I wanted to put some things I liked about MIRI and the Berkeley community up-front. I will be noting parts of Zoe’s post and comparing them to my own experience, which I hope helps to illuminate common patterns; it really helps to have an existing different account to prompt my memory of what happened.)

Trauma symptoms and other mental health problems

Back to Zoe’s post. I want to disagree with a frame that says that the main thing that’s bad was that Leverage (or MIRI/CFAR) was a “cult”. This makes it seem like what happened at Leverage is much worse than what could happen at a normal company. But, having read Moral Mazes and talked to people with normal corporate experience (especially in management), I find that “normal” corporations are often quite harmful to the psychological health of their employees, e.g. causing them to have complex PTSD symptoms, to see the world in zero-sum terms more often, and to have more preferences for things to be incoherent. Normal startups are commonly called “cults”, with good reason. Overall, there are both benefits and harms of high-demand ideological communities (“cults”) compared to more normal occupations and social groups, and the specifics matter more than the general class of something being “normal” or a “cult”, although the general class affects the structure of the specifics.

Zoe begins by listing a number of trauma symptoms she experienced. I have, personally, experienced most of those on the list of cult after-effects in 2017, even before I had a psychotic break.

The psychotic break was in October 2017, and involved psychedelic use (as part of trying to “fix” multiple deep mental problems at once, which was, empirically, overly ambitious); although people around me to some degree tried to help me, this “treatment” mostly made the problem worse, so I was placed in 1-2 weeks of intensive psychiatric hospitalization, followed by 2 weeks in a halfway house. This was followed by severe depression lasting months, and less severe depression from then on, which I still haven’t fully recovered from. I had PTSD symptoms after the event and am still recovering.

During this time, I was intensely scrupulous; I believed that I was intrinsically evil, had destroyed significant parts of the world with my demonic powers, and was in a hell of my own creation. I was catatonic for multiple days, afraid that by moving I would cause harm to those around me. This is in line with scrupulosity-related post-cult symptoms.

Talking about this is to some degree difficult because it’s normal to think of this as “really bad”. Although it was exceptionally emotionally painful and confusing, the experience taught me a lot, very rapidly; I gained and partially stabilized a new perspective on society and my relation to it, and to my own mind. I have much more ability to relate to normal people now, who are also, for the most part, traumatized.

(Yes, I realize how strange it is that I was more able to relate to normal people by occupying an extremely weird mental state where I thought I was destroying the world and was ashamed and suicidal regarding this; such is the state of normal Americans, apparently, in a time when suicidal music is extremely popular among youth.)

Like Zoe, I have experienced enormous post-traumatic growth. To quote a song, “I Am Woman”: “Yes, I’m wise, but it’s wisdom born of pain. I guess I’ve paid the price, but look how much I’ve gained.”

While most people around MIRI and CFAR didn’t have psychotic breaks, there were at least 3 other cases of psychiatric institutionalizations by people in the social circle immediate to MIRI/CFAR; at least one other than me had worked at MIRI for a significant time, and at least one had done work with MIRI on a shorter-term basis. There was, in addition, a case of someone becoming very paranoid, attacking a mental health worker, and hijacking her car, leading to jail time; this person was not an employee of either organization, but had attended multiple CFAR events including a relatively exclusive AI-focused one.

I heard that the paranoid person in question was concerned about a demon inside him, implanted by another person, trying to escape. (I knew the other person in question, and their own account was consistent with attempting to implant mental subprocesses in others, although I don’t believe they intended anything like this particular effect). My own actions while psychotic later that year were, though physically nonviolent, highly morally confused; I felt that I was acting very badly and “steering in the wrong direction”, e.g. in controlling the minds of people around me or subtly threatening them, and was seeing signs that I was harming people around me, although none of this was legible enough to seem objectively likely after the fact. I was also extremely paranoid about the social environment, being unable to sleep normally due to fear.

There are even cases of suicide in the Berkeley rationality community associated with scrupulosity and mental self-improvement (specifically, Maia Pasek/SquirrelInHell, and Jay Winterford/Fluttershy, both of whom were long-time LessWrong posters; Jay wrote an essay about suicidality, evil, domination, and Roko’s basilisk months before the suicide itself). Both these cases are associated with a subgroup splitting off of the CFAR-centric rationality community due to its perceived corruption, centered around Ziz. (I also thought CFAR was pretty corrupt at the time, and I also attempted to split off another group when attempts at communication with CFAR failed; I don’t think this judgment was in error, though many of the following actions were; the splinter group seems to have selected for high scrupulosity and not attenuated its mental impact.)

The cases discussed are not always of MIRI/CFAR employees, so they’re hard to attribute to the organizations themselves, even if they were clearly in the same or a nearby social circle. Leverage was an especially legible organization, with a relatively clear interior/exterior distinction, while CFAR was less legible, having a set of events that different people were invited to, and many conversations including people not part of the organization. Hence, it is easier to attribute organizational responsibility at Leverage than around MIRI/CFAR. (This diffusion of responsibility, of course, doesn’t help when there are actual crises, mental health or otherwise.)

Obviously, for every case of poor mental health that “blows up” and is noted, there are many cases that aren’t. Many people around MIRI/CFAR and Leverage, like Zoe, have trauma symptoms (including “cult after-effect symptoms”) that aren’t known about publicly until the person speaks up.

Why do so few speak publicly, and after so long?

Zoe discusses why she hadn’t gone public until now. She first cites fear of response:

Leverage was very good at convincing me that I was wrong, my feelings didn’t matter, and that the world was something other than what I thought it was. After leaving, it took me years to reclaim that self-trust.

Clearly, not all cases of people trying to convince each other that they’re wrong are abusive; there’s an extra dimension of institutional gaslighting, people telling you something you have no reason to expect they actually believe, people being defensive and blocking information, giving implausible counter-arguments, trying to make you doubt your account and agree with their bottom line.

Jennifer Freyd writes about “betrayal blindness”, a common problem where people hide from themselves evidence that their institutions have betrayed them. I experienced this around MIRI/CFAR.

Some background on AI timelines: At the Asilomar Beneficial AI conference, in early 2017 (after AlphaGo was demonstrated in late 2016), I remember another attendee commenting on a “short timelines bug” going around. Apparently a prominent researcher was going around convincing people that human-level AGI was coming in 5-15 years.

This trend in belief included MIRI/CFAR leadership; one person commented that he noticed his timelines trending only towards getting shorter, and decided to update all at once. I’ve written about AI timelines in relation to political motivations before (long after I actually left MIRI).

Perhaps more important to my subsequent decisions, the AI timelines shortening triggered an acceleration of social dynamics. MIRI became very secretive about research. Many researchers were working on secret projects, and I learned almost nothing about these. I and other researchers were told not to even ask each other about what others of us were working on, on the basis that if someone were working on a secret project, they may have to reveal this fact. Instead, we were supposed to discuss our projects with an executive, who could connect people working on similar projects.

I had disagreements with the party line, such as on when human-level AGI was likely to be developed and about security policies around AI, and there was quite a lot of effort to convince me of their position, that AGI was likely coming soon and that I was endangering the world by talking openly about AI in the abstract (not even about specific new AI algorithms). Someone in the community told me that for me to think AGI probably won’t be developed soon, I must think I’m better at meta-rationality than Eliezer Yudkowsky, a massive claim of my own specialness [EDIT: Eliezer himself and Sequences-type thinking, of course, would aggressively disagree with the epistemic methodology advocated by this person]. I experienced a high degree of scrupulosity about writing anything even somewhat critical of the community and institutions (e.g. this post). I saw evidence of bad faith around me, but it was hard to reject the frame for many months; I continued to worry about whether I was destroying everything by going down certain mental paths and not giving the party line the benefit of the doubt, despite its increasing absurdity.

Like Zoe, I definitely feared the response. I had paranoid fantasies about a MIRI executive assassinating me. The decision theory research I had done came to life, as I thought about the game theory of submitting to a threat of a gun, in relation to how different decision theories respond to extortion.

This imagination, though extreme (and definitely reflective of a cognitive error), was to some degree reinforced by the social environment. I mentioned the possibility of whistle-blowing on MIRI to someone I knew, who responded that I should consider talking with Chelsea Manning, a whistleblower who is under high threat. There was quite a lot of paranoia at the time, both among the “establishment” (who feared being excluded or blamed) and “dissidents” (who feared retaliation by institutional actors). (I would, if asked to take bets, have bet strongly against actual assassination, but I did fear other responses.)

More recently (in 2019), multiple masked protesters at a CFAR event (handing out pamphlets critical of MIRI and CFAR) had a SWAT team called on them (by camp administrators, not CFAR people, although a CFAR executive had called the police previously about this group); they were arrested and are now facing the possibility of long jail time. While this group of people (Ziz and some friends/associates) chose an unnecessarily risky way to protest, hearing about this made me worry about violently authoritarian responses to whistleblowing, especially because, in the version of the story I first heard, it was a CFAR-adjacent person who had called the cops to say the protesters had a gun (which they didn’t have).

Zoe further talks about how the experience was incredibly confusing and people usually only talk about the past events secretively. This matches my experience.

Like Zoe, I care about the people I interacted with during the time of the events (who are, for the most part, colleagues who I learned from), and I don’t intend to cause harm to them through writing about these events.

Zoe discusses an unofficial NDA people signed as they left, agreeing not to talk badly of the organization. While I wasn’t pressured to sign an NDA, there were significant security policies discussed at the time (including the one about researchers not asking each other about research). I was discouraged from writing a blog post estimating when AI would be developed, on the basis that a real conversation about this topic among rationalists would cause AI to come sooner, which would be more dangerous (the blog post in question would have been similar to the AI forecasting work I did later, here and here; judge for yourself how dangerous this is). This made it hard to talk about the silencing dynamic; if you don’t have the freedom to speak about the institution and limits of freedom of speech, then you don’t have freedom of speech.

(Is it a surprise that, after over a year in an environment where I was encouraged to think seriously about the possibility that simple actions such as writing blog posts about AI forecasting could destroy the world, I would develop the belief that I could destroy everything through subtle mental movements that manipulate people?)

Years before, MIRI had a non-disclosure agreement that members were pressured to sign, as part of a legal dispute with Louie Helm.

I was certainly socially discouraged from revealing things that would harm the “brand” of MIRI and CFAR, by executive people. There was some discussion at the time of the possibility of corruption in EA/rationality institutions (e.g. Ben Hoffman’s posts criticizing effective altruism, GiveWell, and the Open Philanthropy Project); a lot of this didn’t end up on the Internet due to PR concerns.

Someone who I was collaborating with at the time (Michael Vassar) was commenting on social epistemology and the strengths and weaknesses of various people’s epistemology and strategy, including people who were leaders at MIRI/CFAR. Subsequently, Anna Salamon said that Michael was causing someone else at MIRI to “downvote Eliezer in his head” and that this was bad because it meant that the “community” would not agree about who the leaders were, and would therefore have akrasia issues due to the lack of agreement on a single leader in their head telling them what to do. (Anna says, years later, that she was concerned about bias in selectively causing downvotes rather than upvotes; however, at the time, based on what was said, I had the impression that the primary concern was about coordination around common leadership rather than bias specifically.)

This seemed culty to me and some friends; it’s especially evocative in relation to Julian Jaynes’ writing about bronze age cults, which detail a psychological model in which idols/gods give people voices in their head telling them what to do.

(As I describe these events in retrospect they seem rather ridiculous, but at the time I was seriously confused about whether I was especially crazy or in-the-wrong, and the leadership was behaving sensibly. If I were the type of person to trust my own judgment in the face of organizational mind control, I probably wouldn’t have been hired in the first place; everything I knew about how to be hired would point towards having little mental resistance to organizational narratives.)

Strange psycho-social-metaphysical hypotheses in a group setting

Zoe gives a list of points showing how “out of control” the situation at Leverage got. This is consistent with what I’ve heard from other ex-Leverage people.

The weirdest part of the events recounted is the concern about possibly-demonic mental subprocesses being implanted by other people. As a brief model of something similar to this (not necessarily the same model as the Leverage people were using): people often pick up behaviors (“know-how”) and mental models from other people, through acculturation and imitation. Some of this influence could be (a) largely unconscious on the part of the receiver, (b) partially intentional on the part of the person having mental effects on others (where these intentions may include behaviorist conditioning, similar to hypnosis, causing behaviors to be triggered under certain circumstances), and (c) overall harmful to the receiver’s conscious goals. According to IFS-like psychological models, it’s common for a single brain to contain multiple sub-processes with different intentions. While the mental subprocess implantation hypothesis is somewhat strange, it’s hard to rule out based on physics or psychology.

As weird as the situation got, with people being afraid of demonic subprocesses being implanted by other people, there were also psychotic breaks involving demonic subprocess narratives around MIRI and CFAR. These strange experiences are, as far as I can tell, part of a more general social phenomenon around that time period; I recall a tweet commenting that the election of Donald Trump convinced everyone that magic was real.

Unless there were psychiatric institutionalizations or jail time resulting from the Leverage psychosis, I infer that Leverage overall handled their metaphysical weirdness better than the MIRI/CFAR adjacent community. While in Leverage the possibility of subtle psychological influence between people was discussed relatively openly, around MIRI/CFAR it was discussed covertly, with people being told they were crazy for believing it might be possible. (I noted at the time that there might be a sense in which different people have “auras” in a way that is not less inherently rigorous than the way in which different people have “charisma”, and I feared this type of comment would cause people to say I was crazy.)

As a consequence, the people most mentally concerned with strange social metaphysics were marginalized, and had more severe psychoses with less community support, hence requiring normal psychiatric hospitalization.

The case Zoe recounts of someone “having a psychotic break” sounds tame relative to what I’m familiar with. Someone can mentally explore strange metaphysics, e.g. a different relation to time or God, in a supportive social environment where people can offer them informational and material assistance, and help reality-check their ideas.

Alternatively, like me, they can explore these metaphysics while:

losing days of sleep
becoming increasingly paranoid and anxious
feeling delegitimized and gaslit by those around them, unable to communicate their actual thoughts with those around them
fearing involuntary psychiatric institutionalization
experiencing involuntary psychiatric institutionalization
having almost no real mind-to-mind communication during “treatment”
learning primarily to comply and to play along with the incoherent, shifting social scene (there were mandatory improv classes)
being afraid of others in the institution, including being afraid of sexual assault, which is common in psychiatric hospitals
believing the social context to be a “cover up” of things including criminal activity and learning to comply with it, on the basis that one would be unlikely to exit the institution within a reasonable time without doing so

Being able to discuss somewhat wacky experiential hypotheses, like the possibility of people spreading mental subprocesses to each other, in a group setting, and have the concern actually taken seriously as something that could seem true from some perspective (and which is hard to definitively rule out), seems much more conducive to people’s mental well-being than refusing to have that discussion, so they struggle with (what they think is) mental subprocess implantation on their own. Leverage definitely had large problems with these discussions, and perhaps tried to reach more intersubjective agreement about them than was plausible (leading to over-reification, as Zoe points out), but they seem less severe than the problems resulting from refusing to have them, such as psychiatric hospitalization and jail time.

“Psychosis” doesn’t have to be a bad thing, even if it usually is in our society; it can be an exploration of perceptions and possibilities not before imagined, in a supportive environment that helps the subject to navigate reality in a new way; some of R.D. Laing’s work is relevant here, describing psychotic mental states as a result of ontological insecurity following from an internal division of the self at a previous time. Despite the witch hunts and so on, the Leverage environment seems more supportive than what I had access to. The people at Leverage I talk to, who have had some of these unusual experiences, often have a highly exploratory attitude to the subtle mental realm, having gained access to a new cognitive domain through the experience, even if it was traumatizing.

World-saving plans and rarity narratives

Zoe cites the fact that Leverage has a “world-saving plan” (which included taking over the world) and considered Geoff Anders and Leverage to be extremely special, e.g. Geoff being possibly the best philosopher ever:

Within a few months of joining, a supervisor I trusted who had recruited me confided in me privately, “I think there’s good reason to believe Geoff is the best philosopher who’s ever lived, better than Kant. I think his existence on earth right now is an historical event.”

Like Leverage, MIRI had a “world-saving plan”. This is no secret; it’s discussed in an Arbital article written by Eliezer Yudkowsky. Nate Soares frequently talked about how it was necessary to have a “plan” to make the entire future ok, to avert AI risk; this plan would need to “backchain” from a state of no AI risk and may, for example, say that we must create a human emulation using nanotechnology that is designed by a “genie” AI, which does a narrow task rather than taking responsibility for the entire future; this would allow the entire world to be taken over by a small group including the emulated human. [EDIT: See Nate’s clarification, the small group doesn’t have to be MIRI specifically, and the upload plan is an example of a plan rather than a fixed super-plan.]

I remember taking on more and more mental “responsibility” over time, noting the ways in which people other than me weren’t sufficient to solve the AI alignment problem, and I had special skills, so it was uniquely my job to solve the problem. This ultimately broke down, and I found Ben Hoffman’s post on responsibility to resonate (which discusses the issue of control-seeking).

The decision theory of backchaining and taking over the world is somewhat beyond the scope of this post. There are circumstances where backchaining is appropriate, and “taking over the world” might be necessary, e.g. if there are existing actors already trying to take over the world and none of them would implement a satisfactory regime. However, there are obvious problems with multiple actors each attempting to control everything, which are discussed in Ben Hoffman’s post.

This connects with what Zoe calls “rarity narratives”. There were definitely rarity narratives around MIRI/CFAR. Our task was to create an integrated, formal theory of values, decisions, epistemology, self-improvement, etc (“Friendliness theory”), which would help us develop Friendly AI faster than the rest of the world combined was developing AGI (which was, according to leaders, probably in less than 20 years). It was said that a large part of our advantage in doing this research so fast was that we were “actually trying” and others weren’t. It was stated by multiple people that we wouldn’t really have had a chance to save the world without Eliezer Yudkowsky (obviously implying that Eliezer was an extremely historically significant philosopher).

Though I don’t remember people saying explicitly that Eliezer Yudkowsky was a better philosopher than Kant, I would guess many would have said so. No one there, as far as I know, considered Kant worth learning from enough to actually read the Critique of Pure Reason in the course of their research; I only did so years later, and I’m relatively philosophically inclined. I would guess that MIRI people would consider a different set of philosophers relevant, e.g. would include Turing and Einstein as relevant “philosophers”, and I don’t have reason to believe they would consider Eliezer more relevant than these, though I’m not certain either way. (I think Eliezer is a world-historically-significant philosopher, though not as significant as Kant or Turing or Einstein.)

I don’t think it’s helpful to oppose “rarity narratives” in general. People need to try to do hard things sometimes, and actually accomplishing those things would make the people in question special, and that isn’t a good argument against trying the thing at all. Intellectual groups with high information integrity, e.g. early quantum mechanics people, can have a large effect on history. I currently think the intellectual work I do is pretty rare and important, so I have a “rarity narrative” about myself, even though I don’t usually promote it. Of course, a project claiming specialness while displaying low information integrity is, effectively, asking for more control and resources than it can beneficially use.

Rarity narratives can have the effects of making a group of people more insular, more prone to concentrating relevance around itself and not learning from other sources (in the past or the present), making local social dynamics more centered on a small number of special people, and increasing pressure on people to try to do (or pretend to try to do) things beyond their actual abilities; Zoe and I both experienced these effects.

(As a hint to evaluating rarity narratives yourself: compare Great Thinker’s public output to what you’ve learned from other public sources; follow citations and see where Great Thinker might be getting their ideas from; read canonical great philosophy and literature; get a quantitative sense of how much insight is coming from which places throughout spacetime.)

The object-level specifics of each case of world-saving plan matter, of course; I think most readers of this post will be more familiar with MIRI’s world-saving plan, especially since Zoe’s post provides few object-level details about the content of Leverage’s plan.

Debugging

Rarity ties into debugging; if what makes us different is that we’re Actually Trying and the other AI research organizations aren’t, then we’re making a special psychological claim about ourselves, that we can detect the difference between actually and not-actually trying, and cause our minds to actually try more of the time.

Zoe asks whether debugging was “required”; she notes:

The explicit strategy for world-saving depended upon a team of highly moldable young people self-transforming into Elon Musks.

I, in fact, asked a CFAR instructor in 2016-17 whether the idea was to psychologically improve yourself until you became Elon Musk, and he said “yes”. This part of the plan was the same [EDIT: Anna clarifies that, while some people becoming like Elon Musk was some people’s plan, there was usually acceptance of people not changing themselves; this might to some degree apply to Leverage as well].

Self-improvement was a major focus around MIRI and CFAR, and at other EA orgs. It often used standard CFAR techniques, which were taught at workshops. It was considered important to psychologically self-improve to the point of being able to solve extremely hard, future-lightcone-determining problems.

I don’t think these are bad techniques, for the most part. I think I learned a lot by observing and experimenting on my own mental processes. (Zoe isn’t saying Leverage’s techniques are bad either, just that you could get most of them from elsewhere.)

Zoe notes a hierarchical structure where people debugged people they had power over:

Trainers were often doing vulnerable, deep psychological work with people with whom they also lived, made funding decisions about, or relied on for friendship. Sometimes people debugged each other symmetrically, but mostly there was a hierarchical, asymmetric structure of vulnerability; underlings debugged those lower than them on the totem pole, never their superiors, and superiors did debugging with other superiors.

This was also the case around MIRI and CFAR. A lot of debugging was done by Anna Salamon, head of CFAR at the time; Ben Hoffman noted that “every conversation with Anna turns into an Anna-debugging-you conversation”, which resonated with me and others.

There was certainly a power dynamic of “who can debug who”; to be a more advanced psychologist is to be offering therapy to others, being able to point out when they’re being “defensive”, when one wouldn’t accept the same from them. This power dynamic is also present in normal therapy, although the profession has norms such as only getting therapy from strangers, which change the situation.

How beneficial or harmful this was depends on the details. I heard that “political” discussions at CFAR (e.g. determining how to resolve conflicts between people at the organization, which could result in people leaving the organization) were mixed with “debugging” conversations, in a way that would make it hard for people to focus primarily on the debugged person’s mental progress without imposing pre-determined conclusions. Unfortunately, when there are few people with high psychological aptitude around, it’s hard to avoid “debugging” conversations having political power dynamics, although it’s likely that the problem could have been mitigated.

[EDIT: See PhoenixFriend’s pseudonymous comment, and replies to it, for more on power dynamics including debugging-related ones at CFAR specifically.]

It was really common for people in the social space, including me, to have a theory about how other people are broken, and how to fix them, by getting them to understand a deep principle you do and they don’t. I still think most people are broken and don’t understand deep principles that I or some others do, so I don’t think this was wrong, although I would now approach these conversations differently.

A lot of the language from Zoe’s post, e.g. “help them become a master”, resonates. There was an atmosphere of psycho-spiritual development, often involving Kegan stages. There is a significant degree of overlap between people who worked with or at CFAR and people at the Monastic Academy [EDIT: see Duncan’s comment estimating that the actual amount of interaction between CFAR and MAPLE was pretty low even though there was some overlap in people].

Although I wasn’t directly financially encouraged to debug people, I infer that CFAR employees were, since instructing people was part of their job description.

Other issues

MIRI did have less time pressure imposed by the organization itself than Leverage did, despite the deadline implied by the AGI timeline; I had no issues with absurdly over-booked calendars. I vaguely recall that CFAR employees were overworked especially around workshop times, though I’m pretty uncertain of the details.

Many people’s social lives, including mine, were spent mostly “in the community”; much of this time was spent on “debugging” and other psychological work. Some of my most important friendships at the time, including one with a housemate, were formed largely around a shared interest in psychological self-improvement. There was, therefore, relatively little work-life separation (which has upsides as well as downsides).

Zoe recounts an experience with having unclear, shifting standards applied, with the fear of ostracism. Though the details of my experience are quite different, I was definitely afraid of being considered “crazy” and marginalized for having philosophy ideas that were too weird, even though weird philosophy would be necessary to solve the AI alignment problem. I noticed more people saying I and others were crazy as we were exploring sociological hypotheses that implied large problems with the social landscape we were in (e.g. people thought Ben Hoffman was crazy because of his criticisms of effective altruism). I recall talking to a former CFAR employee who was scapegoated and ousted after failing to appeal to the winning internal coalition; he was obviously quite paranoid and distrustful, and another friend and I agreed that he showed PTSD symptoms [EDIT: I infer scapegoating based on the public reason given being suspicious/insufficient; someone at CFAR points out that this person was paranoid and distrustful while first working at CFAR as well].

Like Zoe, I experienced myself and others being distanced from old family and friends, who didn’t understand how high-impact the work we were doing was. Since leaving the scene, I am more able to talk with normal people (including random strangers), although it’s still hard to talk about why I expect the work I do to be high-impact.

An ex-Leverage person I know comments that “one of the things I give Geoff the most credit for is actually ending the group when he realized he had gotten in over his head. That still left people hurt and shocked, but did actually stop a lot of the compounding harm.” (While Geoff is still working on a project called “Leverage”, the initial “Leverage 1.0” ended with most of the people leaving.) This is to some degree happening with MIRI and CFAR, with a change in the narrative about the organizations and their plans, although the details are currently less legible than with Leverage.

Conclusion

Perhaps one lesson to take from Zoe’s account of Leverage is that spending relatively more time discussing sociology (including anthropology and history), and less time discussing psychology, is more likely to realize benefits while avoiding problems. Sociology is less inherently subjective and meta than psychology, having intersubjectively measurable properties such as events in human lifetimes and social network graph structures. My own thinking has certainly gone in this direction since my time at MIRI, to great benefit. I hope this account I have written helps others to understand the sociology of the rationality community around 2017, and that this understanding helps people to understand other parts of the society they live in.

There are, obviously from what I have written, many correspondences, showing a common pattern for high-ambition ideological groups in the San Francisco Bay Area. I know there are serious problems at other EA organizations, which produce largely fake research (and probably took in people who wanted to do real research, who became convinced by their experience to do fake research instead), although I don’t know the specifics as well. EAs generally think that the vast majority of charities are doing low-value and/or fake work. I also know that San Francisco startup culture produces cult-like structures (and associated mental health symptoms) with regularity. Rather than singling out specific parties, it seems more productive to think about the social and ecological forces that create and select for the social structures we actually see, which include relatively more and less cult-like structures. (Of course, to the extent that harm is ongoing due to actions taken by people and organizations, it’s important to be able to talk about that.)

It’s possible that after reading this, you think this wasn’t that bad. Though I can only speak for myself here, I’m not sad that I went to work at MIRI instead of Google or academia after college. I don’t have reason to believe that either of these environments would have been better for my overall intellectual well-being or my career, despite the mental and social problems that resulted from the path I chose. Scott Aaronson, for example, blogs about “blank faced” non-self-explaining authoritarian bureaucrats being a constant problem in academia. Venkatesh Rao writes about the corporate world, and the picture presented is one of a simulation constantly maintained through improv.

I did grow from the experience in the end. But I did so in large part by being very painfully aware of the ways in which it was bad.

I hope that those who think this is “not that bad” (perhaps due to knowing object-level specifics around MIRI/CFAR justifying these decisions) consider how they would find out whether the situation with Leverage was “not that bad”, in comparison, given the similarity of the phenomena observed in both cases; such an investigation may involve learning object-level specifics about what happened at Leverage. I hope that people don’t scapegoat; in an environment where certain actions are knowingly being taken by multiple parties, singling out certain parties has negative effects on people’s willingness to speak without actually producing any justice.

Aside from whether things were “bad” or “not that bad” overall, understanding the specifics of what happened, including harms to specific people, is important for actually accomplishing the ambitious goals these projects are aiming at; there is no reason to expect extreme accomplishments to result without very high levels of epistemic honesty.
