My closest current stab is that we’re the “Center for Bridging between Common Sense and Singularity Scenarios.” (This is obviously not our real name. But if I had to grab a handle that gestures at our raison d’etre, at the moment I’d pick this one. We’ve been internally joking about renaming ourselves this for some months now.)
To elaborate: thinking about singularity scenarios is profoundly disorienting (IMO, typically worse than losing a deeply held childhood religion or similar). Folks over and over again encounter similar failure modes as they attempt this. It can be useful to have an institution for assisting with this—collecting concepts and tools that were useful for previous waves who’ve attempted thought/work about singularity scenarios, and attempting to pass them on to those who are currently beginning to think about such scenarios.
Relatedly, the pattern of thinking required for considering AI risk and related concepts at all is pretty different from the patterns of thinking that suffice in most other contexts, and it can be useful to have a group that attempts to collect these and pass them forward.
Further, it can be useful to figure out how the heck to do teams and culture in a manner that can withstand the disruptions that can come from taking singularity scenarios seriously.
So, my best current angle on CFAR is that we should try to be a place that can help people through these standard failure modes—a place that can try to answer the question “how can we be sane and reasonable and sensible and appropriately taking-things-seriously in the face of singularity scenarios,” and can try to pass on our answer, and can notice and adjust when our answer turns out to be invalid.
To link this up with our concrete activities:
AIRCS workshops / MSFP:
Over the last year, about half our staff workshop-days went into attempting to educate potential AI alignment researchers. These programs were co-run with MIRI. Workshops included a bunch of technical AI content; a bunch of practice thinking through “is there AI risk” and “how the heck would I align a superintelligence” and related things; and a bunch of discussion of e.g. how to not have “but the stakes are really really big” accidentally overwhelm one’s basic sanity skills (and other basic pieces of how to not get too disoriented).
Many program alumni attended multiple workshops, spaced across time, as part of a slow acculturation process: stare at AI risk; go back to one’s ordinary job/school context for some months while digesting in a back-burner way; repeat.
These programs aim at equipping people to contribute to AI alignment technical work at MIRI and elsewhere; in the last two years they’ve helped educate a sizable number of MIRI hires and a smaller but still important number of others (brief details in our 2019 progress report; more details coming eventually). People sometimes try to gloss the impact of AIRCS as “outreach” or “causing career changes,” but, while I think it does in fact fulfill CEA-style metrics, that doesn’t seem to me like a good way to see its main purpose—helping folks feel their way toward being more oriented and capable around these topics in general, in a context where other researchers have done or are doing likewise.
They seem like a core activity for a “Center for bridging between common sense and singularity scenarios”—both in that they tell us more about what happens when folks encounter AI risk, and in that they let us try to use what we think we know for good. (We hope it’s “good.”)
Mainline workshops, alumni reunions, alumni workshops unrelated to AI risk, etc.:
We run mainline workshops (which many people just call “CFAR workshops”), alumni reunions, and some topic-specific workshops for alumni that have nothing to do with AI risk (e.g., a double crux workshop). Together, this stuff constituted about 30% of our staff workshop-days over the last two years.
The EA crowd often asks me why we run these. (“Why not just run AI safety workshops, since that is the part of your work that has more of a shot at helping large numbers of people?”) The answer is that when I imagine removing the mainline workshops, CFAR begins to feel like a table without enough legs—unstable, liable to work for a while but then fall over, lacking enough contact with the ground.
More concretely: we’re developing and spreading a nonstandard mental toolkit (inner sim, double crux, Gendlin’s Focusing, etc.). That’s a tricky and scary thing to do. It’s really helpful to get to try it on a variety of people—especially smart, thoughtful, reflective, articulate people who will let us know what seems like a terrible idea, what helps in their lives, and what causes disruption in their lives. The mainline workshops (plus follow-up sessions, alumni workshops, alumni reunions, etc.) let us develop this alleged “bridge” between common sense and singularity scenarios in a way that avoids overfitting it all to just “AI alignment work.” Which is basically to say that they let us develop and test our models of “applied rationality”.
“Sandboxes” toward trying to understand how to have a healthy culture in contact with AI safety:
I often treat the AIRCS workshops as “sandboxes”, and try within them to create small temporary “cultures” in which we try to get research to be able to flourish, or try to get people to be able to both be normal humans and slowly figure out how to approach AI alignment, or whatever. I find them a pretty productive vehicle for trying to figure out the “social context” thing, and not just the “individual thinking habits” thing. I care about this experimentation-with-feedback because I want MIRI and other longer-term teams to eventually have the right cultural base.
Our instructor training program, and our attempt to maintain a staff that is skilled at seeing what cognitive processes are actually running in people:
There’s a lot of trainable, transferable skill to seeing what people are thinking. CFAR staff have a bunch of this IMO, and we seem to me to be transferring a bunch of it to the instructor candidates too. We call it “seeking PCK” (pedagogical content knowledge).
The “seeking PCK” skillset is obviously helpful for learning to “bridge between common sense and singularity scenarios”—it helps us see which of folks’ patterns are useful, which are not so useful, and what exactly is happening as we attempt to intervene (so that we can adjust our interventions).
Thus, improving and maintaining the “seeking PCK” skillset probably makes us faster at developing any other curriculum.
More mundanely, of course, instructor training also gives us guest instructors who can help us run workshops—many of whom are also out and about doing other interesting things, and porting wisdom/culture/data back and forth between those endeavors and our workshops.
To explain what “bridging between common sense and singularity scenarios” has to do with “applied rationality” and the LW Sequences and so on:
The farther off you need to extrapolate, the more you need reasoning (vs. being able to lean on either received wisdom, or known data plus empirical feedback loops). And singularity scenarios sure are far from the everyday life our heuristics were developed for, so singularity scenarios benefit more than most topics from trying to be the lens that sees its flaws, and from Sequences-style thinking more broadly.
Examples of some common ways people find Singularity scenarios disorienting:
When a person loses their childhood religion, there’s often quite a bit of bucket error. A person updates on the true fact “Jehovah is not a good explanation of the fossil record” and accidentally confuses that true fact with any number of other things, such as “and so I’m not allowed to take my friends’ lives and choices as real and meaningful.”
I claimed above that “coming to take singularity scenarios seriously” seems in my experience to often cause even more disruption / bucket errors / confusions / false beliefs than does “losing a deeply held childhood religion.” I’d like to elaborate on that here by listing some examples of the kinds of confusions/errors I often encounter.
None of these are present in everyone who encounters Singularity scenarios, or even in most people who encounter them. Still, each confusion below is one where I’ve seen it or near-variants of it multiple times.
Also note that all of these things are “confusions”, IMO. People semi-frequently have them at the beginning and then get over them. These are not points of view I would recommend or consider correct—more like the opposite—and I personally think each stems from some sort of fixable thinking error.
The imagined stakes in a singularity are huge. Common confusions related to this:
Confusion about whether it is okay to sometimes spend money/time/etc. on oneself, vs. having to give it all to attempting to impact the future.
Confusion about whether one wants to take in singularity scenarios, given that then maybe one will “have to” (move across the country / switch jobs / work all the time / etc.)
Confusion about whether it is still correct to follow common sense moral heuristics, given the stakes.
Confusion about how to enter “hanging out” mode, given the stakes and one’s panic. (“Okay, here I am at the beach with my friends, like my todo list told me to do to avoid burnout. But how is it that I used to enjoy their company? They seem to be making meaningless mouth-noises that have nothing to do with the thing that matters…”)
Confusion about how to take an actual normal interest in one’s friends’ lives, or one’s partner’s life, or one’s Lyft drivers’ lives, or whatever, given that within the person’s new frame, the problems they are caught up in seem “small” or “irrelevant” or to have “nothing to do with what matters”.
The degrees of freedom in “what should a singularity maybe do with the future?” are huge. And people are often morally disoriented by that part.
Should we tile the universe with a single repeated mouse orgasm, or what?
Are we allowed to want humans and ourselves and our friends to stay alive?
Is there anything we actually want? Or is suffering bad without anything being better-than-nothing?
If I can’t concretely picture what I’d do with a whole light-cone (maybe because it is vastly larger than any time/money/resources I’ve ever personally obtained feedback from playing with) -- should I feel that the whole future is maybe meaningless and no good?
The world a person finds themselves in once they start taking Singularity scenarios seriously is often quite different from the world the neighbors think they’re in, which itself can make things hard.
Can I have a “real” conversation with my friends? Should I feel crazy? Should I avoid taking all this in on a visceral level so that I’ll stay mentally in the same world as my friends?
How do I keep regarding other people’s actions as good and reasonable?
The imagined scales are very large, with the result that one can less safely assume that “things are locally this way” is an adequate model.
Given this, should I get lost in “what about simulations / anthropics” to the point of becoming confused about normal day-to-day events?
In order to imagine this stuff, folks need to take seriously reasoning that is neither formal mathematics, nor vetted by the neighbors or academia, nor strongly based in empirical feedback loops.
Given this, shall I go ahead and take random piles of woo seriously also?
There are lots more where these came from, but I’m hoping this gives some flavor, and makes it somewhat plausible why I’m claiming that “coming to take singularity scenarios seriously can be pretty disruptive to common sense,” and such that it might be nice to try having a “bridge” that can help people lose less of the true parts of common sense as their world changes (much as it might be nice for someone who has just lost their childhood religion to have a bridge to “okay, here are some other atheists, and they don’t think that God is why they should get up in the morning and care about others, but they do still seem to think they should get up in the morning and care about others”).
and makes it somewhat plausible why I’m claiming that “coming to take singularity scenarios seriously can be pretty disruptive to common sense,” and such that it might be nice to try having a “bridge” that can help people lose less of the true parts of common sense as their world changes
Can you say a bit more about how CFAR helps people do this? Some of the “confusions” you mentioned are still confusing to me. Are they no longer confusing to you? If so, can you explain how that happened and what you ended up thinking on each of those topics? For example lately I’m puzzling over something related to this:
Given this, should I get lost in “what about simulations / anthropics” to the point of becoming confused about normal day-to-day events?
[Possibly digging a bit too far into the specifics so no worries if you’d rather bow out.]
Do you think these confusions[1] are fairly evenly dispersed throughout the community (besides what you already mentioned: “People semi-frequently have them at the beginning and then get over them.”)?
Two casual observations: (A) the confusions seem less common among people working full-time at EA/Rationalist/x-risk/longtermist organisations than in other people who “take singularity scenarios seriously.”[2] (B) I’m very uncertain, but they also seem less prevalent to me in the EA community than the rationalist community (to the extent the communities can be separated).[3] [4]
Do A and B sound right to you? If so, do you have a take on why that is?
If A or B *are* true, do you think this is in any part caused by the respective groups taking the singularity [/x-risk/the future/the stakes] less seriously? If so, are there important costs from this?
[1] Using your word while withholding my own judgment as to whether every one of these is actually a confusion.
[2] If you’re right that a lot of people have them at the beginning and then get over them, a simple potential explanation would be that by the time you’re working at one of these orgs, that’s already happened.
Other hypotheses: (a) selection effects; (b) working FT in the community gives you additional social supports and makes it more likely others will notice if you start spiraling; (c) the cognitive dissonance with the rest of society is a lot of what’s doing the damage. It’s easier to handle this stuff psychologically if the coworkers you see every day also take the singularity seriously.[i]
[3] For example, perhaps less common at Open Phil, GPI, 80k, and CEA than CFAR and MIRI, but I also think this holds outside of professional organisations.
[4] One potential reason for this is that a lot of EA ideas are more “in the air” than rationalist/singularity ones. So a lot of EAs may have had their ‘crisis of faith’ before arriving in the community. (For example, I know plenty of EAs (myself included) who did some damage to themselves in their teens or early twenties by “taking Peter Singer really seriously.”)
[i] I’ve seen this kind of dissonance offered as a (partial) explanation of why PTSD has become so common among veterans & why it’s so hard for them to reintegrate after serving a combat tour. No clue if the source is reliable/widely held/true. It’s been years, but I think I got it from Odysseus in America or perhaps its predecessor, Achilles in Vietnam.
This seemed really useful. I suspect you’re planning to write up something like this at some point down the line but wanted to suggest posting this somewhere more prominent in the meantime (otoh, idea inoculation, etc.)
The state of confusion you’re describing sounds a lot like Kegan’s 4.5 nihilism (pretty much everything at meaningness.com is relevant). A person’s values have been demolished by a persuasive argument, but they haven’t yet internalized that people are “allowed” to create their own systems and values. Alright.
1. I assume that LW-adjacent people should actually be better at guiding people out of this stage, because a lot of people in the community have gone through the same process and there is an extensive body of work on the topic (Eliezer’s sequences on human values, David Chapman’s work, Scott Alexander’s posts on effective altruism / axiology-vs-morality / etc).
2. I also assume that in general we want people to go through this process – it is a necessary stage of adult development.
Given this, I’m leaning towards “guiding people towards nihilism is good as long as you don’t leave them in the philosophical dark re: how to get out of it”. So, taking a random smart person, persuading them they should care about the Singularity, and leaving – this isn’t great. But introducing people to AI risk in the context of LW seems much more benign to me.
I’m not really joking about it. I wish the name better expressed what the organization does.
Though I admit that CfBCSSS leaves a lot to be desired in terms of acronyms.
I nominate “Society of Effective Epistemics For AI Risk”, or SEE-FAR for short.
:) There’s something good about “common sense” that isn’t in “effective epistemics”, though—something about not wanting to lose the robustness of the ordinary, vetted-by-experience patterns of functioning. (Even though this is really hard, plausibly impossible, when we need to reach toward contexts far from those in which our experience was formed.)
This is the best idea I’ve heard yet.
It would be pretty confusing to people, and yet...
To clarify: we’re not joking about the need to get “what we do” and “what people think we do” more in alignment, via both communicating better and changing our organizational name if necessary. We put that on our “goals for 2020” list (both internally, and in our writeup). We are joking that CfBCSSS is an acceptable name (its length alone makes it not really one).
(Eli works with us a lot but has been taking a leave of absence for the last few months and so didn’t know that bit, but lots of us are not-joking about getting our name and mission clear.)
My closest current stab is that we’re the “Center for Bridging between Common Sense and Singularity Scenarios.”
[I realise there might not be precise answers to a lot of these but would still be interested in a quick take on any of them if anybody has one.]
Within CFAR, how much consensus is there on this vision? How stable/likely to change do you think it is? How long has this been the vision for (alternatively, how long have you been playing with this vision for)? Is it possible to describe what the most recent previous vision was?
These programs aim at equipping people to contribute to AI alignment technical work at MIRI and elsewhere; in the last two years, N hires have come out of them. I’m sure some (but not all) of those hires would’ve happened even without these programs; I suspect though that
Typo: My guess is that the N should be replaced with a number, and the sentence wasn’t intended to trail off like that.