Interpersonal Approaches for X-Risk Education
Much of the AI research community remains unaware of the Alignment Problem (in my personal experience), and I haven’t seen much discussion about how to deliberately expand the community (the only effort I’ve seen to this effect is Scott’s A/B/C/D/E testing on alignment articles).
Expanding the number of people aware of (and ideally, working on) the alignment problem is a high-leverage activity: a constant amount of effort spent educating someone, in exchange for a chance of recruiting an ally who will work hard at our side. Another metric by which we should evaluate approaches is whether we have to convince or simply educate: professors and high-status researchers may be more dismissive (possibly due to the inside view, their wariness of strange-sounding ideas, and overconfidence in their long-term predictions), but their influence would be greater. On the other hand, a good friend in a CS or math undergraduate or graduate program may be more receptive.
In my case, I stumbled upon HP:MoR one year ago, read the Sequences, and then read more about Alignment and CEV. I appreciated that Alignment was a serious problem, but it wasn’t until I got through Superintelligence that I realized it’s basically The Problem. Being in the second year of my doctorate program, I didn’t know whether I was “too late” to start learning the math, “too far behind” people like Eliezer to make a difference. What I did know is that we can’t all defect: we need people to put in the work, and we probably need substantially more people doing so.
What happened to me took a lot of time and may be unrealistic to recommend to others. The articles Scott tested seem roughly equally effective, so instead I’d like to discuss which social approaches work best for taking people from friend to friend-who-takes-alignment-seriously (while minimizing the effort expended), and whether this is an efficient use of our time.
It’s also high-variance; I think there’s a risk here that you’re not modeling, along the lines of idea inoculation. If you do a bad job raising awareness, you can hurt your cause by making it look dumb, low-status, a thing that only cranks worry about, and so on, and this could be extremely difficult to undo. I’m mostly relieved, rather than worried, to see that most people aren’t even trying, and basically happy to leave this job to people like Andrew Critch, who is doing it with in-person talks and institutional backing.
There’s an opposite problem which I’m less worried about: if working on AI safety becomes too high-status, then the people who show up to do it need to be filtered for not trying to take advantage, so there’s a cost there. Currently, the fact that it’s still somewhat difficult to learn enough about AI safety to care acts as that filter.
My read on the situation is that I’m happy for people to reach out to their close friends, and generally expect little harm to come from that, but I’d encourage people to be very hesitant to reach out to large communities or the general public.
This article struck me as more asking for advice on how to get people you’re already close to interested in AI alignment, which seems significantly less high-variance.
Can confirm, that’s what I had in mind (at least in my case).
Oh, good, that seems much less dangerous. Usually when people talk about “raising awareness” they’re talking about mass awareness, protests, T-shirts, rallies, etc.
Yeah, I want to add that my initial response to the post was strongly negative (because of pattern matching), but after a closer reading (and the title change <3) I’m super happy with this post.
I agree, and I realized this a bit after leaving my keyboard. The problem, in my opinion, is that we don’t have enough people doing this kind of outreach. It might be better to have more people doing pretty good outreach than just a few doing great outreach.
The other question is how hard it is to find people like me: constant effort for a very low-probability outcome could be suboptimal compared to just spending more time on the problem ourselves. I don’t think we’re there yet, but it’s something to consider.
What happened to you seems pretty representative of how a lot of the most promising people who showed up in the last five years started working on AI alignment. So it’s not obvious to me that recommending that others read the same things you did is completely infeasible or the wrong thing to do.
In general, it strikes me as more promising to encourage someone to read HPMOR and then the Sequences than to get them to read a single article directly on AI and from there try to get them interested in working on AI alignment. The content of the Sequences strikes me as more important for talking sensibly about AI alignment than the object-level problem itself, and I have a general sense that fields of inquiry are defined more by a shared methodology than by a shared object-level problem. That makes me hesitant to promote AI risk to people whose methodology I expect to fail to make any progress on the problem; instead, I would first focus on showing them a methodology that might allow them to actually make some progress on it.
I agree that HPMOR may be the best way to get someone to want to read the initially opaque-seeming Sequences: “what if my thought processes were as clear as Rational!Harry’s?”. But the issue then becomes how to send a credible signal that HPMOR is more than just a fun read for people with time to spare, especially to individuals who don’t already read regularly (which was me at the time; luckily, I have a slightly addictive personality and got sucked in).
My little brother will be entering college soon, so I gave him the gift I wish I had received at that age: a set of custom-printed HPMOR tomes. I think this is a stronger signal, but it’s too costly (and probably strange) to do this for people with whom we aren’t as close.
Not sure I agree. HPMOR is cool, but also a turn-off for many. I’d just mention that I work on AI alignment, and when pressed for details, refer them to Superintelligence.
Nitpick: I’m not yet working on alignment. Also, if someone had given me Superintelligence a year ago, I probably would have fixated on all the words I didn’t know instead of taking the problem seriously. Such a reader might become aware of the problem, and maybe even work on it, but as habryka pointed out, they wouldn’t be using rationalist methodology.
Edit: a lot of the value of reading Superintelligence came from having to seriously consider the problem for an extended period of time. I had already read CEV, WaitButWhy, IntelligenceExplosion, and various LW posts about malevolent genies, but my concern still hadn’t reached the level of “I, personally, want and need to take serious action on this”. I find it hard to imagine that someone could simply pick up Superintelligence and skip right to that state of mind, but maybe I’m generalizing too much from my situation.