Probably good projects for the AI safety ecosystem
At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome.
Projects I might be excited to see, in no particular order:
A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes.
Rolling admissions alternatives to MATS’ cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don’t have the operational support to do this optimally).
A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research.
A dedicated twice-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in).
An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA’s workshops and 80,000 Hours’ career advising calls.
Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS’ Intro to ML Safety, Redwood Research’s MLAB, and Jacob Hilton’s DL curriculum for large language model alignment.
More contests like ELK with well-operationalized research problems (i.e., that clearly explain what builder/breaker steps look like), clear metrics of success, a well-considered target audience (who is being incentivized to apply and why?), and a well-considered user journey (where do prize winners go next?). Possible contest seeds:
Evan Hubinger’s SERI MATS deceptive AI challenge problem;
Vivek Hebbar’s and Nate Soares’ SERI MATS diamond maximizer selection problem;
Alex Turner’s and Quintin Pope’s SERI MATS training stories selection problem.
More “plug-and-play” curricula for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, and Intro to ML Safety.
A well-considered “precipism” university course template that critically analyzes Toby Ord’s “The Precipice,” Holden Karnofsky’s “The Most Important Century,” Will MacAskill’s “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc.
Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt’s “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.’s “Risks From Learned Optimization”).
A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman’s planned AI safety research group.
More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers of funding for large compute projects that focus on alignment. To avoid accidentally furthering AI capabilities, this type of venture needs strong vetting of proposals, possibly from extremely skeptical and doomy alignment researchers.
A talent recruitment and onboarding organization targeting cyber security researchers to benefit AI alignment, similar to Jeffrey Ladish’s and Lennart Heim’s theory of change. A possible model for this organization is the Legal Priorities Project, which aims to recruit and leverage legal talent for longtermist research.
This is somewhat risky, and should get a lot of oversight. One of the biggest obstacles to discussing safety in academic settings is that academics are increasingly turned off by clumsy, arrogant presentations of the basic arguments for concern.
Thanks for writing this, Ryan! I’d be excited to see more lists like this. A few ideas I’ll add:
An organization that runs high-quality AI safety workshops every month. Target audience = Top people who have finished AGISF or exceptionally-talented people who haven’t yet been exposed to AI safety. Somewhat similar to Impact Generator workshops, the Bright Futures workshop, and these workshops.
MATS adding mentors for non-technical research (e.g., people like Ajeya Cotra or Daniel Kokotajlo)
An organization that produces research designed to make alignment problems more concrete and unlock new ways for people to contribute to them. Stuff like the specification gaming post and the goal misgeneralization papers.
Writing an intro to AIS piece that is (a) short, (b) technically compelling, and (c) emotionally compelling.
An organization that regularly organizes talks at big tech companies, prestigious academic labs, and AI labs. (The talks are given by senior AIS researchers who are good at presenting ideas to new audiences, but the organization does a lot of the ops/organizing).
I’m skeptical of the usefulness of such an organization. I think we currently have a plethora of motivated talent passing through AGISF that doesn’t need another short workshop or a bunch of low-context 1-1s with researchers, who probably have better-vetted people to spend their scarce time on (I think vetting is very hard). I think the AI safety talent development pipeline needs fewer centralized, short-duration, broad-funnel workshops and more programs that allow for longer-term talent incubation into specific niches (e.g., programs or low-stakes jobs that build distinct skills in research vs. management vs. operations), don’t eat up valuable researcher time unnecessarily, and encourage decentralization of alignment research hubs. Sorry if this sounds like bad-faith criticism; it’s not intended to be.
Seems like a great idea. I think I’d strictly prefer “a combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research,” because it feels like the optimal program for Cotra or Kokotajlo is a bit different from MATS and likely includes other great governance/worldview/macrostrategy researchers. However, I’ll probably talk to Cotra and Kokotajlo to see if we can add value to them!
I think this is rather the domain of existing academic and industry research groups (e.g., Krueger’s lab, Anthropic, CHAI, OpenAI safety team, DeepMind safety team, etc.), as these groups have the necessary talent and, presumably, motivation. I’d also be excited for MATS scholars and alumni and LTFF-funded independent researchers to work on this.
Seems good, if hard (and not something I’d expect a competition to help with, on priors, if the most capable/aligned people are not already working on this). In particular, I’d be excited to see something that discusses, from first principles, whether solving alignment is significantly harder than similar scientific problems (credit to Caleb for often talking about this).
I’m skeptical that this would be low-risk (in regards to making researchers more skeptical of alignment and less likely to listen to AI safety spokespeople at a critical date) or a counterfactually valuable use of senior AI safety researcher time.
Regarding (2), I think the best AI gov research training org is substantially different from MATS’ idealized form because:
To the extent possible, I think MATS should be trying to solve a compartmentalized part of the alignment problem (i.e., the technical part, to the extent that’s separable from governance), because I think people probably grow more as deep researchers in academic cohorts with a dominant focus;
The Bay Area, where MATS is based, is not the governance hub of the US;
One should market differently to pol. sci., international relations, and complex systems researchers compared to ML, maths, physics, and neuroscience researchers;
The MATS seminar program is geared towards technical alignment research and an ideal AI governance seminar program would look substantially different;
I don’t understand the governance literature as well as I understand the technical safety literature and there are probably unknown unknowns;
Currently MATS expects applicants to have background knowledge at the level of the AGI Safety Fundamentals AI Alignment Curriculum and not the AI Governance Curriculum;
I’d rather do one thing well than two things poorly.
Thoughts, Mauricio?
Additional reasons why MATS + MAGS > MATS + governance patch:
MATS is quite large (55 in-person scholars for winter);
There are good alignment mentors MATS could support that might get bumped out if we add governance mentors;
At some point, more MATS team members (to support a larger cohort) might make it hard for us to work effectively (as per Lightcone’s model);
A small, cohesive MAGS team could work in parallel with the MATS team for greater impact than a unified team;
GovAI are already doing something like the hypothetical MAGS, and the OpenAI governance team might want to do the same, which would mean a lot of potential mentors (maybe enough for a MATS-sized program, funders + mentors willing).
The Bay is an AI hub, home to OpenAI, Google, Meta, etc., and therefore an AI governance hub. Governance is not governments. Important decisions are being made there—maybe more important decisions than in DC. To quote Allan Dafoe:
Also, many, many AI governance projects go hand-in-hand with technical expertise.
Maybe more broadly, AI strategy is part of AI governance.
Regarding point 5: AI safety researchers are already taking the time to write talks and present them (e.g., Rohin Shah’s introduction to AI alignment, though I think he has a more ML-oriented version). If we work off of an existing talk or delegate the preparation of the talk, then it wouldn’t take much time for a researcher to present it.
This is a pretty bad time for AI research in China, with the tech recession and chip ban. So there’s a lot of talented researchers looking for jobs. What are your opinions on a Chinese hub for conceptual AI safety research? As far as I can tell, there are no MIRI-level AI safety research teams here in the Sinosphere.
How about individual researchers? Is there anyone prominently associated with “risks of AI takeover” and similar topics? edit: Or even associated with “benefits of AI takeover”!
I would be all for that, personally! Can’t speak for the broader community though, so recommend asking around :)
Currently, MATS finds it hard to bring Chinese scholars over to the US for our independent research/educational seminar program because of US visa restrictions on Chinese citizens. I think there is probably significant interest in a MATS-like program among Chinese residents and certainly lots of Chinese ML talent. I’m generally concerned by approaches to leverage ML talent for AI safety that don’t select hard for value alignment with the longtermist/AI alignment project as this could accidentally develop ML talent that furthers AI capabilities. That said, if participant vetting is good enough and there are clear pathways for alumni to contribute to AI safety, I’m very excited about programs based in China or that focus on recruiting Chinese talent. I think the projects that I’m most excited about in this space would be MATS-like, ARENA-like, or CAIS-like. If you have further ideas, let me know!
What do you mean by “select hard for value alignment”? Chinese culture is very different from that of the US, and EA is almost unheard of. You can influence things by hiring, but expecting very tight conformance with EA culture is… unlikely to work. Are people interested in AI capabilities research currently barred from being hired by alignment orgs? I am very curious what the situation on the ground is.
Also, there are various local legal issues when it comes to advanced research. Sharing genomics data with foreign orgs is pretty illegal, for example. There’s also the problem of not having the possibility of keeping research closed. All companies above a certain size are required to hire a certain number of Party members to act as informants.
So what has stopped there from being more alignment orgs in China? Is it bottleneck local coordination, interest, vetting, or funding? I’d very much be interested in participating in any new projects.
Value alignment here means being focused on improving humanity’s long-term future by reducing existential risk, not other specific cultural markers (identifying as EA or rationalist, for example, is not necessary). Having people working towards the same goal seems vital for organizational cohesion, and I think alignment orgs would rightly not hire people who are not focused on trying to solve alignment. Upskilling people who are happy to do capabilities jobs without pushing hard internally for capabilities orgs to be more safety-focused seems net negative.
I think it’s important for AI safety initiatives to screen for participants that are very likely to go into AI safety research because:
AI safety initiatives eat up valuable free energy in the form of AI safety researchers, engineers, and support staff that could benefit other initiatives;
Longtermist funding is ~30% depleted post-FTX, and therefore the quality and commitment of participants funded by longtermist money are more important now;
Some programs like MLAB might counterfactually improve a participant’s ability to get hired as an AI capabilities researcher, which might mean the program contributes insufficiently to the field of alignment relative to accelerating capabilities.
These concerns might be addressed by:
Requiring all participants in MLAB-style programs to engage with AGISF first;
Selecting ML talent for research programs (like MATS is trying) rather than building ML talent with engineer upskilling programs;
Encouraging participants to seek non-longtermist funding and mentorship for their projects, perhaps through supporting research projects in academia that leverage non-AI safety academic ML research mentorship and funding for AI safety-relevant projects;
Interviewing applicants to assess their motivations;
Offering ~30% less money (and slightly less prestige) than tech internships to filter out people who will leave safety research and work on capabilities after the program.
Could/should there be an organization that helps grant visas to people working on AI safety?
I don’t have insider information, but I think that Aligned AI, Anthropic, ARC, CLR, Conjecture, DeepMind, Encultured AI, FAR AI, MIRI, OpenAI, and Redwood Research (not an all-inclusive list) could all probably offer visas to employees. The MATS Program currently assists scholars in obtaining US B-1 visas or ESTAs and UK Standard Visitor visas. Are you asking whether there should be an organization that aims to hire people to work long-term on AI safety niches that these organizations do not fill, and if so, which niches?
That might be interesting, but I was wondering if one organization could be “the visa people” who do most of the visa-related work for all the organizations you listed. But maybe this work requires little time or is difficult to outsource?
Rethink Priorities and Effective Ventures are fiscal sponsors for several small AI safety organizations and this role could include handling their visas. There might be room for more such fiscal sponsor charities, as Rethink Charity are closing down their fiscal sponsorship program and Players Philanthropy Fund isn’t EA-specific.
We’ve seen a profusion of empirical ML hackathons and contests recently.
Based on Bowman’s comment, I no longer think this is worthwhile.
Apart Research runs hackathons, but these are largely empirical in nature (and still valuable).
Palisade Research now exists and are running the AI Security Forum. However, I don’t think Palisade are quite what I envisaged for this hiring pipeline.
This exists!
These proposals all seem robustly good. What would you think of adding an annual AI existential safety conference?
Currently, it seems that the AI Safety Unconference and maybe the Stanford Existential Risks Conference are filling this niche. I think neither of these conferences is optimizing entirely for the AI safety researcher experience; the former might aim to appeal to NeurIPS attendees outside the field of AI safety, and the latter includes speakers on a variety of x-risks. I think a dedicated AI safety conference/unconference detached from existing ML conferences might be valuable. I also think boosting the presence of AI safety at NeurIPS or other ML conferences might be good, but I haven’t thought deeply about this.
:D
I think my lab is bottlenecked on things other than talent and outside support for now, but there probably is more that could be done to help build/coordinate an alignment research scene in NYC more broadly.
Random thoughts:
Wouldn’t it be best for the rolling-admissions MATS to be part of MATS?
Some ML safety engineering bootcamps scare me. Once you’re taking in large groups of new-to-EA/new-to-safety people and teaching them how to train transformers, I’m worried about downside risks. I have heard that Redwood has been careful about this. Cool if true.
What does building a New York-based hub look like?
Currently, MATS somewhat supports rolling admissions for a minority of mentors with our Autumn and Spring cohorts (which are generally extensions of our Summer and Winter cohorts, respectively). Given that MATS is mainly focused on optimizing the cohort experience for scholars (because we think starting a research project in an academic cohort of people with similar experience, with targeted seminars and workshops, is ideal), we probably offer a worse experience to scholars or mentors who would ideally start research projects at irregular intervals. Some scholars might not benefit as much from the academic cohort experience as others. Some mentors might ideally commit to mentorship during times of the year outside MATS’ primary Winter/Summer cohorts. Also, MATS’ seminar program doesn’t necessarily run year-round, and we don’t offer as much logistical support to scholars outside of Winter/Summer. There is definitely free energy here for a complementary program, I think.
I am also scared of ML upskilling bootcamps that act as feeder grounds for AI capabilities organizations. I think vetting (including perhaps an AGISF prerequisite) is key, as is a clear understanding of where the participants will go next. I only recommend this kind of project because hundreds of people seemingly complete AGISF and want to upskill to work on AI alignment but have scant opportunities. Also, MATS’ theory of change includes adding value by accelerating the development of (rare) “research leads” to increase the “carrying capacity” of the alignment research ecosystem (which theoretically is not principally bottlenecked by “research supporter” talent, because training/buying such talent scales much more easily than training/buying “research lead” talent). I will publish my reasoning for the latter point as soon as I have time.
Probably ask Sam Bowman. At a minimum, it might consist of an office space for longtermist organizations, like Lightcone or Constellation in Berkeley, some operations staff to make the office run, and some AI safety outreach to NYU and other strong universities nearby, like Columbia. I think some people might already be working on this?
Quick note on 2: CBAI is pretty concerned about our winter ML bootcamp attracting bad-faith applicants and plans to use a combo of AGISF and references to filter pretty aggressively for alignment interest. Somewhat problematic in the medium term if people find out they can get free ML upskilling by successfully feigning interest in alignment, though...
Hey Ryan, I’m considering adding these to AISafety.com/projects but I’m conscious that this post is nearly two years old – is there anything you would change since you first wrote it?
I don’t think I’d change it, but my priorities have shifted. Also, many of the projects I suggested now exist, as indicated in my comments!
This EA Infosec bookclub seems like a good start, but more could be done!
This NSF grant addresses this!
Rethink Priorities are doing this!
Several workshops have run since I posted this, but none quite fits my vision.
The GovAI Summer & Winter Fellowships Program is addressing this niche!
IAPS AI Policy Fellowship also exists now!
The AI Security Initiative at the UC Berkeley Center for Long-Term Cybersecurity probably fulfills this role, though I imagine additional players in this space would be useful.
I envisage something like AI Safety Camp or the AI safety Mentors and Mentees Program, except with the capacity to offer visas.
SPAR exists! Though I don’t think it can offer visas.