Aspiring AI safety researchers should ~argmax over AGI timelines
Epistemic status: This model is mostly based on a few hours of dedicated thought, and the post was written in 30 min. Nevertheless, I think this model is probably worth considering.
Many people seem to be entering the AI safety ecosystem, acquiring a belief in short timelines and high P(doom), and immediately dropping everything to work on AI safety agendas that might pay off in short-timeline worlds. However, many of these people might not have a sufficient “toolbox” or research experience to have much marginal impact in short-timeline worlds.
Rather than tell people what they should do on the object level, I sometimes tell them:
1. Write out your credences for AGI being realized in 2027, 2032, and 2042;
2. Write out your plans if you had 100% credence in each of 2027, 2032, and 2042;
3. Write out your marginal impact in lowering P(doom) via each of those three plans;
4. Work towards the plan that is the argmax of your marginal impact, weighted by your credence in the respective AGI timelines.
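The weighting in the last step can be sketched as a small calculation. All the credences, plan names, and impact numbers below are made-up illustrations, not recommendations:

```python
# Hypothetical credences over three AGI-arrival years (illustrative only).
credences = {2027: 0.2, 2032: 0.5, 2042: 0.3}

# Hypothetical marginal impact (arbitrary units) of each plan in each
# timeline scenario. A plan tuned to one year may still pay out in others.
impact = {
    "ship alignment research now": {2027: 5, 2032: 2, 2042: 1},
    "PhD then research":           {2027: 0, 2032: 4, 2042: 6},
    "upskill in industry":         {2027: 1, 2032: 3, 2042: 4},
}

# Credence-weighted expected marginal impact of each plan.
expected = {
    plan: sum(credences[year] * payoff[year] for year in credences)
    for plan, payoff in impact.items()
}

# With these made-up numbers, "PhD then research" wins (EV = 3.8),
# even though it pays out nothing in the 2027 world.
best = max(expected, key=expected.get)
```

Note that each plan is scored across all three scenarios, which is what distinguishes this from naively committing to the single most likely timeline.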
Some further considerations
If you are risk averse over your marginal impact, you should maybe avoid a true argmax approach and instead choose a plan that pays out some marginal impact in all three timeline scenarios. For example, some shovel-ready, short-timeline AI safety research agendas may prepare you for long-timeline AI safety research better than others. Consider blending elements of your plans for the three timeline scenarios (hence the “~” in “~argmax”). Perhaps you also have side constraints on your minimum impact in the world where AGI is realized in 2027?
Your immediate plans might be similar in some scenarios. If so, congratulations, you have an easier decision! However, I suspect most aspiring AI safety researchers without research experience should have different plans for different AGI timeline scenarios. For example, getting a Ph.D. in a top lab probably makes most people much better at some aspects of research and working in emerging tech probably makes most people much better at software engineering and operations.
You should be wary of altering your timeline credences in an attempt to rationalize your preferred plan or highest-probability timeline scenario. However, don’t be afraid to update your credences over AGI timelines or your expected marginal impact in those worlds! Revisit your plan often and expect it to change (though hopefully not in predictable ways, as that would make you a bad Bayesian).
Consider how the entire field of AI talent might change if everyone followed the argmax approach I laid out here. Are there any ways they might do something you think is predictably wrong? Does this change your plan?
If you want to develop finer-grained estimates over timelines (e.g., 2023, 2024, etc.) and your marginal impact in those worlds, feel free to. I prefer to keep the number of options manageable.
Your marginal impact might also change with respect to the process by which AGI is created in different timeline worlds. For example, if AGI arrives in 2023, I imagine that the optimal mechanistic interpretability researcher might not have as high an impact as they would if AGI arrived some years later, when interpretability has potentially had time to scale.
I think this is true for some people, but I also think people tend to overestimate the number of years it takes to have enough research experience to contribute.
I think a few people have been able to make useful contributions within their first year (though in fairness they generally had backgrounds in ML or AI, so they weren’t starting completely from scratch), and several highly respected senior researchers have just a few years of research experience. (And they, on average, had less access to mentorship/infrastructure than today’s folks).
I also think people often overestimate the amount of time it takes to become an expert in a specific area relevant to AI risk (like subtopics in compute governance, information security, etc.).
Finally, I think people should try to model community growth & neglectedness of AI risk in their estimates. Many people have gotten interested in AI safety in the last 1-3 years. I expect that many more will get interested in AI safety in the upcoming years. Being one researcher in a field of 300 seems more useful than being one researcher in a field of 1500.
With all that in mind, I really like this exercise, and I expect that I’ll encourage people to do this in the future.
[Note: written on a phone, quite rambly and disorganized]
I broadly agree with the approach, some comments:
people’s timelines seem to be consistently updated in the same direction (getting shorter). If one makes a plan based on current evidence, I’d strongly suggest considering how one’s timelines might shrink due to not having updated strongly enough in the past.
a lot of my conversations with aspiring AI safety researchers go something like “if timelines were that short I’d have basically no impact, that’s why I’m choosing to do a PhD” or “[specific timelines report] gives X% of TAI by YYYY anyway”. I believe people who choose to do research drastically underestimate the impact they could have in short-timelines worlds (esp. through under-explored non-research paths, like governance, outreach, etc.) and overestimate the probability of AI timelines reports being right.
as you said, it makes sense to consider plans that work in short timelines and improve things in medium/long timelines as well. Thus you might actually want to estimate the EV of a research policy for 2023–2027 (A), 2027–2032 (B), and 2032–2042 (C), where by policy I mean you apply a strategy for A and update if no AGI in 2027, or you apply a strategy for A+B and update in 2032, etc.
It also makes sense to consider who could help you with your plan. If you plan to work at Anthropic, OAI, Conjecture, etc., it seems that many people there take the 2027 scenario seriously, and teams there would be working on short-timelines agendas no matter what.
if you’d have 8x more impact in a long-timelines scenario than in a short one, but consider short timelines only 7x more likely, working as if long timelines were true would create a lot of cognitive dissonance, which could turn out to be counterproductive
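Making that arithmetic explicit, with the 8x and 7x ratios from the comment above and an otherwise illustrative probability:

```python
p_long = 0.125            # illustrative probability of long timelines
p_short = 7 * p_long      # short timelines assumed 7x more likely
impact_long_plan = 8.0    # 8x the impact of the short-timelines plan...
impact_short_plan = 1.0   # ...but only realized if long timelines hold

ev_long = p_long * impact_long_plan      # 1.0
ev_short = p_short * impact_short_plan   # 0.875

# The argmax narrowly favors the long-timelines plan (ratio 8/7), even
# though you would be betting against your own most-likely scenario.
```

So a strict argmax can tell you to spend your career on a world you think is quite unlikely, which is exactly where the cognitive-dissonance worry bites.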
if everyone was doing this and going off to do a PhD, the community would end up producing less research now, and therefore have less research for the ML community to interact with in the meantime. It would also reduce the amount of low-quality research, and admittedly, during a PhD one would also publish papers, which would be a better way to attract more academics to the field.
one should stress the importance of testing for personal fit early on. If you think you’d be a great researcher in 10 years but have never tried research, consider doing internships / publishing research before going through the grad school pipeline. Also, a PhD can be a lonely path and unproductive for many. Especially if the goal is to do AI safety research, test your fit for direct work as early as possible (alignment research is surprisingly more pre-paradigmatic than mainstream ML research)
Hmm. Since most of my probability mass is in <5 years range, it seems this is just going to mislead people into not being at all helpful? Why not do this but for the years 2024, 2026, 2028? What makes you privilege the years you chose to mention?
These years have particular significance in my AGI timelines ranking, and I think they are a good default spread based on community opinion. However, there is no reason you shouldn’t choose alternate years!
While teamwork seems to be assumed in the article, I believe it’s worth spelling out explicitly that argmaxing for a plan with highest marginal impact might mean joining and/or building a team where the team effort will make the most impact, not optimizing for highest individual contribution.
Spending time explaining why a previous research effort failed might help 100 other groups learn from our mistake, so it could be more impactful than pursuing the next shiny idea.
We don’t want to optimize for the naive feeling of individual marginal impact, we want to keep in mind the actual goal is to make an Aligned AGI.
This seems basically reasonable, but as stated I think importantly misses that the plan you follow will change the accuracy of your estimates in steps (1) and (3) when you come to reassess. With 100% credence on some year, there’s no value in picking a plan that gets you evidence about timelines, or evidence of your likely impact in scenarios you’re assuming won’t happen.
It’s not enough to revisit the plan often if the plan you’re following isn’t giving you much new evidence.
Seems right. Explore vs. exploit is another useful frame.
Explore vs. exploit is a frame I naturally use (though I do like your timeline-argmax frame as well), where I ask myself, “Roughly how many years should I feel comfortable exploring before I really need to be sitting down and attacking the hard problems directly somehow?”
Admittedly, this is confounded a bit by how exactly you’re measuring it. If I have 15-year timelines for median AGI-that-can-kill-us (which is about right, for me), then I should be willing to spend 5–6 years exploring by the standard 1/e algorithm. But when did “exploring” start? Obviously I should count my last eight months of upskilling and research as part of the exploration process. But what about my pre-alignment software engineering experience? If I count that, I’m now 4/19 years into exploring, giving me about three left. If I count my CS degree as well, that’s 8/23 and I should start exploiting in less than a year.
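The bookkeeping above can be written out directly. This uses the comment’s own 1/e (secretary-problem) framing and year counts; the function name is just for illustration, and a real exploration process need not match the secretary-problem setup:

```python
import math

def years_left_to_explore(already_explored, years_to_median_agi):
    """Remaining exploration budget under the 1/e rule: explore for the
    first 1/e of the total horizon, then switch to exploiting."""
    horizon = already_explored + years_to_median_agi
    return horizon / math.e - already_explored

# The three framings from the comment, with 15-year median timelines:
years_left_to_explore(0, 15)   # ~5.5 years: counting nothing so far
years_left_to_explore(4, 15)   # ~3.0 years: counting prior SWE experience
years_left_to_explore(8, 15)   # ~0.5 years: counting the CS degree too
```

The takeaway is that the answer is extremely sensitive to when you decide the clock started, which is the confound the comment is pointing at.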
Another frame I like is “hill-climbing”—namely, take the opportunity that seems best at a given moment. Though it is worth asking what makes something the best opportunity if you’re comparing, say, maximum impact now vs. maximum skill growth for impact later.