Reasons for and against working on technical AI safety at a frontier AI lab
I am about to start working on a frontier lab safety team. This post presents a varied set of perspectives that I collected and thought through before accepting my offer. Thanks to the many people I spoke to about this.
For
You’re close to the action. As AI continues to heat up, being closer to the action seems increasingly important. Being at a frontier lab allows you to better understand how frontier AI development actually happens and to make better predictions about how it might play out in future. You can build a gears-level model of what goes into the design and deployment of current and future frontier systems, and of the bureaucratic and political processes behind this, which might inform the kinds of work you decide to do in future (and, more broadly, your life choices).
Access to frontier models, compute, and infrastructure. Many kinds of prosaic safety research benefit massively from direct and elevated access to frontier models and the infrastructure to work with them. For instance: Responsible Scaling Policy focussed work that directly evaluates model capabilities and mitigations against specific threat models; model organisms work that builds demonstrations of threat models to serve as a testing ground for safety techniques; and scalable oversight work attempting to figure out how to bootstrap and amplify our ability to provide oversight to models in the superhuman regime, to name a few. Other safety agendas might also benefit from access to large amounts of compute and infrastructure: e.g. mechanistic interpretability currently seems to be moving in a more compute-centric direction. Labs are very well resourced in general, and have a large amount of funding that can be somewhat flexibly spent as and when needed (e.g. on contractors, data labellers, etc). Access to non-public models, potentially significantly beyond the public state of the art, might also generically speed up all work that you do.
Much of the work frontier labs do on empirical technical AI safety is the best in the world. AI safety is talent constrained. There are still not enough people pushing on many of the directions labs work on. By joining, you increase the lab’s capacity to do such work. If this work is published, it may have a positive impact on safety at all frontier labs. If not, you may still directly contribute to future AGIs built by your lab being safer, either through informing deployment decisions or through research that eventually makes its way into frontier models. The metric of success for lab safety work seems closer to “actually improve safety” than e.g. “publish conference papers”.
Often shorter route to impact. Technical safety work can only have an impact if it directly or indirectly influences some future important deployed system. The further you are from such a system, the lower your influence might be. For the kinds of work that strive to directly improve safety, if you aren’t at the important lab itself, the causal impact chain must route through people who directly touch the important system(s): they must read your work, think it is good enough to change what they are doing, and then use your ideas. Relatedly, if AGI timelines are short, there is less time for external or earlier stage work to percolate into lab thinking. If you are at the lab, the causal chain becomes much shorter; it is someone in your management line’s job to convince relevant stakeholders that your work is important for improving the safety of the future important deployed system (though note you might not always be able to rely on this mechanism working effectively). That said, plenty of external technical work can also have a large impact. This is often (but not always) through work whose goal is to indirectly influence future systems. I discuss this point in more detail later.
Intellectual environment. Frontier labs generally have a very high saturation of smart, ambitious, talented and experienced people. Having competent collaborators accelerates your work. Mentorship accelerates your development as a technical contributor. More broadly, your intellectual environment really matters, and can make a big difference to both your happiness and your output. Where you work directly influences who you talk to on a day-to-day basis, which feeds into feedback on your work, which feeds into your work quality, which feeds into your eventual impact. Labs are not the only place with high densities of people thinking carefully about how to make AI go well, but they are one of a select few such places.
Career capital. Working at a frontier lab continues to offer a large amount of career capital. It is among the best ways to gain prosaic AI-specific research and engineering skills. It is arguably even more prestigious and high status now than it used to be, as (general) AI rapidly becomes more and more important in the world. Frontier labs compensate their technical staff extremely well. Besides the obvious benefits, money increases your runway, your ability to pursue riskier paths later in life, and your capacity to fund progress on top world problems (see GWWC or this advice for giving opportunities in AI safety). If you believe that AGI is only a few years away and will make human intellectual labour obsolete, accruing wealth in advance of that point seems potentially even more important than it has been historically. The prospects of ex-lab employees are generally strong, and their opinions are respected by a wide range of people. For instance, an OpenAI whistleblower recently testified in front of a Senate committee on matters of AI safety, and ex-lab employees (much like ex-FAANG employees) generally have an easy time raising VC funding for startup ventures. On the flip side, there are several career risks to working at a frontier lab worth considering. It seems possible (likely?) that there will be some non-existential AI-powered catastrophe in the next few years, and that this may worsen the reputation of AI labs and thus the prospects of AI lab researchers. Another risk is that working at an AI lab may “tarnish” your reputation and ability to later work in government or strategy positions (though empirically, many ex-lab employees still end up doing this, and working at a lab also increases your ability to work in such a position in other ways).
Making the lab you work for more powerful might be good, actually. Indirect impact may come via it actually being good for the lab you work for to be more powerful. For example, you might believe that your lab will act sufficiently safely and responsibly with their eventual AGI, shift industry culture to be more pro-safety, do valuable safety work with their powerful models, or advocate for good regulation. This argument necessarily varies considerably across labs, and can’t be true for all labs at once – so be careful applying this argument.
Against
Some very important safety work happens outside of frontier AI labs. For instance, external organizations such as AI Safety Institutes, Apollo Research and METR conduct dangerous capability evaluations of frontier models. On top of directly evaluating risk, they shape the public discussion of AI risk significantly, and may have more hard power in future. While this work does happen at frontier labs too, there are good reasons for it to happen externally, and external organizations provide additional capacity for this work beyond what the labs could supply alone. External organizations are also able to legibly challenge the positions AI labs hold, by, for example, suggesting that historic deployment decisions were actually dangerous. Work directly challenging the positions held by AI labs may become more important over time as lab profit incentives to deploy unsafe systems increase. More broadly, the types of research that happen at labs are generally those that are comparatively advantaged to happen at labs (i.e. those that require access to frontier models, compute, and infrastructure – see above). This means there are plenty of types of technical AI safety work that don’t happen at labs and which might be important. The most salient examples are highly theoretical work, such as what ARC currently does or the agent foundations work MIRI used to do. John Wentworth makes a more cynical argument here: that lab work is uniformly streetlighty and doesn’t tackle the hard safety problems. See also the 80000 hours job board for further roles outside of frontier labs.
Low neglectedness. While it might well be the case that the work happening at a frontier lab is both important and tractable, it’s possible it’s not all that neglected. Many more people want to work on frontier lab safety teams than there is capacity to hire. This oversupply should not be at all surprising; as discussed above, working at a lab is a high paying, stable-ish and prestigious career path. Supposing you do get an offer, it’s pretty unclear how replaceable you are: the next best hire may (or may not) be all that much worse than you. It currently feels like everyone and their dog wants to work at a frontier lab (and this effect is likely larger outside of our bubble), and that an entire generation of smart, agentic and motivated individuals who care a lot about making AI go well are ending up at the frontier labs. Is this really optimal? On the one hand, it seems a shame that incentive gradients suck everyone into working at the same places, on the same problems, and converging on similar views. See here and here for more extreme versions of this take. On the other hand, I would much rather have AI labs staffed by such people than by status-climbing individuals who care less about the mission.
Low intellectual freedom. Wherever you work, unless you are really quite senior or otherwise given an unusually large amount of freedom over what you work on, you should expect the bulk of your impact to come through accelerating some existing agenda. In Peter Thiel’s language, this is like going from “one to n”. To the extent you believe that such an agenda is good and useful, this is great! But is it the best possible use of your time? Are there places you think people are obviously dropping the ball? If you are comparatively advantaged to work on something that seems comparably important but significantly more neglected, and have a track record of (or just sufficient drive for) succeeding in doing your own thing, it may be of higher expected value to do that instead. Even if you don’t have any such ideas, it might still be worth asking others for advice, taking time to explore, brainstorming, and iterating anyway. Most existing promising AI safety agendas were not born at frontier labs. They were cultivated elsewhere, and eventually imported to labs once sufficient promise was shown (the most recent such example is AI control, which was pioneered by Redwood Research). There are several good essays online that discuss how ambitious individuals should orient to maximize their chances of doing great work; they all emphasise the importance of freedom to work on your own questions, ideas and projects. AI safety might need more novel bets that take us from “zero to one”. Most people will struggle to execute their own highly exploratory and highly risky research bets at labs. Various other places seem better suited for such work; for instance, a PhD offers a large amount of freedom and seems like a uniquely good place to foster the skill of developing research taste, though it has other downsides. Some counterarguments to this are that timelines might be short, so you may not have a good idea externally in time for it to matter, and that there are strong personal incentives against this (e.g. see the above career capital section). Finally, “making AI go well” requires so much more than just technical safety work, and may indeed be bottlenecked on some of these other problems, which some (but by no means all) would-be lab researchers seem particularly well placed to carry out. Beyond “ability to do technical AI safety research”, technical AI safety researchers have a number of skills and unique beliefs about the world that might prove useful in pursuing such other routes to impact, via for instance entrepreneurship or policy.
Shifting perspectives. Working at a frontier lab will likely change your views about AI safety in ways that your present self may not endorse. This may happen slowly and sneakily, in a way that you might not notice locally. Before you join, you should acknowledge and accept that your perspectives may change. I think of this as mostly a negative, but it’s also possible that your views move closer to the truth, if people at the lab hold more correct views than you do. The exact mechanisms behind how this happens are not clear to me, but may include some of the following causes.
Information environment. Your information environment, which includes what you read and who you talk to every day, has a large influence on your views. To a first approximation, you should expect your views to move towards the median of your information environment, unless you are very sure of your views and extremely good at arguing for them. Lab perspectives are likely different to those of the wider AI safety community, the ML community, and the wider world. The median person at a frontier lab may be less scared about future systems than you are, and more optimistic that we are on track to succeed in building AGI safely. That said, you might not be surrounded by the median lab person, especially if the lab is very large and has a very diffuse culture. Relatedly, there may also be some risk of overly deferring to the views of your seniors.
Financial incentives. Your financial upside is strongly correlated with the success of the lab, which might incentivize risky strategic decisions and make it harder to think objectively about risks. I would be especially worried about this if I were a key decision maker behind deployment decisions, and less so the further removed I am from such a position. I don’t think being extremely far removed from decision making reduces this risk to zero though. One concern is that financial incentives gradually shape your worldview, such that decisions you make in future (perhaps in a more senior capacity) differ from those your present self would endorse. For labs where your equity can be publicly traded (e.g. GDM or Meta), this is somewhat less of an issue than at labs where you can only rarely sell your stock options (e.g. Anthropic and OpenAI). If you decide that remaining at the lab is a bad idea and want to leave, you may still have various conflicts of interest (e.g. unsold equity) and constraints on what you can discuss publicly (e.g. via NDAs) even after leaving. Notably, prior to May 2024, OpenAI used financial incentives to get employees to sign non-disparagement agreements upon leaving. Note further that equity vesting schedules may incentivise you to stay at a frontier lab longer than you might like if you do decide you want to leave.
It might be hard to influence the lab. A common belief is that by joining a frontier lab and advocating for safety, you might be able to change the lab’s perspectives and prioritisation. While there is some truth to this, it is probably far harder than you think. For instance, in spring 2024, many safety focussed employees (some of whom were extremely senior) left OpenAI after losing confidence that OpenAI leadership would sufficiently prioritise safety, despite their internal pressure. It may be possible to shift your team’s local perspectives on safety, but you should expect it to be substantially harder to change the views of the organisation as a whole. On the flip side, employees certainly have some power – employee support is why Sam Altman remains the CEO of OpenAI today after the board fiasco of 2023. Relatedly, the lab environment may influence the kinds of work you do in ways you don’t expect: there may be incentives to produce work that supports lab leadership’s desired “vibe” (their vision for what they want to achieve and communicate) rather than maximally scientifically helpful or impactful work.
Safetywashing. Your work may be used for safetywashing; it may be exploited for PR while doing nothing to improve safety, or even while differentially improving capabilities over safety. This of course depends quite heavily on what your exact role is. Note too that just because you currently think your work might not have this negative externality, this does not mean it won’t in future. You might be moved to working on projects which are less good on this axis. It might be hard for you to realise this is happening at the time, even harder for you to do something about it, and impossible to predict ahead of time. It might be a good idea to stare into the abyss often and ask yourself if your work remains good for the world, though it might be stressful having to constantly make this sort of evaluation. How much weight you put on the safetywashing concern might also depend on the degree of trust you place in your lab’s leadership to make responsible decisions.
Speaking publicly. You might be restricted or otherwise constrained in what you can talk about publicly, especially on topics relating to AI timelines or AI safety. The extent to which this is the case seems to differ wildly across labs. On top of explicit restrictions, you might also be implicitly disincentivized from speaking about or doing things that your colleagues or seniors may disapprove of. For instance, you may think that PauseAI are doing good work, but struggle to publicly support it. If AGI projects become nationalized and lab security increases substantially, there may be greater restrictions on your personal life.
External collaborations. The degree to which you can collaborate with the wider safety community on projects and research might be restricted. This again often depends on role specific details. For instance, the Anthropic interpretability team generally do not talk about non-public research and also generally do not collaborate externally. In contrast, the Anthropic alignment science and GDM interpretability teams engage and collaborate more widely. Uniformly, you should expect your ability to engage in external policy and strategy related projects to be heavily restricted. Though if you are early career, your legibility increases by being at a lab, somewhat counteracting this point.
Bureaucracy. Labs often have a bunch of irritating bureaucracy that makes various things harder. Publishing papers and open sourcing code or models is challenging. There are often pressures or constraints incentivizing employees to use in-house infra, even if it is worse than open source tooling. Internal, non-meritocratic politics can sometimes play a role in what work teams are allowed to do: there is often internal competition between teams over access to resources and the ability to ship. Finally, lab legal and comms teams are set up to prevent bad things happening, rather than to make good things happen, which can sometimes slow things down; downside risk matters much more to large actors than potential upside. Note that many of these points are not unique to frontier labs but apply to large organizations in general. The flip side is that bureaucracy also often protects technical contributors from various forms of legal and financial risk that smaller actors have to worry about more. Being part of a large and stable organization also often ensures various basics are taken care of; individual contributors don’t need to worry about things such as office space, food, IT, etc.
AGI seems inherently risky. AGI, if achieved, will dramatically alter humanity’s trajectory, for better or for worse. One possible future is one in which AGI causes a large amount of harm and threatens humanity’s extinction. Each lab working on creating AGI may be shortening timelines and bringing us closer to such a future. The effects of AGI on the world are complex to model and predict, but given the uncertainty and plausible downside risk, it seems reasonable, on non-consequentialist deontological grounds, to feel uneasy about working at an organisation building such a technology, even if your role promotes safety.
Disclaimers
80000 hours have previously discussed some of these considerations; in this piece I discuss many additional ones. See also their more general discussion of whether it is ever okay to work at a harmful company to do good.
This post is targeted at people considering working on technical AI safety at a frontier lab. Some considerations will generalise to people considering other roles at frontier labs, or those considering working on technical AI safety at other organisations.
In an attempt to make this post maximally useful to a wide audience, I do not compare to specific counterfactual options, but encourage readers considering such a role to think through these when reading.
Many of these points have high variance both across labs, and across teams and roles within the same lab.
Many of these points are subtle, and not strict pros or cons. I try to convey such nuance in the writing under each point, and list each point under the heading that makes most sense to me.
Despite using the term “lab” throughout, AI labs are now best thought of as “companies”. They no longer just do research, and profit incentives increasingly play a role in lab strategy.