I got into AI at the worst time possible

2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally has highly talented people who want to work in it, personally it could not have been worse for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself to even complain about it internally. But I can’t shake the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything would have been radically easier for me.
I hate saturated industries. I hate hyped-up industries. I hate fields that constantly make the news and gain mainstream attention. This was one of the major reasons I had to leave the crypto scene: it had become so saturated with attention, grift, and hype that I found it completely unbearable. Alignment and AGI were among the things almost no one even knew about, and even fewer talked about, which made the field ideal for me. I was happy with the idea of doing work that might never be appreciated or understood by the rest of the world.
Since 2015, I had planned to get involved, but at the time I had no technical experience or background. So I went to college, majoring in Computer Science. Working on AI and what would later be called “Alignment” was always the plan, though. I remember having a shelf in my college dorm, which I used to represent all my life goals and priorities: AI occupied the absolute top. My mentality, however, was that I needed to establish myself enough, and earn enough money, before I could transition to it. I thought I had all the time in the world.
Eventually, I got frustrated with myself for dragging my feet for so long. So in Fall 2022, I quit my job in cybersecurity, accepted a grant from the Long Term Future Fund, and prepared to spend a year skilling up to do alignment research. I felt fulfilled. When my brain would normally nag me about not doing enough, or about how I should be working on something more important, I finally felt content. I was finally doing it. I was finally working on the Extremely Neglected, Yet Conveniently Super Important Thing™.
And then ChatGPT came out two months later, and even my mother was talking about AI.
If I believed in fate, I would say it seems as though I was meant to enter AI and Alignment during the early days. I enjoy fields where almost nothing has been figured out. I hate prestige. I embrace the weird, and hate any field that starts worrying about its reputation. I’m not a careerist. I can imagine many alternative worlds where I got in early, maybe ~2012 (I’ve been around the typical lesswrong/rationalist/transhumanist group for my entire adult life). I’d get in, start to figure out the early stuff, identify some of the early assumptions and problems, and then get out once 2022/2023 came around. It’s the weirdest sensation to feel like I’m too old to join the field now, and also to feel as though I’ve been part of the field for 10+ years. I’m pretty sure I’m just 1-2 degrees of separation from literally everyone in the field.
The shock of the field/community going from something almost no one was talking about to something even the friggin’ Pope is weighing in on is something I’m still trying to adjust to. Some part of me keeps hoping the bubble will burst, AI will “hit a wall”, marking the second time in history Gary Marcus was right about something, and I’ll feel as though the field has enough space to operate in again. As it stands now, I don’t really know what place it has for me. It is no longer the Extremely Neglected, Yet Conveniently Super Important Thing™, but instead just the Super Important Thing. When I was briefly running an AI startup (don’t ask), I was getting 300+ applicants for each role we were hiring for. We never once advertised the roles, but people somehow found them anyway, and applied in swarms. Whenever I get a rejection email from an AI Safety org, I’m usually told they receive somewhere in the range of 400-700 applications for any given job. That’s, at best, a 0.25% chance of acceptance: substantially lower than Harvard’s. It becomes difficult for me to answer why I’m still trying to get into such an incredibly competitive field, when literally doing anything else would be easier. “It’s super important” doesn’t exactly hold up as a defense at this point, since there are obviously other talented people who would get the job if I didn’t.
I think it’s that I could start to see the shape of what I could have had, and what I could have been. It’s vanity. Part of me really loved the idea of working on the Extremely Neglected, Yet Conveniently Super Important Thing™. And now I have a hard time going back to working on literally anything else, because anything else could never hope to be remotely as important. And at the same time, despite the huge amount of new interest in alignment, and the huge amount of new talent interested in contributing to it, somehow the field still feels undersaturated. In a market-driven field, we would see jobs and roles grow as overall interest in working in the field grew, since interest normally correlates with growth in consumers/investors/etc. Except we’re not seeing that. Despite everything, by most measurements, there still seem to be fewer than 1,000 people working on it full-time, maybe as few as ~300, depending on what you count.
So I oscillate between thinking I should just move on to other things, and thinking I absolutely should be working on this at all costs. It’s made worse by sometimes briefly doing temp work for an underfunded org, sometimes getting to the final interview stage at big labs, and overall thinking that doing the Super Important Thing™ is just around the corner… and for all I know, it might be. It’s really hard for me to tell whether this is a situation where it’s smart for me to be persistent, or whether persistence is dragging me ever closer to permanent unemployment, endless poverty/homelessness/whatever-my-brain-is-feeling-paranoid-about… which isn’t made easier by the fact that, if the AI train does keep going, my previous jobs in software engineering and cybersecurity will probably not be coming back.
Not totally sure what I’m trying to get out of writing this. Maybe someone has advice about what I should be doing next. Or maybe, after a year of my brain nagging me each day about how I should have gotten involved in the field sooner, I just wanted to admit that, despite wanting the world to be saved, despite wanting more people to be working on the Extremely Neglected, Yet Conveniently Super Important Thing™, some selfish, not-too-bright, vain part of me is thinking, “Oh, great. More competition.”
Thanks for sharing your experience here.

One small thought is that things end up feeling extremely neglected once you index on particular subquestions. Like, at a high level, it is indeed the case that AI safety has gotten more mainstream.
But when you zoom in, there are a lot of very important topics that have <5 people seriously working on them. I work in AI policy, so I’m more familiar with the policy/governance ones, but I imagine this is also true on the technical side (also, maybe consider switching to governance/policy!).
Also, especially in hype waves, I think a lot of people end up just working on the popular thing. If you’re willing to deviate from the popular thing, you can often find important parts of the problem that nearly no one is addressing.
Thanks for writing this, I think this is a common and pretty rough experience.
Have you considered doing cybersecurity work related to AI safety? I.e., work that would help prevent bad actors from stealing model weights and prevent AIs themselves from escaping. I think this kind of work would likely be more useful than most alignment work.

I’d recommend reading Holden Karnofsky’s takes, as well as the recent huge RAND report on securing model weights. Redwood’s control agenda might also be relevant.
I think this kind of work is probably extremely useful and somewhat neglected; it especially seems to be missing people who know about cybersecurity and care about AGI/alignment.

I note that I am uncertain whether working on such a task would increase or decrease global stability & great power conflicts.
Working on this seems good insofar as greater control implies more options. With good security, it’s still possible to opt in to whatever weight-sharing / transparency mechanisms seem net positive—including with adversaries. Without security there’s no option.
Granted, the [more options are likely better] conclusion is clearer if we condition on wise strategy. However, [we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.
Not necessarily. If we have the option to hide information, then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information, and that we are closer to a decisive strategic advantage than we appear. This holds even in the case where we do share all our information (which we won’t).
Of course, the [more options are likely better] conclusion holds if the lumbering, slow, disorganized, and collectively stupid organizations which have those options somehow perform the best strategy, but they’re not actually going to take the best strategy. Especially when it comes to US-China relations.
ETA:
[we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.
I don’t think the conclusion holds if that is true in general, and I don’t think I ever assumed or argued it was true in general.
then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information
I think the same reasoning applies if they hack us: they’ll assume that the stuff they were able to hack was the part we left suspiciously vulnerable, and the really important information is behind more serious security.
I expect they’ll assume we’re in control either way—once the stakes are really high. It seems preferable to actually be in control.
I’ll grant that it’s far from clear that the best strategy would be used.
(apologies if I misinterpreted your assumptions in my previous reply)
One option would be to find a role in AI more generally that allows you to further develop your skills whilst also not accelerating capabilities.
Another alternative: I suspect that more people should consider working at a normal job three or four days per week and doing AI Safety things on the side one or two days.
I suspect that working on capabilities (edit: preferably applications rather than building AGI) in some non-maximally-harmful position is actually the best choice for most junior x-risk-concerned people who want to do something technical. Safety is just too crowded and still not very tractable.
Since many people seem to disagree, I’m going to share some reasons why I believe this:
AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most people’s impact will be more than 5 years away. So skilling up quickly should be the primary concern, until your timelines shorten.
Mentorship for safety is still limited. If you can get an industry safety job or get into MATS, this seems better than some random AI job, but most people can’t.
I think there are also many policy roles that are not crowded and potentially time-critical, which people should also take if available.
Funding is also limited in the current environment. I think most people cannot get funding to work on alignment if they tried? This is fairly cruxy and I’m not sure of it, so someone should correct me if I’m wrong.
The relative impact of working on capabilities is smaller than working on alignment—there are still 10x as many people doing capabilities as alignment, so unless returns don’t diminish or you are doing something unusually harmful, you can work for 1 year on capabilities and 1 year on alignment and only reduce your impact by about 10% (see the rough arithmetic sketched after this list).
However, it seems bad to work at places that have particularly bad safety cultures and are explicitly trying to create AGI, possibly including OpenAI.
Safety could get even more crowded, which would make upskilling to work on safety net negative. This should be a significant concern, but I think most people can skill up faster than this.
Skills useful in capabilities are useful for alignment, and if you’re careful about what job you take there isn’t much more skill penalty in transferring them than, say, switching from vision model research to language model research.
Capabilities often has better feedback loops than alignment because you can see whether the thing works or not. Many prosaic alignment directions also have this property. Interpretability is getting there, but not quite. Other areas, especially in agent foundations, are significantly worse.
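As a rough sketch of the arithmetic behind the relative-impact point above (the ~10:1 headcount ratio and the one-year-each scenario are the assumptions stated in that bullet; the unit impact values are purely illustrative):

If a year of alignment work is worth roughly $+1$ unit of expected impact, and a year of capabilities work (with ~10x as many people already doing capabilities) is worth roughly $-\tfrac{1}{10}$ of a unit, then

$$-\tfrac{1}{10} + 1 = 0.9 \approx 1 \times (1 - 10\%),$$

i.e., about a 10% reduction relative to spending the alignment year on its own, provided returns diminish and the capabilities role isn’t unusually harmful.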
These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don’t understand why people have downvoted this comment. Here’s an attempt to unravel my thoughts and potential disagreements with your claims.
AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most people’s impact will be more than 5 years away.
I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like model of getting a prototypical AGI and dissecting it and figuring out how it works—I think it is unlikely that any individual frontier lab would perceive itself to have the slack to do so. Any potential “dissection” tools will need to be developed beforehand, such as scalable interpretability tools (SAEs seem like rudimentary examples of this). The problem with “prosaic alignment” IMO is that a lot of this relies on a significant amount of schlep—a lot of empirical work, a lot of fucking around. That’s probably why, according to the MATS team, frontier labs have a high demand for “iterators”—their strategy involves having a lot of ideas about stuff that might work, and without a theoretical framework underlying their search path, a lot of things they do would look like trying things out.
I expect that once you get AI researcher level systems, the die is cast. Whatever prosaic alignment and control measures you’ve figured out, you’ll now be using that in an attempt to play this breakneck game of getting useful work out of a potentially misaligned AI ecosystem, one that would also be modifying itself to improve its capabilities (because that is the point of AI researchers). (Sure, it’s easier to test for capability improvements. That doesn’t mean information can’t be transferred, embedded in these proposals, such that the modified models end up modified in ways the humans did not anticipate or would not want if they had a full understanding of what is going on.)
Mentorship for safety is still limited. If you can get an industry safety job or get into MATS, this seems better than some random AI job, but most people can’t.
Yeah—I think most “random AI jobs” are significantly worse for trying to do useful work in comparison to just doing things by yourself or with some other independent ML researchers. If you aren’t in a position to do this, however, it does make sense to optimize for a convenient low-cognitive-effort set of tasks that provides you the social, financial and/or structural support that will benefit you, and perhaps look into AI safety stuff as a hobby.
I agree that mentorship is a fundamental bottleneck to building mature alignment researchers. This is unfortunate, but it is the reality we have.
Funding is also limited in the current environment. I think most people cannot get funding to work on alignment if they tried? This is fairly cruxy and I’m not sure of it, so someone should correct me if I’m wrong.
Yeah, post-FTX, I believe that funding is limited enough that you have to be consciously optimizing for getting funding (as an EA-affiliated organization, or as an independent alignment researcher). Particularly for new conceptual alignment researchers, I expect that funding is drastically limited since funding organizations seem to explicitly prioritize funding grantees who will work on OpenPhil-endorsed (or to a certain extent, existing but not necessarily OpenPhil-endorsed) agendas. This includes stuff like evals.
The relative impact of working on capabilities is smaller than working on alignment—there are still 10x as many people doing capabilities as alignment, so unless returns don’t diminish or you are doing something unusually harmful, you can work for 1 year on capabilities and 1 year on alignment and only reduce your impact by about 10%.
This is a very Paul Christiano-like argument—yeah sure the math makes sense, but I feel averse to agreeing with this because it seems like you may be abstracting away significant parts of reality and throwing away valuable information we already have.
Anyway, yeah I agree with your sentiment. It seems fine to work on non-SOTA AI / ML / LLM stuff and I’d want people to do so such that they live a good life. I’d rather they didn’t throw themselves into the gauntlet of “AI safety” and get chewed up and spit out by an incompetent ecosystem.
Safety could get even more crowded, which would make upskilling to work on safety net negative. This should be a significant concern, but I think most people can skill up faster than this.
I still don’t understand what causal model would produce this prediction. Here’s mine: one big limit on the number of safety researchers the current SOTA lab ecosystem can absorb is labs’ expectations for how many researchers they want or need. On one hand, more schlep during the pre-AI-researcher era means more hires. On the other hand, more hires require more research managers or more managerial experience. Anecdotally, many AI capabilities and alignment organizations (both in the EA space and in the frontier lab space) seem to have been historically bottlenecked on management capacity. Additionally, hiring has a cost (both the search process and the onboarding), and it is likely that as labs get closer to creating AI researchers, they’ll believe that the opportunity cost of hiring continues to increase.
Skills useful in capabilities are useful for alignment, and if you’re careful about what job you take there isn’t much more skill penalty in transferring them than, say, switching from vision model research to language model research.
Nah, I found that very little from my vision model research work (during my undergrad) contributed to my skill and intuition for language model research work (again during my undergrad, both around 2021-2022). I mean, specific skills like programming, using PyTorch, debugging model issues, data processing, and containerization—sure, but the opportunity cost is ridiculous when you could be actually working with LLMs directly and reading papers relevant to the game you want to play. High-quality cognitive work is extremely valuable, and spending it on irrelevant things like the specifics of diffusion models (for example) seems quite wasteful unless you really think that stuff is relevant.
Capabilities often has better feedback loops than alignment because you can see whether the thing works or not. Many prosaic alignment directions also have this property. Interpretability is getting there, but not quite. Other areas, especially in agent foundations, are significantly worse.
Yeah this makes sense for extreme newcomers. If someone can get a capabilities job, however, I think they are doing themselves a disservice by playing the easier game of capabilities work. Yes, you have better feedback loops than alignment research / implementation work. That’s like saying “Search for your keys under the streetlight because that’s where you can see the ground most clearly.” I’d want these people to start building the epistemological skills to thrive even with a lower intensity of feedback loops such that they can do alignment research work effectively.
And the best way to do that is to actually attempt to do alignment research, if you are in a position to do so.
I think safety work gets less and less valuable at crunch time actually.
[...]
Whatever prosaic alignment and control measures you’ve figured out, you’ll now be using that in an attempt to play this breakneck game of getting useful work out of a potentially misaligned AI ecosystem
Sure, but you have to actually implement these alignment/control methods at some point? And likely these can’t be (fully) implemented far in advance. I usually use the term “crunch time” in a way which includes the period where you scramble to implement in anticipation of the powerful AI.
One (oversimplified) model is that there are two trends:
Implementation and research on alignment/control methods becomes easier because of AIs (as test subjects).
AIs automate away work on alignment/control.
Eventually, the second trend implies that safety work is less valuable, but probably safety work has already massively gone up in value by this point.
(Also, note that the default way of automating safety work will involve large amounts of human labor for supervision, either due to issues with AIs or because of a lack of trust in these AI systems (e.g., human labor is needed for a control scheme).)
My biggest reason for disagreeing (though there are others) is that people often underestimate the effects that your immediate cultural environment has on your beliefs over time. I don’t think humans have the kind of robust epistemics necessary to fully combat changes in priors from prolonged exposure to something (for example, I know someone who was negative on ads, joined Google, and found they were becoming more positive on them without this coming from or leading to any object-level changes in their views, and then swung back after they left).
Why not work on a role related to application though?

Application seems better than other kinds of capabilities. I was thinking of capabilities as everything other than alignment work, so inclusive of application.

I would suggest that it’s important to make this distinction to avoid sending people off down the wrong path.

SAME