I stream-of-consciousness’d this out and I’m not happy with how it turned out, but it’s probably better I post this than delete it for not being polished and eloquent. Can clarify with responses in comments.
Glad you posted this and I’m also interested in hearing what others say. I’ve had these questions for myself in tiny bursts throughout the last few months.
When I get the chance to speak to people at an earlier career stage than me (someone starting undergrad, or a high schooler attending a mathcamp I went to) who are undecided about their careers, I bring up my interest in AI Alignment and why I think it’s important, and share resources after the call in case they’re interested in learning more about it. I don’t have very many opportunities like this because I don’t actively seek to identify and “recruit” people. I only bring it up by happenstance (e.g. joining a random discord server for homotopy type theory, seeing an intro by someone who went to the same mathcamp as me and is interested in cogsci, and scheduling a call to talk about my research background in cogsci and how my interests have evolved and led me to alignment over time).
I know very talented people around my age at MIT and from a math program I attended: students breezing through technical double majors with perfect GPAs, IMO participants, good competitive programmers, etc. Some things that make this hard for me:
If I know them well, I can talk about my research interests and try to get them to see my motivation, but if I’m only catching up with them 1-2x a year, it feels very unnatural and synthetic for me to spend that time trying to convert them into doing alignment work. Even if I am still very close to them and talk to them frequently, there’s still the issue of bringing it up naturally and having a real chance to convince them.

Most of these people are doing Math PhDs, or trading in finance, or working on a startup, or… The point is that they are fresh on their sprint down the path they have chosen. They are all the type who are very focused and determined to succeed at the goals they have settled on. It is not “easy” to get them (or, for that matter, almost any college student) to halt their “exploit” mode, take ten steps back and a lot of time out of their busy lives, and then “explore” another option that I’m seemingly imposing on them. FWIW, the people I know who are in trading seem the most likely to switch out (they have explicitly told me in conversations that they enjoy the challenge of the work but want to find more fulfilling things down the road), and to these people I share ideas and resources about AI Safety.
I shared resources after the call and talked about why I’m interested in alignment, and that’s the furthest I’ve gone with respect to getting someone who is already on a separate career track to consider alignment.
If it were MUCH easier to convince people that AI alignment is worth thinking about in under an hour, and I could ask people to talk with me about this for an hour without looking like a nutjob and potentially damaging our relationship (because it seems like I’m just trying to convert them to something else), AND the field of AI Alignment were more naturally compelling for them to join, I’d do much more of this outreach.

On that last point, what I mean is: for a moment, let’s suspend the object-level importance of solving AI Alignment. In reality, there are things that are incredibly important/attractive for people when pursuing a career. Status, monetary compensation, and recognition (and not being labeled a nutjob) are some big ones. If these things were better (and I think they have been getting much better recently), it would be easier to get people to spend more time at least thinking about the possibility of working on AI Alignment, and eventually some would work on it, because I don’t think the arguments for x-risk from AI are hard to understand.

If I personally didn’t have so much support by way of programs the community had started (SERI, AISC, EA 1-1s, EAG AI Safety researchers making time to talk to me), or if it had felt like the EA/X-risk group was not at all “prestigious”, I don’t know how engaged I would have been in the beginning, when I started my own journey learning about all this. As much as I wish it weren’t true, I would not be surprised at all if the first instinctual thing that led me down this road was noticing that EAs/LW users were intelligent and had a solidly respectable community, before I chose to spend my time engaging with the content (a lot of which was about X-risks).
“In reality, there are things that are incredibly important/attractive for people when pursuing a career. Status, monetary compensation, and recognition (and not being labeled a nutjob) are some big ones.”

This is imo the biggest factor holding back (people going into) AI safety research, by a wide margin. I personally know at least one very talented engineer who would currently be working on AI safety if the pay were anywhere near what they could make working for big tech companies.
I heard recently that at least some of the AI safety groups are making offers competitive with major tech companies. (Is this not the case?)
Could be! The highest I recall is ~$220k/year at Anthropic, with the stipulation that you must live in the Bay (they offer partial remote). Compared to big tech, there is less career capital/informal status due to reduced name recognition. Additionally, at big tech, there exists the possibility of very lucrative internal promotion.
See: Have You Tried Hiring People?
I sympathise with this view, but I think there’s a huge issue with treating “AI safety research” as a kind of fixed value here. We need more of the right kinds of AI safety research. Not everything labelled “AIS research” qualifies.
Caveats: the points below apply
To roles, to the extent that they’re influencing research directions (so less for engineers than for scientists, but still somewhat).
To people who broadly understand the AIS problem.
“AIS research” will look very different depending on whether motivations are primarily intrinsic or extrinsic. The danger with extrinsic motivation is that you increase the odds that the people so motivated will pursue [look like someone doing important work] rather than [do important work]. This is a much bigger problem in areas where we don’t have ground truth feedback, and our estimates are poor, e.g. AI safety.
The caricature is that by incentivizing monetarily we get a load of people who have “AI Safety Research” written on their doors, can write convincing grant proposals and publish papers, but are doing approximately nothing to solve important problems.
It’s not clear to me how far the reality departs from such a caricature.
It’s also unclear to me whether this kind of argument fails for engineers: heading twice as fast in even slightly the wrong direction may be a poor trade.
I think it’s very important to distinguish financial support from financial incentive here.
By all means pay people in the Bay Area $100k or $125k.
If someone needs >$250k before they’ll turn up (without a tremendously good reason), then I don’t want them in a position to significantly influence research directions.
This is less clear-cut when people filter out areas with non-competitive salaries before ever learning about them. However, I’d want to solve this problem in a different way than [pay high salaries so that people learn about the area].
[Edit: I initially thought of this purely tongue in cheek, but maybe there is something here that is worth examining further?]
You have cognitively powerful agents (highly competent researchers) who have incentives ($250k+ salaries) to do things that you don’t want them to do (create AGIs that are likely unaligned), and you want them instead to do things that benefit humanity (work on alignment).
It seems to me that offering $100k salaries to work for you instead is not an effective solution to this alignment problem. It relies on the agents being already aligned to the extent that a $150k/yr loss is outweighed by other incentives.
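A minimal sketch of that trade-off, in Python. The numbers and the single intrinsic_value term (standing in for all non-monetary motivation: curiosity, mission, status, etc.) are illustrative assumptions, not anything measured:

```python
# Toy model of the incentive framing above. Everything here is an
# illustrative assumption: salaries are stand-in figures, and
# "intrinsic_value" lumps all non-monetary motivation into one number.

def chooses_safety_work(safety_salary: float,
                        outside_salary: float,
                        intrinsic_value: float) -> bool:
    """The agent takes the safety job iff the non-monetary value they
    place on the work at least offsets the salary they give up."""
    return safety_salary + intrinsic_value >= outside_salary

# A $100k safety offer against a $250k outside offer only attracts agents
# who already value the work at >= $150k/yr equivalent, i.e. it filters
# for (and reveals) pre-existing alignment rather than creating it.
print(chooses_safety_work(100_000, 250_000, intrinsic_value=175_000))  # True
print(chooses_safety_work(100_000, 250_000, intrinsic_value=50_000))   # False
```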
If money were not a tight constraint, it seems to me that offering $250k/yr would be worthwhile even if for no other reason than having them not work on racing to AGI.
The “pay them not to race to AGI” idea would only make sense if there were a smallish pool of replacements ready to step in and work on racing to AGI. This doesn’t seem to be the case. The difference it’d make might not be zero, but it’s close enough to zero to be obviously inefficient.
In particular, there are presumably effective ways to use money to create a greater number of aligned AIS researchers—it’s just that [give people a lot of money to work on it] probably isn’t one of them.
In those terms the point is that paying $250k+ does not align them—it simply hides the problem of their misalignment.
[work on alignment] does not necessarily benefit humanity, even in expectation. That requires [the right kind of work on alignment], and we don’t have good tests for that. Aiming for the right kind of work is no guarantee you’re doing it, but it beats not aiming for it. (Again, the argument is admittedly less clear for engineers.)
Paying $100k doesn’t solve this alignment problem—it just allows us to see it. We want defection to be obvious. [here I emphasize that I don’t have any problem with people who would work for $100k happening to get $250k, if that seems efficient]
Worth noting that we don’t require a “benefit humanity” motivation—intellectual curiosity will do fine (and I imagine this is already a major motivation for most researchers: how many would be working on the problem if it were painfully dull?).
We only require that they’re actually solving the problem. If we knew how to get them to do that for other reasons that’d be fine—but I don’t think money or status are good levers here. (or at the very least, they’re levers with large downsides)