I would be interested to hear opinions about what fraction of people could possibly produce useful alignment work.
Ignore the hurdle of “knowing about AI safety at all”, i.e. assume they took some time to engage with it (e.g. they took the AGI Safety Fundamentals course). Also assume they got some good mentorship (e.g. from one of you) and then decided to commit full-time (and got funding for that). The thing I’m trying to get at is more about having the mental horsepower + epistemics + creativity + whatever other qualities are useful, or being likely to get there after some years of training.
Also note that I mean direct useful work, not indirect meta things like outreach or being a PA to a good alignment researcher, etc. (these can be super important, but I think it’s productive to treat them as a distinct class). E.g. I would include being a software engineer at Anthropic, but exclude doing grocery shopping for your favorite alignment researcher.
An answer could look like “X% of the general population”, “half the people who could get a STEM degree at an Ivy League school if they tried”, or “a tenth of the people who win the Fields Medal”.
I think it’s useful to have a sense of this for many purposes, including questions about community growth and the value of outreach in different contexts, as well as priors about one’s own ability to contribute. Hence I think it’s worth discussing honestly, even though it can obviously be controversial (some possible answers would imply that most current AI safety people are not being useful).
(Off-the-cuff answer including some random guesses and estimates I won’t stand behind, focused on the kind of theoretical alignment work I’m spending most of my days thinking about right now.)
Over the long run I would guess that alignment is broadly similar to other research areas: a large, healthy field could support lots of work from lots of people, and while some kinds of contributions are very heavy-tailed, there is a lot of complementarity and many researchers have large marginal impacts overall.
Right now I think the difficulties (at least for growing the kind of alignment work I’m most excited about) are mostly related to trying to expand quickly, greatly exacerbated by not having a good idea of what’s going on / what we should be trying to do, and by not having a straightforward motivating methodology or test case, since you are trying to do things in advance motivated by altruistic impact. I’m still optimistic that we will be able to scale up reasonably quickly, so that many more people are helpfully engaged in the future and these difficulties are eventually resolved.
In the very short term, while other bottlenecks are severe, I think it’s mostly a question of how to use complementary resources (like mentorship and discussion) rather than “who could do useful work in an absolute sense”. My vague guess is that in the short term the bar will be kind of quantitatively similar to “would get a tenure-track role at a top university”, though obviously our evaluations of the bar will be highly noisy, and we are selecting on different properties and at an earlier career stage.
I think it’s much easier for lots of people to work more independently and take swings at the problem, and that this could also be quite valuable (there are lots of valuable things to do). Unfortunately I think that’s a somewhat harder task, and there are fewer people who will have a good time doing it. But the hard parts depend on a somewhat different set of skills (e.g. more loaded on initiative, entrepreneurial spirit, and being able to figure things out on your own), so this may cover some people who wouldn’t make sense as early hires; there may also be a lot of people who would be great hires but where it’s too hard to tell from an application process.
Hey Paul, thanks for taking the time to write that up, that’s very helpful!
“Possibly produce useful alignment work” is a really low bar, such that the answer is ~100%. Lots of things are possible. I’m going to instead answer “for what fraction of people would I think that the Long-Term Future Fund should fund them on the current margin”.
If you imagine that the people are motivated to work on AI safety, get good mentorship, and are working full-time, then I think, on my views, most people who could get into an ML PhD program at any university would qualify, and a similar number of other people as well (e.g. strong coders who are less good at the random stuff that academia wants). Primarily this is because I think the mentors have useful ideas that could progress faster with “normal science” work (rather than requiring “paradigm-defining” work).
In practice, there is not that much mentorship to go around, so the mentors end up spending their time with the strongest people from the previous category, and the weaker candidates end up without mentorship and so aren’t worth funding on the current margin.
I’d hope that this changes in the next few years, with the field transitioning from “you can do ‘normal science’ if you are frequently talking to one of the people who have paradigms in their head” to “the paradigms are understandable from the online written material; one can do ‘normal science’ within a paradigm autonomously”.
Hey Rohin, thanks a lot, that’s genuinely super helpful. The analogy to “normal science” seems reasonable and clears up the picture a lot.
Anthropic says that they’re looking for experienced engineers who are able to dive into an unfamiliar codebase and solve nasty bugs, and/or are able to handle interesting problems in distributed systems and parallel processing. I was personally surprised to get an internship offer from CHAI; I had expected the bar for getting an AI safety role to be much higher. I’d guess that the average person able to get a software engineering job at Facebook, Microsoft, Google, etc. (not that I’ve ever received an offer from any of those companies), or perhaps a broader category of people, could do useful direct work, especially if they committed time to gaining relevant skills where necessary. But I might be wrong. (This is all assuming that Anthropic, Redwood, CHAI, etc. are doing useful alignment work.)