I’m not sure which of the people “have ties to dangerous organizations such as Anthropic” in the post (besides Shauna Kravec & Nova DasSarma, who work at Anthropic), but of the current fund managers, I suspect that I have the most direct ties to Anthropic and OAI through my work at ARC Evals. I have also done a plurality of the AI Safety grant evaluations in the last month. So I think I should respond to this comment with my thoughts.
I personally empathize significantly with the concerns raised by Linch and Oli. In fact, when I was debating joining Evals last November, my main reservations centered around direct capabilities externalities and safety washing.
I will say the following facts about AI Safety advancing capabilities:
Empirically, when we look at previous capability advances produced by people from this community working in the name of “AI Safety,” the overwhelming majority came from people who were directly aiming to improve capabilities.
That is, they were not capability externalities from safety research, so much as direct capabilities work.
E.g., it definitely was not the case that GPT-3 was a side effect of alignment research, and OAI and Anthropic are both orgs that explicitly focus on scaling and staying at the frontier of AI development.
I think the sole exception is a few people who started doing applied RLHF research. Yeah, I think the people who made LLMs commercially viable via RLHF did not do a good thing. My main uncertainty is what exactly happened here and how much we contributed to this on the margin.
I generally think that research is significantly more useful when it is targeted (this is a very common view in the community as well). I’m not sure what the exact multiplier is, but I’d guess targeted, non-foundational research is roughly 10x more effective than incidentally related research. So the net impact of safety research on capabilities via externalities is probably much smaller than the impact of safety research on safety, or the impact of targeted capabilities research on capabilities (a rough sketch of this arithmetic follows these points).
I think this point is often overstated or overrated, but the number of capabilities researchers at this point is really large, and it’s easy to overestimate the impact of one or two particular high-profile people.
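To make the rough multiplier arithmetic above concrete, here is a minimal back-of-the-envelope sketch. The 10x multiplier comes from the point above; the headcounts and units are purely illustrative assumptions, not figures from this thread.

```python
# Back-of-the-envelope sketch of the "targeted research is ~10x more effective"
# point above. The headcounts and units are illustrative assumptions.

TARGETED_MULTIPLIER = 10   # assumed effectiveness of targeted vs. incidental work

safety_researchers = 100         # hypothetical headcount doing targeted safety research
capabilities_researchers = 2000  # hypothetical headcount doing targeted capabilities research

# Capabilities progress contributed by each group (arbitrary units):
from_targeted_capabilities_work = capabilities_researchers * TARGETED_MULTIPLIER
from_safety_externalities = safety_researchers * 1  # incidental externalities only

share = from_safety_externalities / (
    from_targeted_capabilities_work + from_safety_externalities
)
print(f"Share of capabilities progress from safety externalities: {share:.1%}")
# With these assumed numbers, externalities account for roughly 0.5% of total progress.
```

Even if the multiplier or headcounts here are off by a factor of a few, the qualitative conclusion under these assumptions (externalities are a small share of total capabilities progress) doesn’t change much.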
For what it’s worth, I think that if we are to actually produce good independent alignment research, we need to fund it, and LTFF is basically the only funder in this space. My current guess is that a lack of LTFF funding is probably producing more researchers at Anthropic than otherwise, because there just aren’t that many opportunities for people to work on safety or safety-adjacent roles. E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it’s what’s available. Having spoken to a bunch of people, I strongly suspect that of the people I’d want to fund but won’t be able to fund, at least a good fraction would be significantly less likely to join a scaling lab if they were funded, not more.
(Another possibly helpful datapoint here is that I received an offer from Anthropic last December, and I turned them down.)
My current guess is that a lack of LTFF funding is probably producing more researchers at Anthropic than otherwise, because there just aren’t that many opportunities for people to work on safety or safety-adjacent roles. E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it’s what’s available. Having spoken to a bunch of people, I strongly suspect that of the people I’d want to fund but won’t be able to fund, at least a good fraction would be significantly less likely to join a scaling lab if they were funded, not more.
I think this is true at the current margin, because we have so little money. But if we receive, say, enough funding to lower the bar closer to what our early 2023 bar was, I will still want to make skill-up grants to fairly talented/promising people, and I still think they are quite cost-effective. I do expect those grants to have more capabilities externalities (at least in terms of likelihood, maybe in expectation as well) than grants to people who could currently be hired at (e.g.) Anthropic but choose not to be.
It’s possible you (and maybe Oli?) disagree and think we should fund moderate-to-good direct work projects over all (or almost all) skill-up grants; in that case this is a substantive disagreement about what we should do in the future.
E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it’s what’s available
That feels concerning. Are there any obvious things that would help with this situation, e.g., better career planning and reflection resources for people in this position, or AI safety folks being clearer about what they see as the value/disvalue of working in those types of capability roles?
Seems weird for someone to explicitly want a “safety-adjacent” job unless there are weird social dynamics encouraging people to do that even when there isn’t positive impact to be had from such a job.