Something I find interesting is the relationship between believing that the marginal researcher’s impact from taking a capabilities role is likely to be negligible, and holding a position anywhere on the spectrum other than “this is obviously a terrible idea”.
On one hand, that seems obvious, maybe even logically necessary. On the other hand, I think that the impact of the marginal researcher on capabilities research has a much higher mean than median (impact is heavy-tailed), and this may be even more true for those listening to this advice. I also think the arguments for working on capabilities seem quite weak:
“up-skilling”
My first objection is that it’s not clear why anybody needs to up-skill in a capabilities role before switching to work on alignment. Most alignment organizations don’t have bureaucratic requirements like “[x] years of experience in a similar role”, and being an independent researcher obviously has no requirements whatsoever. The actual skills that might make one more successful at either option… well, that leads to my second objection.
My second objection is that “capabilities” is a poorly-defined term. If one wants to up-skill in ML engineering by e.g. working at an organization which only uses existing techniques to build consumer features, I expect this to have approximately no first-order[1] risk of advancing the capabilities frontier. However, this kind of role by definition doesn’t help you up-skill in areas like “conduct research on unsolved (or worse, unspecified) problems”. To the extent that a role does exercise that kind of skill, it becomes correspondingly riskier. tl;dr: you can up-skill in “Python + ML libraries” pretty safely, as long as the systems you’re working on don’t themselves target inputs to AI (e.g. cheaper chips, better algorithms), but not in “conduct novel research”.
“influence within a capabilities organization”
I think the median outcome of an early-career alignment researcher joining a capabilities org and attempting to exert influence to steer the organization in a more alignment-friendly direction is net-negative (though I’m pretty uncertain). I suspect that for this to be a good idea, it needs to be the primary focus of the person going into the organization, and that person needs to have a strong model of what exactly they’re trying to accomplish and how they’re going to accomplish it, given the structure and political landscape of the organization that they’ll be joining. If you don’t have experience successfully doing this in at least one prior organization, it’s difficult to imagine a justified inside-view expectation of success.
“connections”
See “influence”—what is the plan here? Admittedly connections can at least preserve some optionality when you leave, but I don’t think I’ve actually seen anyone argue the case for how valuable they expect connections to be, and what their model is for deriving that.
In general, I think the balance of considerations quite strongly favors not working on capabilities (in the narrower sense, rather than the “any ML application that isn’t explicitly alignment” sense). The experts themselves seem to be largely split between “obviously bad” and “unclear, balance of trade-offs”, and the second set seem to mostly be conditional on beliefs like:
“I don’t think it’s obvious that capabilities work is net negative”
“I don’t think on the margin AI risk motivated individuals working in these spaces would boost capabilities much”
other confusions or disagreements around the category of “capabilities work”
what I think are very optimistic beliefs about the ability of junior researchers to exert influence over large organizations
I recognize that “We think this is a hard question!” is not necessarily a summary of the surveyed experts’ opinions, but I would be curious to hear the “positive” case for taking a capabilities role implied by it, assuming there’s ground not covered by the opinions above.
[1] And I think the second-order effects, like whatever marginal impact your decision has on the market for ML engineers, are pretty trivial in this case.
Re: “up-skilling”: I think this is underestimating the value of developing maturity in an area before trying to do novel research. These are two separate skills, and developing both simultaneously from scratch doesn’t seem like the fastest path to proficiency to me. Difficulties often multiply.
There is a long-standing certification for “proving you’ve learned to do novel research”: the PhD. A prospective student would find it difficult to enter a grad program without any relevant coursework, and that’s not because those institutions think such a student has an equal chance of success as one who does have that background.