However, I’m quite skeptical that (mechanistic) interpretability research in particular gets much more funding due to it directly being a good commercial bet (as in, it is worth it because it ends up directly being commercially useful). And my guess is that alignment/safety people at AI companies that are ostensibly focused on x-risk/AI takeover prevention are less than half funded via the direct commercial case. (I won’t justify this here, but my belief comes from a combination of personal experience and thinking about what these teams tend to work on.)
I think the primary commercial incentive on mechanistic interpretability research is that it’s the alignment research that most provides training and education to become a standard ML engineer who can then contribute to commercial objectives. I am quite confident that a non-trivial fraction of young alignment researchers are going into mech-interp because it also gets them a lot of career capital as a standard ML engineer.
I think the primary commercial incentive on mechanistic interpretability research is that it’s the alignment research that most provides training and education to become a standard ML engineer who can then contribute to commercial objectives.
Is your claim here that a major factor in why Anthropic and GDM do mech interp is to train employees who can later be commercially useful? I’m skeptical of this.
Maybe the claim is that many people go into mech interp so they can personally skill up and later might pivot into something else (including jobs which pay well)? This seems plausible/likely to me, though it is worth noting that this is a pretty different argument with very different implications from the one in the post.
Yep, I am saying that the supply of mech-interp alignment researchers is plentiful because the career capital is much more fungible with extremely well-paying ML jobs. And Anthropic and GDM seem interested in sponsoring things like mech-interp MATS streams or other internships and junior positions because those fit neatly into their existing talent pipeline, they know how to evaluate that kind of work, and they think those hires are also more likely to convert into people working on capabilities.
I’m pretty skeptical that Neel’s MATS stream is partially supported/subsidized by GDM’s desire to generally hire for capabilities. (And I certainly don’t think they directly fund this.) Same for other mech interp hiring at GDM: I doubt that anyone is thinking “these mech interp employees might convert into employees for capabilities”. That said, this sort of thinking might subsidize the overall alignment/safety team at GDM to some extent, but I think this would mostly be a mistake for the company.
Seems plausible that this is an explicit motivation for junior/internship hiring on the Anthropic interp team. (I don’t think the Anthropic interp team has a MATS stream.)
Neel seems to have a somewhat unique amount of freedom, so I have less of a strong take there, but I am confident that GDM would be substantially less excited about its employees taking time off to mentor a bunch of people if the work they were doing produced artifacts that were substantially less well-respected by the ML crowd, or did not look like it demonstrated the kinds of skills that are indicative of good ML engineering ability.
(I think rank-and-file (non-leadership) GDM employees generally have a lot of freedom, while employees at other companies have much less freedom in practice, except maybe for longer-tenured OpenAI employees, who I think also have a lot of freedom.)
(My sense is this changed a lot after the DeepMind/Google Brain merger and ChatGPT; the modern GDM seems to give people a lot less slack in that way, though you are probably still directionally correct.)
(Huh, good to know this changed. I wasn’t aware of this.)
Why are you confident it’s not the other way around? People who decide to pursue alignment research may have prior interest or experience in ML engineering that drives them towards mech-interp.