I think the primary commercial incentive behind mechanistic interpretability research is that, of all the alignment research areas, it provides the most training and education for becoming a standard ML engineer who can then contribute to commercial objectives.
Is your claim here that a major factor in why Anthropic and GDM do mech interp is to train employees who can later be commercially useful? I’m skeptical of this.
Maybe the claim is that many people go into mech interp so they can personally skill up and later might pivot into something else (including jobs which pay well)? This seems plausible/likely to me, though it is worth noting that this is a pretty different argument with very different implications from the one in the post.
Yep, I am saying that the supply of mech-interp alignment researchers is plentiful because the career capital is much more fungible with extremely well-paying ML jobs. Anthropic and GDM seem interested in sponsoring things like mech-interp MATS streams and other internships and junior positions because those fit neatly into their existing talent pipeline, they know how to evaluate that kind of work, and they think those hires are also more likely to convert into people working on capabilities.
I’m pretty skeptical that Neel’s MATS stream is partially supported/subsidized by GDM’s desire to generally hire for capabilities. (And I certainly don’t think they directly fund this.) Same for other mech interp hiring at GDM: I doubt that anyone is thinking “these mech interp employees might convert into employees for capabilities”. That said, this sort of thinking might subsidize the overall alignment/safety team at GDM to some extent, but I think this would mostly be a mistake for the company.
Seems plausible that this is an explicit motivation for junior/internship hiring on the Anthropic interp team. (I don’t think the Anthropic interp team has a MATS stream.)
Neel seems to have a somewhat unique amount of freedom, so I have less of a strong take there. But I am confident that GDM would be substantially less excited about its employees taking time off to mentor a bunch of people if the work they were doing produced artifacts that were substantially less well-respected by the ML crowd, or did not look like it demonstrated the kind of skills that are indicative of good ML engineering capability.
(I think random (non-leadership) GDM employees generally have a lot of freedom, while employees of other companies have much less in-practice freedom (except maybe longer-tenured OpenAI employees, who I think have a lot of freedom).)
(My sense is this changed a lot after the DeepMind/Google Brain merger and ChatGPT, and the modern GDM seems to give people a lot less slack in the same way, though you are probably still directionally correct.)
(Huh, good to know this changed. I wasn’t aware of this.)