My guess is the interpretability team is under a lot of pressure to produce insights that would help the rest of the org with capabilities work
I would be somewhat surprised if this were true, assuming you mean a strong form of the claim (i.e. operationalizing “help with capabilities work” as relying predominantly on first-order effects of technical insights, rather than something like “help with capabilities work by making it easier to recruit people”, and “pressure” as something like top-down prioritization of research directions, or setting KPIs that depend on capabilities externalities, etc.).
I think it’s more likely that the interpretability team(s) operate with approximately full autonomy with respect to their research directions, and to the extent that there’s any shaping of outputs, it’s happening mostly at levels like “who are we hiring” and “org culture”.
The pressure here looks more like “I want to produce work that the people around me are excited about, and the kind of thing they are most excited about is stuff that is pretty directly connected to improving capabilities”, where I include “getting AIs to perform a wider range of economically useful tasks” under “improving capabilities”.
I definitely don’t think this is the only pressure the team is under! There are lots of pressures acting on them, and my current guess is that this one isn’t the primary pressure, but I would be surprised if it weren’t quite substantial.