I have found much more success modeling intentions and institutional incentives at the organization level than the team level.
My guess is the interpretability team is under a lot of pressure to produce insights that would help the rest of the org with capabilities work. In general, I’ve found arguments of the form “this team in this org is working towards totally different goals than the rest of the org” to have a pretty bad track record, unless you are talking about very independent and mostly remote teams.
> My guess is the interpretability team is under a lot of pressure to produce insights that would help the rest of the org with capabilities work
I would be somewhat surprised if this were true, assuming you mean a strong form of the claim (i.e. operationalizing “help with capabilities work” as relying predominantly on first-order effects of technical insights, rather than something like “help with capabilities work by making it easier to recruit people”, and “pressure” as something like top-down prioritization of research directions, or setting KPIs that rely on capabilities externalities, etc.).
I think it’s more likely that the interpretability team(s) operate with approximately full autonomy over their research directions, and that to the extent there’s any shaping of outputs, it happens mostly at levels like “who are we hiring” and “org culture”.
The pressure here looks more like “I want to produce work that the people around me are excited about, and the kind of thing they are most excited about is stuff that is pretty directly connected to improving capabilities”, where I count “getting AIs to perform a wider range of economically useful tasks” as “improving capabilities”.
I definitely don’t think this is the only pressure the team is under! There are lots of pressures acting on them, and my current guess is that this isn’t the primary one, but I would be surprised if it weren’t quite substantial.