Outside view: The proportion of junior researchers doing interp rather than other technical work is too high
Quite tangential[1] to your post, but if true, I'm curious what this suggests about the dynamics of field-building in AI safety.
Seems to me like certain organisations and individuals have an outsized influence in funneling new entrants into specific areas. Because the field is small (and has a big emphasis on community building), this seems more linked to who is running programmes that lots of people hear about and want to apply to (e.g. Redwood's MLAB and REMIX), or who takes the time to do field-building-y stuff in general (like Neel's 200 Concrete Open Problems in Mechanistic Interpretability), than to the relative quality and promise of their research directions.
It did feel to me like in the past year, some promising university students I know invested heavily in mechanistic interpretability because they were deferring to the above-mentioned organisations and individuals, to an extent that seems bad for actually doing useful research and having original thoughts. I've also been at AI safety events and retreats where it seemed to me like attendees were overupdating on points brought up by whichever speakers happened to be invited to the event or retreat.
I guess I could see it happening in the other direction as well, with new people overupdating on, for example, Redwood moving away from interpretability, or on the general vibe being less enthusiastic about interp, without a good personal understanding of the reasons.
I'd personally guess that the proportion is too high, but I also feel more positively about interpretability than you do (for similar reasons to those brought up by other commenters).