(context: I ran the most recent iteration of ARENA, and after this I joined Neel Nanda’s mech interp stream in SERI MATS)
Registering a strong pushback to the comment on ARENA. The primary purpose of capstone projects isn’t to turn people into AI safety technical researchers or to produce impressive capstones; it’s to give people engineering skills & experience working on group projects. The initial idea was not even to push for things that were safety-specific (much like Redwood’s recommendations: all of the suggested MLAB2 capstones were either mech interp or non-safety, iirc). The reason many people gravitated towards mech interp is that they spent a lot of time around researchers doing interesting work in mech interp, and it seemed like a good fit both for getting a feel for AI safety technical research and for general skilling up in engineering.
Additionally, I want to mention that participant responses to the question “how have your views on AI safety changed?” included both positive and negative updates on mech interp, but much more uniformly showed positive updates on AI safety technical research as a whole. Evidence like this updates me away from the hypothesis that mech interp is pulling safety researchers from other disciplines. To give a more personal example, I had done alignment research before being exposed to mech interp, but none of it made much of an impression on me. I didn’t choose mech interp instead of other technical safety research; I chose it instead of a finance career.
That being said, there is an argument that ARENA (at least the most recent iteration) had too much of a focus on mech interp, and this is something we may try to rectify in future iterations.