It seems that PIBBSS might be pivoting away from higher variance blue sky research to focus on more mainstream AI interpretability. While this might create more opportunities for funding, I think this would be a mistake. The AI safety ecosystem needs a home for “weird ideas” and PIBBSS seems the most reputable, competent, EA-aligned place for this! I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
I was recently a PIBBSS mentor, and I am a mech interp person whom many people would likely consider mainstream, so I wanted to push back on this concern.
A few thoughts:
I don’t want to put words in your mouth, but I do want to clarify that we shouldn’t conflate having some mainstream mech interp with being only mech interp. Importantly, to my knowledge, there is very little chance of PIBBSS doing mech interp exclusively, so I think the salient question is whether they should have “a bit” (say 5-10% of scholars) do mech interp (which is likely more than they used to). (I would advocate for a steady-state proportion of between 10-20%; see further points for detail.)
In my opinion, the word “mainstream” suggests redundancy and brings to mind the idea that “well this could just be done at MATS”. There are two reasons I think this is inaccurate.
First, PIBBSS is likely to accept mentees who may not get into MATS / similar programs: mentees with diverse backgrounds and possibly different skillsets. In my opinion, this kind of diversity can be highly valuable and bring new perspectives to mech interp (which is a pre-paradigmatic field in need of new takes). I’m moderately confident that, to the extent existing programs are highly selective, we should expect diversity to suffer in them (if you take the top 10% by metrics like competence, you’re less likely to get the top 10% by diversity of intellectual background). I think there’s a statistical term for this but I forget what it is.
Secondly, PIBBSS being a home for weird ideas in mech interp as much as weird ideas in other areas of AI safety seems entirely reasonable to me.
I also think that even some mainstream mech interp (and possibly other areas like evals) should be a part of PIBBSS because it enriches the entire program:
My experience of the PIBBSS retreat and subsequent interactions suggests that a lot of value is created by having people who do empirical work interact with people who do more theoretical work. Empiricists gain ideas and perspective from theorists, and theoretical researchers are exposed to more real-world observations secondhand.
I weakly suspect that some debates / discussions in AI safety may be lopsided / missing details because some sub-fields are absent. In my opinion it’s valuable to sometimes mix up who is in the room, but likely worse in expectation to always remove mech interp people (hence my advocacy for 10-20% empiricists, with half of them being interp).
Some final notes:
I’m happy to share details of the work my scholar and I did, which we expect to publish in the next month.
I’ll be a bit forward and suggest that if you (Ryan) or any other funders find the arguments above convincing, then you might want to further PIBBSS and ask Nora how PIBBSS can source a bit more “weird” mech interp, a bit of mainstream mech interp, and some other empirical sub-fields for the program.
I’ll share this in the PIBBSS Slack to see if others want to comment :)