Ruby comments on EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

Ruby Jun 5, 2024, 2:23 AM
10 points
Curated. I like this post for several reasons:

Making predictions about future research seems neat and valuable. I could see the habit of doing this, especially when predicting results, helping one build skill in prioritizing research.

As scasper says, interpretability isn’t yet practically helpful. Even if that’s been said a lot, it’s worth continuing to say, especially as mech interp remains one of the most accessible and hottest technical work paths in AI safety.

And I like this work for being a review. LessWrong’s annual review works to elicit reviews like this, but they’re valuable to have immediately, to help people put things in context and orient themselves.