I don’t think the interp team is part of Anthropic just because they might help with a capabilities edge; it seems clear they’d love the agenda to succeed in a way that leaves neural nets no smarter but much better understood. But I’m sure it’s part of the calculus that this kind of fundamental research is also worth supporting because of potential capability edges. (Especially given the importance of stuff like figuring out the right scaling laws in the competition with OpenAI.)
(Fwiw I don’t take issue with this sort of thing, provided the relationship isn’t exploitative — e.g. if the people doing the interp work have some power/social capital, and reason to expect derived capabilities to be used responsibly.)