Thanks for writing this! I’ve participated in some similar conversations, and on balance I think that working in a lab is probably net good for most people, assuming you have a reasonable amount of intellectual freedom (I’ve been consistently impressed by some papers coming out of Anthropic).
Still, one point Kaarel made in a recent conversation seemed like an important update against working in a lab (and against working on “close-to-the-metal” interpretability in general). I tend not to buy arguments from MIRI-adjacent people that “if we share our AI insights with the world, then AGI will be developed significantly sooner”. Those arguments were more reasonable when they were the only ones thinking seriously about AGI; now it mostly seems that a capabilities researcher will (on the margin, and at the same skill level) contribute more to making AGI come sooner than a safety researcher will. But a counterpoint is that serious safety researchers “are trying to actually understand AI”, which gives them a global orientation towards producing genuinely valuable new research results (something like the Manhattan Project or the Apollo program at the height of those programs’ quality), whereas a capabilities researcher is driven more by local market incentives. So there may be a real sense in which interpretability research, particularly of the more practical kinds, is more dangerous, conditional on “globally new ideas” (like deep learning, transformers, etc.) being needed for AGI. This has so far been the most convincing argument for me against working on technical interpretability in general, and it might be complicated further by working in a big lab (as I said, it hasn’t been enough to flip my opinion, but it seems worth sharing).