The field of alignment seems to be increasingly dominated by interpretability (and obedience[2]).
I agree w.r.t. potential downstream externalities of interpretability. However, from speaking to a fair number of people who join the field to work on interpretability, my view is that the upstream cause is slightly different. Interpretability is easier to understand and offers "cleaner" projects than most other areas of the field. It's also much more accessible and less weird to, say, an academic, or to the kind of person who hasn't spent a lot of time thinking about alignment.
All of these are likely correlated with having downstream capabilities applications, but the exact mechanism doesn’t look like “people recognizing that this has huge profit potential and therefore growing much faster than other sub-fields”.
I agree that the effect you’re pointing to is real and a large part of what’s going on here, and could easily be convinced that it’s the main cause (along with the one flagged by @habryka). It’s definitely a more visible motivation from the perspective of an individual going through the funnel than the one this post highlights. I was focusing on making one concise point rather than covering the whole space of argument, and am glad comments have brought up other angles.