I finally got around to reading this today. I have been thinking about doing more interpretability work, so I wanted to give this piece a chance to talk me out of it.
It mostly didn’t.
A lot of this boils down to “existing interpretability work is unimpressive”. I think this is an important point, and you raise significant sub-points in arguing it. However, it says little ‘against almost every theory of impact of interpretability’. We can just do better work.
A lot of the rest boils down to “enumerative safety is dumb”. I agree, at least for the version of “enumerative safety” you argue against here.
My impact story (for the work I am considering doing) is most similar to the “retargeting” story which you briefly mention, but barely critique.
I do think the world would be better off if this were required reading for anyone considering going into interpretability vs other areas. (Barring weird side-effects of the counterfactual where someone has the ability to enforce required reading...) It is a good piece of work which raises many important points.