Buck comments on Defining alignment research

Buck 26 Aug 2024 17:28 UTC
LW: 8 AF: 7
> I’d much rather an interpretability team hire someone who’s intrinsically fascinated by neural networks (but doesn’t think much about alignment) than someone who deeply cares about making AI go well (but doesn’t find neural nets very interesting).
I disagree; I’d rather they hire someone who cares about making AI go well. For example, I like Sam Marks’s work on making interpretability techniques useful (e.g. here), and I think he gets a lot of leverage compared to most interpretability researchers by trying to do work that points toward being useful. (Though note that his work builds on the work of non-backchaining interpretability researchers.)