Another reason for publishing more quickly is that conversations with many interpretability researchers have led us to believe that there is a wealth of knowledge in short experiments and unpublished research that really should be shared. We’d encourage other organizations who think similarly to post frequently, and share results even if they’re not completely polished.
Extremely strong +1! There is so much low-hanging fruit in mechanistic interpretability (of the flavour “I could probably spend a full-time day working on this and find something mildly cool worth writing up into a rough blog post”). I would love a wealth of these posts to exist that I could point people to and read myself! I’ve tried to set myself a much lower bar for this, and still mostly procrastinated on this. I would love to see this.
EDIT: This is also a comparative advantage of being an org outside academia whose employees mostly aren’t aiming for a future career in academia. I gather that under standard academic incentives, being scooped on your research makes the work much less impressive and publishable and can be bad for your career, disincentivising discussing partial results, especially in public. This seems pretty crippling to having healthy and collaborative discourse, but it’s also hard to fault people for following their incentives!
More generally, I really appreciate the reflective tone and candour of this post! I broadly agree re the main themes, including that Conjecture hasn’t really taken actions that cut at the hard core of alignment, and these reflections seem plausible to me re concrete but fixable mistakes and deeper and more difficult problems. I look forward to seeing what you do next!
To get a better sense of people’s standards for “cut at the hard core of alignment”, I’d be curious to hear examples of work that has done so.