Counterpoint: at least one kind of research, mechanistic interpretability, could very well be both dangerous by helping capabilities and also essential for alignment. My current intuition is that the same could be said of other research avenues.
Yes, there are plenty of dangerous ideas that aren’t so coupled with alignment, but they’re not the frustrating edge-case I’m writing about. (And, of course, I’m not doing or publishing that type of research.)
Right, and that article makes the case that in those cases you should publish. The reasoning is that the value of unpublished research decays rapidly, so if it could help alignment, publish before it loses its value.
Good catch! That certainly motivates me even more to finish my current write-ups!
Yeah exactly! Not telling anyone until the end just means you missed the chance to push society towards alignment and to let others build on your work. Don’t wait!