For reasons I may or may not write about in the near future, many ideas about alignment (especially anything that could be done with today’s systems) could very well accelerate capabilities work.
Counterpoint: at least one kind of research, mechanistic interpretability, could very well be both dangerous by helping capabilities and also essential for alignment. My current intuition is that the same could be said of other research avenues.
Yes, there are plenty of dangerous ideas that aren’t so coupled with alignment, but they’re not the frustrating edge-case I’m writing about. (And, of course, I’m not doing or publishing that type of research.)
Right, and that article makes the case that in those cases you should publish. The reasoning is that the value of unpublished research decays rapidly, so if it could help alignment, publish before it loses its value.
I don’t know. It seems to me that we have to make the graphs of progress in alignment vs capabilities meet somewhere, and part of that would probably involve really thinking about which parts of which bottlenecks are genuine blockers, versus epiphenomena that just tag along and can be optimised away. For instance, in your statement:
If research would be bad for other people to know about, you should mainly just not do it
Then maybe doing research but not having the wrong people know about it is the right intervention, rather than just straight-up not doing it at all?
“If it’s too dangerous to publish, it’s not effective to research.” — from Some background for reasoning about dual-use alignment research
Good catch, that certainly motivates me even more to finish my current writings!
Yeah exactly! Not telling anyone until the end just means you missed the chance to push society towards alignment and build on your work. Don’t wait!