For reasons I may or may not write about in the near future, many ideas about alignment (especially anything that could be done with today’s systems) could very well accelerate capabilities work.
Counterpoint: at least one kind of research, mechanistic interpretability, could very well be both dangerous by helping capabilities and also essential for alignment. My current intuition is that the same could be said of other research avenues.
Yes, there are plenty of dangerous ideas that aren’t so coupled with alignment, but they’re not the frustrating edge-case I’m writing about. (And, of course, I’m not doing or publishing that type of research.)
Right, and that article makes the case that in those cases you should publish. The reasoning is that the value of unpublished research decays rapidly, so if it could help alignment, publish before it loses its value.
I don’t know. It seems to me that we have to make the graphs of progress in alignment vs capabilities meet somewhere, and part of that would probably involve really thinking about which parts of which bottlenecks are genuine blockers, versus epiphenomena that just tag along and can be optimised away. For instance, in your statement:
If research would be bad for other people to know about, you should mainly just not do it
Then maybe doing research but not having the wrong people know about it is the right intervention, rather than just straight-up not doing it at all?
“If it’s too dangerous to publish, it’s not effective to research.” — from Some background for reasoning about dual-use alignment research
Good catch, that certainly motivates me even more to finish my current writings!
Yeah exactly! Not telling anyone until the end just means you missed the chance to push society towards alignment and build on your work. Don’t wait!