Thomas Larsen comments on (My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen 31 Aug 2022 18:12 UTC
LW: 10 AF: 6
8
AF
Agree with both aogara and Eli’s comment.
One caveat would be that papers probably don’t have full explanations of the x-risk motivation or applications of the work, but that’s reading between the lines that AI safety people should be able to do themselves.
For me this reading between the lines is hard: I spent ~2 hours reading academic papers/websites yesterday and while I could quite quickly summarize the work itself, it was quite hard to me to figure out the motivations.
- David Scott Krueger (formerly: capybaralet) 2 Sep 2022 19:32 UTC
  LW: 11 AF: 3
  12
  AF Parent
  There’s a lot of work that could be relevant for x-risk but is not motivated by it. Some of it is more relevant than work that is motivated by it. An important challenge for this community (to facilitate scaling of research funding, etc.) is to move away from evaluating work based on motivations, and towards evaluating work based on technical content.
  - elifland 3 Sep 2022 2:26 UTC
    6 points
    2
    Parent
    See The academic contribution to AI safety seems large and comments for some existing discussion related to this point
- joshc 5 Sep 2022 5:17 UTC
  LW: 7 AF: 2
  2
  AF Parent
  PAIS #5 might be helpful here. It explains how a variety of empirical directions are related to X-Risk and probably includes many of the ones that academics are working on.
- aog 31 Aug 2022 18:49 UTC
  7 points
  5
  Parent
  Agreed it’s really difficult for a lot of the work. You’ve probably seen it already but Dan Hendrycks has done a lot of work explaining academic research areas in terms of x-risk (e.g. this and this paper). Jacob Steinhardt’s blog and field overview and Sam Bowman’s Twitter are also good for context.
  - David Reber 31 Aug 2022 18:57 UTC
    12 points
    5
    Parent
    I second this, that it’s difficult to summarize AI-safety-relevant academic work for LW audiences. I want to highlight the symmetric difficulty of trying to summarize the mountain of blog-post-style work on the AF for academics.
    In short, both groups have steep reading/learning curves that are under-appreciated when you’re already familiar with it all.