Akash—this is very helpful; thanks for compiling it!
I’m struck that much of the advice for newbies interested in ‘AI alignment with human values’ is focused very heavily on the ‘AI’ side of alignment, and not on the ‘human values’ side of alignment—despite the fact that many behavioral and social sciences have been studying human values for many decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc. that can help alignment newbies develop a more sophisticated understanding of the human psychology side of alignment.
I have a list of recommended nonfiction books here, but it’s not alignment-focused. From this list, though, I think that many alignment researchers might benefit from reading ‘The Blank Slate’ (2002) by Steven Pinker, ‘The Righteous Mind’ (2012) by Jonathan Haidt, ‘Intelligence’ (2016) by Stuart Ritchie, etc.
Primarily because we’re not even close to that goal; right now we’re still trying to figure out how to avoid deceptive alignment.
If we’re nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don’t understand why anyone is advocating further AI research at this point.
Also, ‘avoiding deceptive alignment’ doesn’t really mean anything if we don’t have a relatively rich and detailed description of what ‘authentic alignment’ with human values would look like.
I’m truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we’re allegedly aligning with.
I agree with this. What if it were actually possible to formalize morality? (Cf. «Boundaries» for formalizing an MVP morality.) Inner alignment seems like it would be a lot easier with a good outer alignment function!
Mostly because ambitious value learning is really fucking hard, and this proposal falls into all the problems that ambitious or narrow value learning has.
You’re right, though, that AI capabilities research will need to slow down, and I’m not hopeful here.