Are there any research questions you’re excited about people working on, for making AI go (existentially) well, that are not related to technical AI alignment or safety? If so, what? (I’m especially interested in AI strategy/governance questions)
Not sure if you want “totally unrelated to technical AI safety” or just “not basically the same as technical AI safety.” Going for somewhere in between.
I think that futurism in general is underdone and pretty impactful on the margin, especially if it’s reasonably careful and convincing.
I think that broad institutional quality and preparedness for weird stuff makes it more likely that things go well. I think that particular norms and mechanisms to cope with high-stakes AI development, to enforce and monitor agreements, to establish international trust, etc. all seem likely to be impactful. I don’t have really detailed views about this field.
I think that there are tons of other particular bad things that can happen with AI, many of which suggest a lot of stuff to work on. Stuff like differential tech progress for physical tech at the expense of wisdom, rash AI-mediated binding commitments from bad negotiation, other weird game theory, AI-driven arguments messing up collective deliberation about what we want, crazy cybersecurity risks. There is stuff to do both on the technical side (though often that’s going to be a bit rougher than alignment, in that it’s just e.g. researching how to use AI for mitigation) and on the institutional side: governance, thinking through responses, agreements, and other preparedness.
I’m interested in a bunch of philosophical questions like “Should we be nice to AI?”, “What kind of AI should we make if we’re going to hand over the world?” and so on.
Relatedly: if we manage to solve intent alignment (including making it competitive) but still have an existential catastrophe, what went wrong?