According to your internal model of the problem of AI safety, what are the main axes of disagreement researchers have?
The three that first come to mind:
1. How much are ML systems likely to learn to "look good to the person training them" in a way that will generalize scarily to novel test-time situations, vs. learning to straightforwardly do what we are trying to train them to do?
2. How much alien knowledge are ML systems likely to have? Will humans be able to basically understand what they are doing with some effort, or will it quickly become completely beyond us?
3. How much time will we have to adapt gradually as AI systems improve, and how fast will we be able to adapt? How similar will the problems that arise be to the ones we can anticipate now?