DanielFilan comments on AMA: Paul Christiano, alignment researcher

DanielFilan 30 Apr 2021 7:45 UTC
LW: 25 AF: 11
How many ideas of the same size as “maybe we could use inverse reinforcement learning to learn human values” are we away from knowing how to knowably and reliably build human-level AI technology that wouldn’t cause something comparably bad to human extinction?
- paulfchristiano 30 Apr 2021 19:31 UTC
  LW: 11 AF: 6
  A lot of this is going to come down to estimates of the denominator.
  (I mostly think that you might as well just ask people “Is this good?” rather than trying to use a more sophisticated form of IRL. In particular, I don’t think that realistic versions of IRL will successfully address the cases where people err in answering the “Is this good?” question; I think that directly asking is more straightforward in many important ways; and I think that we should mostly just try to directly empower people to give better answers to such questions.)
  Anyway, with that caveat, and using the version of your idea that I feel most enthusiastic about (and construing it quite broadly), I put significant probability on 0, have a median of maybe somewhere in 10-20, and put significant probability on very high numbers.