Matthew Barnett comments on AI Alignment 2018-19 Review

Matthew Barnett 29 Jan 2020 1:05 UTC
LW: 2 AF: 1
> it’s not obvious to me that supervised learning does
What type of scheme do you have in mind that would allow an AI to learn our values through supervised learning?
Typically, the problem with supervised learning is that it’s too expensive to label everything we care about. In this case, are you imagining that we label some types of behaviors as good and some as bad, perhaps like what we would do with an approval directed agent? Or are you thinking of something more general or exotic?
- John_Maxwell 29 Jan 2020 3:46 UTC
  LW: 4 AF: 2
  
  > Typically, the problem with supervised learning is that it’s too expensive to label everything we care about.
  
  I don’t think we’ll create AGI without first acquiring capabilities that make supervised learning much more sample-efficient. For example, better unsupervised methods would let us make better use of unlabeled data, so humans would no longer need to label everything they care about. Instead, they could label just enough data to pinpoint “human values” as something observable in the world, or to characterize it as a cousin of some things that are observable in the world.
  
  But if you think there are paths to AGI that don’t go through more sample-efficient supervised learning, one course of action would be to promote differential technological development towards more sample-efficient supervised learning and away from deep reinforcement learning. For example, we could try to convince DeepMind and OpenAI to reallocate resources away from deep RL and towards sample efficiency. (Note: I just stumbled on this recent paper, which is probably worth a careful read before considering advocacy of this type.)
  
  > In this case, are you imagining that we label some types of behaviors as good and some as bad, perhaps like what we would do with an approval directed agent?
  
  This seems like a promising option.
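
As a toy illustration of the sample-efficiency point in the reply above (unsupervised structure first, then a handful of labels to pinpoint a concept), here is a minimal sketch. Everything in it is an assumption made for illustration and not something proposed in the thread: the 2-D "behaviors", the two well-separated clusters, the use of k-means as the unsupervised method, and the names `approved` and `disapproved`. Real values are obviously nothing like two Gaussian blobs; the sketch only shows the mechanics of using two labels to name structure found in 400 unlabeled points.

```python
import random

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: finds structure in the data without any labels."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def make_blob(center, n, rng):
    """Hypothetical data: n noisy 2-D points around a center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]

rng = random.Random(42)
blob_a = make_blob((0.0, 0.0), 200, rng)
blob_b = make_blob((10.0, 10.0), 200, rng)
unlabeled = blob_a + blob_b

# Step 1: learn structure from 400 unlabeled points.
centers = kmeans(unlabeled, k=2)

# Step 2: two human labels are enough to say which cluster is which.
labeled = [((0.0, 0.0), "disapproved"), ((10.0, 10.0), "approved")]
cluster_name = {nearest(p, centers): name for p, name in labeled}

def predict(p):
    return cluster_name.get(nearest(p, centers), "unknown")

# Ground truth is used only to score the sketch, never to train it.
truth = ["disapproved"] * len(blob_a) + ["approved"] * len(blob_b)
accuracy = sum(predict(p) == t for p, t in zip(unlabeled, truth)) / len(unlabeled)
print(f"accuracy with 2 labels: {accuracy:.2f}")
```

The design choice worth noting is that the labels never touch the clustering step: they are only used afterwards to attach names to clusters, which is why so few of them are needed in this (deliberately easy) setting.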