Flagging that the end of “The Tails Coming Apart as Metaphor for Life” more or less describes “distributional shift” from the Concrete Problems in AI Safety paper.
I have a hunch that many AI safety problems end up boiling down to distributional shift in one way or another. For example, here I argued that concerns around Goodhart’s Law are essentially an issue of distributional shift: if the model you’re using for human values is vulnerable to distributional shift, then its maximum will likely be attained off-distribution, exactly where the proxy and the true values come apart.
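As a toy illustration of that claim (my own sketch, not anything from the original post, with made-up functions like `true_value` and `proxy`): fit a simple proxy value model to on-distribution data, then maximize the proxy over a wider range. The proxy tracks the true value well where it was trained, but its maximum lands far off-distribution, where the true value has collapsed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true value" function: increases near the origin,
# then turns sharply negative further out.
def true_value(x):
    return x - 0.1 * x**2

# On-distribution samples, the kind a value model might be trained on.
x_train = rng.normal(loc=0.0, scale=1.0, size=500)
y_train = true_value(x_train) + rng.normal(scale=0.05, size=x_train.size)

# Proxy value model: a linear fit that matches the data well
# on-distribution but extrapolates badly.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def proxy(x):
    return slope * x + intercept

# An optimizer maximizing the proxy over a much wider range
# than the training distribution ever covered.
x_search = np.linspace(-20.0, 20.0, 4001)
x_star = x_search[np.argmax(proxy(x_search))]

print(f"proxy maximized at x = {x_star:.1f}")           # far outside the training range
print(f"proxy value there:   {proxy(x_star):.1f}")      # looks great to the proxy
print(f"true value there:    {true_value(x_star):.1f}") # actually strongly negative
```

Nothing hinges on the particular functions; the point is just that the argmax of the proxy is determined by its off-distribution behavior, which the training data never constrained.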
Sure. It describes how humans aren’t robust to distributional shift.