Flagging that the end of “The Tails Coming Apart as Metaphor for Life” more or less describes “distributional shift” from the Concrete Problems in AI Safety paper.
I have a hunch that many AI safety problems end up boiling down to distributional shift in one way or another. For example, here I argued that concerns around Goodhart’s Law are essentially an issue of distributional shift: if the model you’re using for human values is vulnerable to distributional shift, then its maximum will likely be attained off-distribution, exactly where the proxy and the true values come apart.
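As a toy illustration of that claim (my own sketch, not anything from the original post, with made-up functions like `true_value` and `proxy`): fit a simple proxy value model to on-distribution data, then maximize the proxy over a wider range. The proxy tracks the true value well where it was trained, but its maximum lands far off-distribution, where the true value has collapsed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true value" function: increases near the origin,
# then turns sharply negative further out.
def true_value(x):
    return x - 0.1 * x**2

# On-distribution samples, the kind a value model might be trained on.
x_train = rng.normal(loc=0.0, scale=1.0, size=500)
y_train = true_value(x_train) + rng.normal(scale=0.05, size=x_train.size)

# Proxy value model: a linear fit that matches the data well
# on-distribution but extrapolates badly.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def proxy(x):
    return slope * x + intercept

# An optimizer maximizing the proxy over a much wider range
# than the training distribution ever covered.
x_search = np.linspace(-20.0, 20.0, 4001)
x_star = x_search[np.argmax(proxy(x_search))]

print(f"proxy maximized at x = {x_star:.1f}")           # far outside the training range
print(f"proxy value there:   {proxy(x_star):.1f}")      # looks great to the proxy
print(f"true value there:    {true_value(x_star):.1f}") # actually strongly negative
```

Nothing hinges on the particular functions; the point is just that the argmax of the proxy is determined by its off-distribution behavior, which the training data never constrained.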
Sure. It describes how humans aren’t robust to distributional shift.