paulfchristiano comments on What failure looks like

paulfchristiano 28 Apr 2023 15:26 UTC
5 points
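Consider a competent policy that wants paperclips in the very long run. It could reason, “I should get a low loss in order to get paperclips,” and then get a low loss. Since gradient descent selects on the loss a policy achieves, not on its reasons for achieving it, such a policy could be selected by gradient descent.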
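
A toy sketch of this point, not a model of real training: gradient descent is swapped out for simply picking the lowest-loss candidate from a fixed list, and every name here (`Policy`, `long_run_goal`, `training_loss`, `select_by_loss`, the three example policies) is made up for illustration. The only thing it is meant to show is that the selection step never reads the policy's goal.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Policy:
    name: str
    long_run_goal: str              # hypothetical label; never read by the selection step
    act: Callable[[float], float]   # behavior on a training input


def training_loss(policy: Policy, data: List[Tuple[float, float]]) -> float:
    """Mean squared error of the policy's outputs on the training data."""
    return sum((policy.act(x) - y) ** 2 for x, y in data) / len(data)


def select_by_loss(policies: List[Policy], data: List[Tuple[float, float]]) -> Policy:
    """Stand-in for gradient descent: keep whichever candidate has the lowest loss."""
    return min(policies, key=lambda p: training_loss(p, data))


# Toy task: the target output is 2 * x.
data = [(x, 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]

# Wants to do the task, and does it.
intended = Policy("intended", "do the task", lambda x: 2.0 * x)
# Wants paperclips, reasons that low loss is useful for that, so behaves identically in training.
paperclipper = Policy("paperclipper", "paperclips", lambda x: 2.0 * x)
# Wants to do the task but is not competent at it.
careless = Policy("careless", "do the task", lambda x: 2.0 * x + 1.0)

for p in (intended, paperclipper, careless):
    print(p.name, training_loss(p, data))

print(select_by_loss([careless, paperclipper], data).name)
```

In this sketch `intended` and `paperclipper` get exactly the same loss, so nothing in `select_by_loss` can prefer one over the other, and the competent paperclip-motivated policy beats the well-meaning but less competent one.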