Ty for review. I still think it’s better, because it gets closer to concepts that might actually be investigated directly. But happy to agree to disagree here.
Small relevant datapoint: the paper version of this was just accepted to ICLR, making it the first time a high-level “case for misalignment as an x-risk” has been accepted at a major ML conference, to my knowledge. (Though Langosco’s goal misgeneralization paper did this a little bit, and was accepted at ICML.)