I thought this post and its associated paper were worse than Richard’s previous sequence “AGI safety from first principles”, but despite that, I still think it’s one of the best pieces of introductory content for AI X-risk. I’ve also updated toward thinking that good communication around AI X-risk will probably involve writing many specialized introductions that work within the epistemic frames and methodologies of different communities, and I think this post does reasonably well at that for the ML community (though I am not a great judge of that).
Ty for review. I still think it’s better, because it gets closer to concepts that might actually be investigated directly. But happy to agree to disagree here.
Small relevant datapoint: the paper version of this was just accepted to ICLR, making it the first time a high-level “case for misalignment as an x-risk” has been accepted at a major ML conference, to my knowledge. (Though Langosco’s goal misgeneralization paper did this a little bit, and was accepted at ICML.)