“Formally Stating the AI Alignment Problem” is probably the nicest introduction. If you want a more formal treatment of why I think this matters (with a couple of specific cases), you might like this preprint. Note that I am working on getting it published: it is halfway through review with a journal, and although I have been too time-constrained to make the reviewers’ suggested changes so far, I suspect the final version of the paper will be closer to what you are looking for.