This post is the best overview of the field that I know of. I appreciate how it frames proposals in terms of outer/inner alignment and training/performance competitiveness—having a framework with which to evaluate proposals is very useful, and this one strikes me as quite good.

Since it was written, this post has been my go-to reference both for getting other people up to speed on what current AI alignment strategies look like (even though it isn't exhaustive) and for my own use—I've referred back to it several times and learned a lot from it.

I hope this post grows into something more extensive and official—maybe an Official Curated List of Alignment Proposals, Summarized and Evaluated with Commentary and Links. Such a list could be regularly updated and would be valuable for several reasons, some of which I mentioned in this comment.