After talking with a lot of people in Alignment, I think there is still a lot of good to be done in diffusing ideas at the object/technical level. We seem to have done a lot of outreach presenting the philosophical arguments, but much less on the technical side.
Good idea, but the speaker schedule doesn’t seem to reflect this stated goal. Going down the list:
Holden’s “Most Important Century” is not object-level technical alignment work, it’s meta-level content about why AI safety is important
Carlsmith’s is also not object-level alignment work, it’s meta-level content about why AI safety is (or isn’t?) important
Kaplan’s “Scaling Laws in Neural Networks” is also presumably meta-level content about why AI safety is important, not object-level alignment work
Hadfield-Menell’s “Normative Information for AI systems” I have not heard of, but it does sound like object-level alignment work based on the title
Christiano’s “Eliciting Latent Knowledge” is definitely object-level alignment work
Cotra’s “AI Timeline and Alignment Risk” is meta-level content about why AI safety is important, not object-level alignment work
Hendrycks’ talk is TBA, so I don’t know about that one
So, out of the 7 talks listed, 2 are plausibly about object-level technical alignment work, and 4 are clearly not.
Also, I note that almost half the speakers are from OpenPhil, an organization which (to my understanding) directly employed zero object-level technical alignment researchers as of a couple months ago. I do hear some of them have decided to try object-level work recently, in order to better understand it, but that’s a pretty recent development and the object-level work isn’t really the point of that exercise.