New Speaker Series on AI Alignment Starting March 3
After talking with many people in alignment, I think there is still a lot of good to be done diffusing ideas at the object/technical level. We have done substantial outreach presenting the philosophical arguments for alignment, but much less on technical ground.
Since the field of alignment is diverse and nuanced, we think it would be valuable to present how different people approach the problem on different frontiers. For example, Anthropic’s empirical approach is very different from, say, Christiano’s theoretical thinking on ELK. Navigating the landscape of alignment is therefore essential for building an inside view. A good grasp of alignment would also be useful for community builders, who could then better channel promising people to resources that fit their interests and philosophy. (Think of the community builder as an advisor with projects of different flavors to offer students.)
Motivated to diffuse more state-of-the-art alignment research ideas to EAs and promising non-EAs who would find alignment interesting and important (e.g., many top students in philosophy, maths, physics, and CS), we have planned a new series on alignment starting next Thursday, March 3.
We will kick off the series with Holden Karnofsky on “the most important century” and Joseph Carlsmith on his report on power-seeking AI. Later in the series, we will hear more technical proposals for alignment from Jared Kaplan, Paul Christiano, and others.
Here is the detailed schedule and sign-up form.
Please use the comment threads below for discussions of the series.
Good idea, but the speaker schedule doesn’t seem to reflect this stated goal. Going down the list:
- Holden’s “Most Important Century” is not object-level technical alignment work; it’s meta-level content about why AI safety is important.
- Carlsmith’s is also not object-level alignment work; it’s meta-level content about why AI safety is (or isn’t?) important.
- Kaplan’s “Scaling Laws in Neural Networks” is also presumably meta-level content about why AI safety is important, not object-level alignment work.
- Hadfield-Menell’s “Normative Information for AI Systems” I have not heard of, but it does sound like object-level alignment work based on the title.
- Christiano’s “Eliciting Latent Knowledge” is definitely object-level alignment work.
- Cotra’s “AI Timelines and Alignment Risk” is meta-level content about why AI safety is important, not object-level alignment work.
- Hendrycks’ talk is TBA, so I don’t know about that one.
So, out of the 7 talks listed, 2 are clearly about object-level technical alignment work, and 4 clearly are not.
Also, I note that almost half the speakers are from Open Philanthropy, an organization which (to my understanding) directly employed zero object-level technical alignment researchers as of a couple of months ago. I do hear some of them have recently decided to try object-level work in order to better understand it, but that’s a pretty recent development, and the object-level work isn’t really the point of that exercise.