A selection of stuff just from the Alignment Newsletter (which probably misses most of the work):
From Optimizing Engagement to Measuring Value
Aligning AI to Human Values means Picking the Right Metrics
Designing Recommender Systems to Depolarize
What are you optimizing for? Aligning Recommender Systems with Human Values
Aligning Recommender Systems as Cause Area (Not in the newsletter since I don’t find it persuasive; I wrote about why in a comment on the post)
Learning to Summarize with Human Feedback (and its predecessor, Fine-Tuning GPT-2 from Human Preferences)