You might want to reference Ajeya’s post on ‘Aligning Narrowly Superhuman Models’ where you’re discussing alignment research that can be done with current models
yup, added a sentence about it