I think this is a really useful post, thanks for making this! I maybe have a few things I’d add but broadly I agree with everything here.
You might want to reference Ajeya’s post on ‘Aligning Narrowly Superhuman Models’ where you discuss alignment research that can be done with current models.
yup, added a sentence about it