I don’t have an easy way of slicing my work up, and I think it depends on how you slice it. Broadly, I think the two candidates are (i) making RL from human feedback more practical and getting people excited about it at OpenAI, and (ii) the theoretical sequence from approval-directed agents and informed oversight, to iterated amplification, to getting a clear picture of the limits of iterated amplification and setting out on my current research project. Some steps of that were really hard for me at the time, though basically all of them now feel obvious.
My favorite blog post was probably approval-directed agents, though this is very much judged by the standards of how-confused-Paul-started-out. I think it set me on a much better direction for thinking about AI safety (and I think it also helped a lot of other people in a similar way). Ultimately it’s clear that I didn’t really understand where the difficulties were, and I’ve learned a lot in the last 6 years, but I’m still proud of it.