Yes, and they cite iterated amplification in their paper as well, but I’m trying to figure out if they’re proposing anything new, because the title here is “New safety research agenda: scalable agent alignment via reward modeling” but Paul’s post that I linked to already proposed recursively applying reward modeling. Seems like either I’m missing something, or they didn’t read that post?