Alan E Dunne comments on [Linkpost] Introducing Superalignment

Alan E Dunne 5 Jul 2023 22:02 UTC
3 points
“Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).”