Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
It's a quote. I recommend reading the article. It's very short.
Yeah, I’m not a fan of that paragraph. In particular, I suspect this may blow up in their faces, though the rest of the plan is probably fine from my perspective.
Blow up in their faces?