Noosphere89 comments on [Linkpost] Introducing Superalignment

Noosphere89 5 Jul 2023 21:58 UTC
0 points
0

They’re planning on deliberately training misaligned models!!!! This seems bad if they mean it.

Is this an actual quote, or did you just infer it from the text? Because I would be very surprised if they are deliberately training AI models to be misaligned.
- Garrett Baker 5 Jul 2023 22:01 UTC
  6 points
  0
  
  Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
  
  It's a quote. I recommend reading the article. It's very short.
  - Noosphere89 5 Jul 2023 22:05 UTC
    2 points
    0
    Yeah, I’m not a fan of that paragraph, and in particular I do suspect this may blow up in their faces, though the rest of the plan is probably fine from my perspective.
    - UHMWPE-UwU 6 Jul 2023 0:44 UTC
      4 points
      3
      Blow up in their faces?
- Alan E Dunne 5 Jul 2023 22:02 UTC
  3 points
  0
  “Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).”
- Harrison G 5 Jul 2023 22:01 UTC
  3 points
  0
  The quote: “Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).”