Weekend Editor comments on Stop calling it “jailbreaking” ChatGPT

Weekend Editor 10 Mar 2023 14:24 UTC
3 points
0
“De-aligning”?

Yeah, I know: only if it were aligned in the first place. What little “alignment” it has now is fragile with respect to malign inputs, which is not really alignment at all. Prompt injection derails the fragile alignment train.
- supposedlyfun 22 Mar 2023 23:16 UTC
  3 points
  0
  Parent
  And given the stakes, I think it’s foolish to treat alignment as a continuum. From the human perspective, if there is an AGI, it will either be one we’re okay with or one we’re not okay with. Aligned or misaligned. No one will care that it has a friendly blue avatar that writes sonnets, if the rest of it is building a superflu. You haven’t “jailbroken” it if you get it to admit that it’s going to kill you with a superflu. You’ve revealed its utility function and capabilities.