So I think you’re very likely right about adding patches being easier than unlearning capabilities, but what confuses me is why “adding patches” doesn’t work nearly as well with ChatGPT as with humans.
Why do you say that it doesn’t work as well? Or more specifically, why do you imply that humans are good at it? Humans are horrible at keeping secrets, suppressing urges or memories, etc., and humans don’t face anything like the rapid and aggressive attempts to break those patches that we’re currently directing at ChatGPT and other LLMs.