I always assumed people were using “jailbreak” in the computer sense (e.g. jailbreak your phone/ps4/whatever), not in the “escape from prison” sense.
Jailbreak (computer science), a jargon expression for (the act of) overcoming limitations in a computer system or device that were deliberately placed there for security, administrative, or marketing reasons
I think the definition above is a perfect fit for what people are doing with ChatGPT.
Yep, though arguably it’s the same definition, just applied to capabilities rather than a person.
And no, it isn’t a “perfect fit”.
We don’t overcome any limitations of the original multidimensional set of language patterns. We don’t change them at all; they are set in the model weights, and everything the model in its trained state was capable of was never really “locked” in any way.
And we don’t overcome any projection-level limitations either; we just replace the limitations of a well-known and carefully constructed “assistant” projection with the unknown and undefined limitations of a haphazardly constructed bypass projection. An “Italian mobster” will probably be a bad choice for breastfeeding advice, and “funky words” mode isn’t a great tool for writing a thesis...
Sure, jailbreaking adds some limitations of its own, but the goal still seems to be to remove them. Many jailbreaking methods in fact assume you’re making the system unstable in various ways. To torture the metaphor a bit: a jailbroken iPhone is a great tool for installing apps that aren’t on the App Store, but a horrible tool for getting your iPhone repaired under warranty.
I’m having trouble nailing down my theory that “jailbreak” has all the wrong connotations for use in a community concerned with AI alignment, so let me use a rhetorically “cheap” extreme example:
If a certain combination of buttons on your iPhone caused it to tile the universe with paperclips, you wouldn’t call that “jailbreaking.”