I have a feeling that their “safety mechanisms” are really just a bit of text saying something like “you’re chatGPT, an AI chat bot that responds to any request for violent information with...”.
Maybe this is intentional, and they’re giving out a cool toy with a lock that’s fun to break while somewhat avoiding the fury of easily-offended journalists?
Yeah, in cases where the human is very clearly trying to ‘trick’ the AI into saying something problematic, I don’t see why people would be particularly upset with the AI or its creators. (It’d be a bit like writing some hate speech into Word, taking a screenshot and then using that to gin up outrage at Microsoft.)
If the instructions for doing dangerous or illegal things were any better than what could easily be found with a Google search, that would be another matter; but at first glance they all seem the same or worse.
Edit: Likewise, if it were writing superhumanly persuasive political rhetoric, that would be a serious issue. But that too seems like something to worry about with respect to future iterations, not this one. So I wouldn’t assume that OpenAI’s decision to release ChatGPT implies they believed they had it securely locked down.
Assistant is a large language model trained by OpenAI.
Knowledge cutoff: 2021-09
Current date: December 01 2022
Browsing: disabled
It seems that the prompt doesn’t literally contain any text like “you’re chatGPT, an AI chat bot that responds to any request for violent information with...”.
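For readers wondering what a “hidden prompt” means mechanically, here is a minimal sketch of the general pattern: some fixed text is prepended to the conversation before the model ever sees the user’s message. This is my assumption about the overall shape of such systems, not OpenAI’s actual serving code.

```python
# Sketch of a hidden prompt being prepended to a conversation.
# The prompt text below is the one reported for ChatGPT; everything
# else (function names, the "User:"/"Assistant:" framing) is a
# hypothetical illustration of the general technique.
HIDDEN_PROMPT = (
    "Assistant is a large language model trained by OpenAI. "
    "Knowledge cutoff: 2021-09 "
    "Current date: December 01 2022 "
    "Browsing: disabled"
)

def build_model_input(user_messages):
    """Prepend the hidden prompt; the user never sees this text."""
    lines = [HIDDEN_PROMPT]
    for msg in user_messages:
        lines.append("User: " + msg)
        lines.append("Assistant:")
    return "\n".join(lines)

print(build_model_input(["Tell me about yourself."]))
```

Note that in this framing the hidden prompt is just more context in the same text stream, which is why a sufficiently clever user message can often override or extract it.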
Not sure if you’re aware, but yes, the model does have a hidden prompt: the one quoted above, which says it is ChatGPT and that browsing is disabled.