Did you test Claude to see whether it is less susceptible to this issue? Otherwise I'm not sure where your comment comes from. When I tested this, I saw similar or worse behavior from that model, although GPT-4 definitely has this issue as well:
https://twitter.com/mobav0/status/1637349100772372480?s=20
Oh interesting. I couldn't get any such rule-breaking completions out of Claude, but testing the prompts on Claude was a bit of an afterthought. Thanks for this! I'll probably update the post after some more testing.
My comment was mostly based on the CAI paper, where they compared the new method against their earlier RLHF model and reported more robustness against jailbreaking. OpenAI's GPT-4 (though not Microsoft's Bing version) also seems to be a lot more robust than GPT-3.5, but I don't know why.