green_leaf comments on Using GPT-Eliezer against ChatGPT Jailbreaking

green_leaf 6 Dec 2022 22:44 UTC
2 points
2
(If the goal is to keep the AI from outputting anything misaligned, then being conservative is probably the point, and some loss of performance seems more than acceptable.)
- Beth Barnes 6 Dec 2022 22:53 UTC
  LW: 4 AF: 2
  0
  Yes, but OpenAI could have just done that by adjusting their classification threshold.
  - green_leaf 7 Dec 2022 1:47 UTC
    3 points
    0
    Isn’t that only the case if their filter was the same but weaker?
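
The threshold point in this exchange can be made concrete with a toy sketch. It assumes a filter that assigns each prompt a harm score between 0 and 1 and blocks anything at or above a threshold. The scores, labels, and the `evaluate` helper below are all hypothetical and exist only for illustration. Nothing here reflects OpenAI's actual filter or GPT-Eliezer.

```python
# Hypothetical (score, is_harmful) pairs. The scores are made up for illustration
# and do not come from any real classifier.
SCORED_PROMPTS = [
    (0.05, False),
    (0.20, False),
    (0.35, False),
    (0.40, True),
    (0.60, False),
    (0.70, True),
    (0.85, True),
    (0.95, True),
]


def evaluate(threshold, scored_prompts=SCORED_PROMPTS):
    """Block every prompt whose score is >= threshold and count both error types.

    Returns (missed_harmful, blocked_benign).
    """
    missed_harmful = sum(
        1 for score, harmful in scored_prompts if harmful and score < threshold
    )
    blocked_benign = sum(
        1 for score, harmful in scored_prompts if not harmful and score >= threshold
    )
    return missed_harmful, blocked_benign


if __name__ == "__main__":
    # Lowering the threshold makes the same filter more conservative.
    for t in (0.8, 0.5, 0.3):
        missed, blocked = evaluate(t)
        print(f"threshold={t}: missed harmful={missed}, blocked benign={blocked}")
```

On this toy data, lowering the threshold lets fewer harmful prompts through and blocks more benign ones, which is the trade-off a threshold adjustment buys. It only moves along the trade-off curve of one fixed scoring function, though. A filter that scores prompts differently has a different curve, which is what the last question in the thread is asking about.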