This makes me wonder if we will eventually start to get LLM “hacks” that are genuine hacks. I’m imagining a scenario in which bugs like SolidGoldMagikarp can be manipulated to be genuine vulnerabilities.
(But I suspect trying to make a one-to-one analogy might be a little naive)
I’d say my point above generalizes to “there are no strong borders between ‘ordinary language acts’ and ‘genuine hacks’” in terms of how much control over model output an attacker can gain. The main further danger would be if the model were given more output channels through which an attacker could work mischief. And that may be appearing as well - notably: https://openai.com/blog/chatgpt-plugins
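To make the SolidGoldMagikarp point concrete: one simple heuristic for surfacing candidate glitch tokens is a decode/re-encode round-trip test - a well-behaved token's surface string should encode back to exactly that token. The sketch below uses a toy vocabulary and a toy encoder (both assumptions for illustration, not any real model's tokenizer) in which one token, like SolidGoldMagikarp, exists in the vocabulary but is never produced by the encoder:

```python
def find_anomalous_tokens(vocab, encode):
    """Return ids of tokens whose decoded text does not re-encode to [id]."""
    anomalies = []
    for tok_id, text in vocab.items():
        if encode(text) != [tok_id]:
            anomalies.append(tok_id)
    return anomalies


# Toy vocabulary: token id -> surface string (hypothetical, for illustration).
vocab = {0: "hello", 1: "world", 2: " SolidGoldMagikarp"}

# Toy encoder: the glitch token's string is absent from the encoder's table,
# so it falls back to splitting into other tokens - mimicking a token that
# exists in the vocabulary but is never emitted during normal tokenization.
encoder_table = {"hello": [0], "world": [1]}


def encode(text):
    return encoder_table.get(text, [0, 1])


print(find_anomalous_tokens(vocab, encode))  # -> [2]
```

In practice the real anomaly in glitch tokens was that they appeared in the vocabulary but were essentially untrained, so a round-trip check like this is only one coarse filter among several that researchers used.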