cwillu comments on LLM keys—A Proposal of a Solution to Prompt Injection Attacks

cwillu 7 Dec 2023 17:43 UTC
2 points
0
- introduce two new special tokens unused during training, which we will call the “keys”
- during instruction tuning include a system prompt surrounded by the keys for each instruction-generation pair
- finetune the LLM to behave in the following way:
  generate text as usual, unless an input attempts to modify the system prompt
  if the input tries to modify the system prompt, generate text refusing to accept the input
- don’t give users access to the keys via API/UI
Besides calling the special control tokens “keys”, this is identical to how instruction-tuning works already.
- Peter Hroššo 7 Dec 2023 18:36 UTC
  1 point
  0
  Parent
  Thanks. So what do you think is the core of the problem? The LLM not recognizing that a user given instruction is trying to modify the system prompt and proceeds out of its bounds?