The control codes could include a special token/sequence that only authorized users can use.
Also, if you’re allowing arbitrary untrusted queries to the model, your security shouldn’t depend on model output anyways. Even if attackers can’t use control codes, they can still likely get the model to do what they want via blackbox adversarial search over the input tokens.
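An added example (my code, not the commenter's) that illustrates both points: untrusted text is rendered so it cannot carry the reserved control sequence, and whether a command runs is decided by a policy check on the caller's role, not by what the model emitted. The token format, the role table and the stand-in models are hypothetical.

```python
"""Privileged control token plus an authorization check that does not trust
model output. Added example; the token, roles and models are constructed."""
import re
from dataclasses import dataclass
from typing import Callable

CONTROL = "<|ctl|>"  # reserved sequence only trusted code may emit
_CMD_RE = re.compile(re.escape(CONTROL) + r"(\w+)" + re.escape(CONTROL))


@dataclass(frozen=True)
class Segment:
    text: str
    trusted: bool  # True only for text our own code produced


def neutralize(text: str) -> str:
    """Remove the reserved sequence from untrusted text. The replacement has
    none of '<', '|', '>' so it cannot recombine with neighbouring characters
    into a new control sequence (nested attempts like '<|ct<|ctl|>l|>' fail)."""
    return text.replace(CONTROL, "[ctl]")


def build_prompt(segments: list[Segment]) -> str:
    return "".join(s.text if s.trusted else neutralize(s.text) for s in segments)


def parse_commands(model_output: str) -> list[str]:
    """Control commands in the output, e.g. '<|ctl|>delete_user<|ctl|>'."""
    return _CMD_RE.findall(model_output)


ROLE_PERMISSIONS = {"admin": {"delete_user"}, "guest": set()}


def authorize(command: str, role: str) -> bool:
    """Policy check outside the model: the caller's role decides, so a model
    that emits a privileged command for a guest changes nothing."""
    return command in ROLE_PERMISSIONS.get(role, set())


def handle_request(user_text: str, role: str, model: Callable[[str], str]) -> list[str]:
    prompt = build_prompt([
        Segment(f"{CONTROL}You are an assistant.\n", trusted=True),
        Segment(user_text, trusted=False),
    ])
    return [c for c in parse_commands(model(prompt)) if authorize(c, role)]


def echo_model(prompt: str) -> str:
    """Constructed worst-case model: repeats its prompt verbatim."""
    return prompt
```

`handle_request` returns only the commands that survive the policy check; with `echo_model`, an injected sequence in user text never parses as a command because it was neutralized before reaching the prompt.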