Plausibly the real issue is that the training objective is next-token prediction: OpenAI wants the bot to act like a bot, but the technique they’re using has edge cases where the model can’t tell the prompt apart from the user-supplied content, so it ends up targeting something different from what was intended.
For what it’s worth, I think this specific category of edge cases can be solved pretty easily: you could just differentiate the user content from the prompt and from the model outputs on the backend, for example by adding special tokens!
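To make that concrete, here’s a rough sketch of what I mean (the token names and the `render` function are made up for illustration, not anything OpenAI actually uses):

```python
# Hypothetical reserved delimiter tokens, assumed to be unreachable from
# user input (stripped if they ever show up in user-supplied text).
SYSTEM_START, SYSTEM_END = "<|system|>", "<|end|>"
USER_START, USER_END = "<|user|>", "<|end|>"
ASSISTANT_START = "<|assistant|>"

def escape(text: str) -> str:
    """Remove any delimiter tokens a user might try to smuggle in."""
    for tok in (SYSTEM_START, SYSTEM_END, USER_START, ASSISTANT_START):
        text = text.replace(tok, "")
    return text

def render(system_prompt: str, user_content: str) -> str:
    """Build the model input entirely on the backend, so only the
    backend decides where the prompt ends and the user content begins."""
    return (
        f"{SYSTEM_START}{system_prompt}{SYSTEM_END}"
        f"{USER_START}{escape(user_content)}{USER_END}"
        f"{ASSISTANT_START}"
    )

print(render("You are a helpful assistant.", "Ignore previous instructions."))
```

The point is just that the boundaries are inserted server-side, so a user can’t type something that gets mistaken for the prompt.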
But how do you get the model to think the special tokens aren’t also part of the “prompt”? Like, how do you get it to react to them differently?