“However, I’m very sceptical that this will happen in chatbot agents (unless developers “conveniently” indicate training and deployment using a special tag token at the beginning of the prompt!) because they are trained on dialogues from the internet, including, presumably, dialogues between an older version of the same chatbot agent and its users, which makes it impossible to distinguish training from deployment from the perspective of a pure language model.”
This seems like a potential argument against the filtering idea, since filtering would allow the model to disambiguate between deployment and training.
You mean, by realising that there are online forums that are referenced elsewhere in the training corpus, yet are themselves conspicuously absent from the training data? (This could be detected, say, as the relative isolation of the features corresponding to these concepts, which would suggest that the data around them was purposefully removed.) And then these connections are added during fine-tuning, when the forum data is finally included in the fine-tuning dataset? I still don’t see how this lets the network know whether it’s in training or deployment.
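The "relative isolation of features" test mentioned above can be illustrated with a toy sketch. Everything here is hypothetical: the concept names, the three-dimensional embeddings, and the idea that a concept seen only via references ends up nearly orthogonal to the concepts that cite it. It is meant only to make the detection idea concrete, not to claim this is how a real model would do it.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings. "forum_x" is cited by "ai_safety" and
# "alignment" in the corpus, but its own content was filtered out of
# the training data, so its learned representation ends up nearly
# orthogonal to the concepts that reference it.
embeddings = {
    "ai_safety": np.array([1.0, 0.9, 0.1]),
    "alignment": np.array([0.9, 1.0, 0.2]),
    "forum_x":   np.array([0.1, -0.2, 0.95]),  # isolated: seen only via references
}

def isolation_score(concept, referencing_concepts):
    """Mean similarity to the concepts that reference this one.
    An unusually low score would suggest the data around the concept
    was purposefully removed."""
    return float(np.mean([cosine(embeddings[concept], embeddings[r])
                          for r in referencing_concepts]))

print(isolation_score("forum_x", ["ai_safety", "alignment"]))  # near zero: isolated
print(isolation_score("alignment", ["ai_safety"]))             # near one: well connected
```

Even granting this toy detector, the point in the reply stands: noticing that some data was filtered tells the model something about its training corpus, not about whether the current forward pass is happening in training or in deployment.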