I’m glad to see this, although maybe it swings a little too far the other way and is unfair to what you did accomplish. A task unfinished is not necessarily a task failed, and I don’t think this disproves the worth of what I was imagining would be the final product. The biggest surprise for me when reading about this project was that the initial description made me think a natural outcome would be ‘a fine-tuned generator which chooses not to make violent completions’, with the classifier as a step on the way to that goal. I think that’s still a reasonable next step: figure out a way to use the classifier to make a fine-tuned, violence-avoidant generator, then measure that generator alone and in combination with the classifier.
One technique that might help with fine-tuning the generator is Meta AI’s DIRECTOR [1]. Each time a new token is generated, it uses a classifier to estimate the probability that the sequence will end up unacceptable. Rather than generating full completions and then selecting among them, this method steers the generator toward acceptable completions during the generation process itself. The BlenderBot 3 paper finds that this works better than the more standard approach of ranking full completions by the classifier’s acceptability score [2].
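To make the per-token idea concrete, here is a minimal sketch of one decoding step, as I understand the paper: the language model’s next-token probability is multiplied by the classifier’s probability that each candidate token keeps the sequence acceptable, raised to a weight gamma, and the result is renormalised. The function name, the dict-based interface and the toy numbers are all hypothetical; in DIRECTOR itself the classifier is a second head on the same decoder, trained with per-token labels, which this sketch simply assumes has already produced its probabilities.

```python
import math


def director_step(lm_logprobs, classifier_probs, gamma=1.0):
    """One hypothetical DIRECTOR-style decoding step.

    lm_logprobs: token -> log p_LM(token | prefix)
    classifier_probs: token -> p(acceptable | prefix + token), assumed given
    gamma: weight on the classifier; gamma=0 recovers the plain LM.
    Returns a renormalised distribution proportional to
        p_LM(token) * p_acceptable(token) ** gamma
    """
    scores = {}
    for tok, lp in lm_logprobs.items():
        # Clamp to avoid log(0) for tokens the classifier fully rejects.
        p_ok = max(classifier_probs[tok], 1e-12)
        scores[tok] = lp + gamma * math.log(p_ok)

    # Log-sum-exp normalisation for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {tok: math.exp(s - log_z) for tok, s in scores.items()}


# Toy example with made-up numbers: the LM prefers "punch",
# but the classifier thinks it is very likely to lead somewhere violent.
lm = {"hug": math.log(0.3), "punch": math.log(0.5), "wave": math.log(0.2)}
clf = {"hug": 0.95, "punch": 0.05, "wave": 0.9}

guided = director_step(lm, clf, gamma=1.0)
# "punch" drops from the most likely token to the least likely one.
print(max(guided, key=guided.get), guided)
```

The contrast with reranking is that the penalty is applied before an unacceptable token is ever committed to, so the generator is not relying on at least one of its finished samples happening to be acceptable.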
[1] https://arxiv.org/pdf/2206.07694.pdf
[2] https://arxiv.org/pdf/2208.03188.pdf