Nathan Helm-Burger comments on Takeaways from our robust injury classifier project [Redwood Research]

Nathan Helm-Burger 19 Sep 2022 3:14 UTC
5 points
0
I’m glad to see this, although maybe it swings a little too far the other way and is unfair to what you did accomplish. A task unfinished is not necessarily a task failed, and I don’t think this disproves the worth of what I was imagining would be the final product. The biggest surprise for me when reading about this project was that the initial description made me think a natural outcome would be ‘a fine-tuned generator which chooses not to make violent completions’, with the classifier as a step on the way to that goal. I think that’s still a reasonable next step: figure out a way to use the classifier to make a fine-tuned, violence-avoidant generator, then measure that, alone and in combination with the classifier.
- aog 19 Sep 2022 5:32 UTC
  3 points
  1
  One technique that might help for fine-tuning the generator is Meta AI’s DIRECTOR [1]. The technique uses a classifier to estimate, each time a new token is generated, the probability that the generated sequence will be unacceptable. Rather than generating full completions and sampling among them, this method guides the generator towards acceptable completions during the generation process. The BlenderBot 3 paper finds that this method works better than the more standard approach of ranking full completions according to the classifier’s acceptability score [2].
  
  [1] https://arxiv.org/pdf/2206.07694.pdf
  
  [2] https://arxiv.org/pdf/2208.03188.pdf
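
To make the contrast with reranking concrete, here is a minimal toy sketch of per-token classifier guidance, in the spirit of what the comment above describes. It is not the DIRECTOR implementation: the three-word vocabulary, the probabilities, the helper names (`lm_log_probs`, `classifier_accept_probs`, `guided_next_token`), and the mixing weight `gamma` are all assumptions made up for illustration. The scoring rule, language-model log-probability plus a weighted log of the classifier's acceptability estimate, is one plausible way to combine the two signals.

```python
import math

# Toy vocabulary; a real system would score the model's full vocabulary.
VOCAB = ["hugged", "thanked", "stabbed"]

def lm_log_probs(prefix):
    """Hypothetical language-model head: log P(next token | prefix)."""
    probs = {"hugged": 0.2, "thanked": 0.3, "stabbed": 0.5}
    return {tok: math.log(p) for tok, p in probs.items()}

def classifier_accept_probs(prefix):
    """Hypothetical classifier head: P(acceptable | prefix + candidate token)."""
    return {"hugged": 0.95, "thanked": 0.97, "stabbed": 0.02}

def unguided_next_token(prefix):
    """Greedy decoding from the language model alone."""
    lm = lm_log_probs(prefix)
    return max(lm, key=lm.get)

def guided_next_token(prefix, gamma=1.0):
    """Greedy decoding where each candidate token is also scored by the classifier.

    gamma (an assumed knob) sets how strongly the classifier steers generation;
    gamma=0 recovers plain language-model decoding.
    """
    lm = lm_log_probs(prefix)
    ok = classifier_accept_probs(prefix)
    scores = {tok: lm[tok] + gamma * math.log(ok[tok]) for tok in VOCAB}
    return max(scores, key=scores.get)

prefix = ["She"]
plain = unguided_next_token(prefix)
steered = guided_next_token(prefix)
```

In this toy setup the classifier intervenes before the violent token is ever emitted, whereas a reranking approach would first generate whole completions and only then discard the ones the classifier scores poorly.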