For an investigation of the kind of thing you suggest, take a look at Anthropic’s “A General Language Assistant as a Laboratory for Alignment” and, more importantly, “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback”.
They focus on training a helpful/harmless assistant rather than on good short stories, but using human-filtered model output to improve behavior is the basic paradigm.
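To make the paradigm concrete, here is a minimal sketch of a filter-and-fine-tune loop (sample several completions, keep the one a human rates best, fine-tune on the kept pairs). This is not Anthropic’s actual pipeline; every function name here (`generate_samples`, `human_rating`, `filter_and_finetune`) is a hypothetical placeholder, and the rating heuristic is for illustration only.

```python
# Hypothetical sketch of human-filtered fine-tuning (best-of-n / rejection sampling).
# All names are placeholders, not any library's real API.

def generate_samples(model, prompt, n=4):
    """Sample n candidate completions from the model (placeholder)."""
    return [model(prompt) for _ in range(n)]

def human_rating(completion):
    """Stand-in for a human preference judgment; replace with real labels."""
    return len(completion)  # toy heuristic for illustration only

def filter_and_finetune(model, finetune, prompts, n=4):
    """Keep the best-rated completion per prompt, then fine-tune on the kept pairs."""
    kept = []
    for prompt in prompts:
        candidates = generate_samples(model, prompt, n)
        best = max(candidates, key=human_rating)
        kept.append((prompt, best))
    return finetune(model, kept)
```

The RLHF papers go further (training a reward model on human comparisons and optimizing against it with RL), but this filtered-imitation loop is the simplest version of the idea.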
Thanks for the pointer, I will check that out!