Thanks for the post! You mention that it's unlikely PHF is as sample-efficient as RLHF; do you have plans to explore that direction? Most attributes we'd like to condition on are not trivially inferred, so labels are scarce or expensive to acquire. I'm interested in how alignment scales with the amount of labeled data. Perhaps this work could synergize well with TracIn or Influence Functions to identify examples that help or hurt performance on a small test set (a rough sketch of that selection step is below).
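To make the TracIn suggestion concrete, here's a minimal sketch of how one might score training examples against a small test set. It assumes a PyTorch model and loss; the function name `tracin_scores` and the `train_batch`/`test_batch` structure are hypothetical, and for brevity it uses a single checkpoint rather than the sum over checkpoints from the TracIn paper:

```python
import torch

def tracin_scores(model, loss_fn, train_batch, test_batch, lr=1e-3):
    """Approximate TracIn influence of each training example on a small
    test set at a single checkpoint. The full method sums this quantity
    over saved checkpoints, weighted by each checkpoint's learning rate."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the summed test loss at this checkpoint.
    test_loss = loss_fn(model(test_batch["x"]), test_batch["y"])
    test_grads = torch.autograd.grad(test_loss, params)

    scores = []
    for x, y in zip(train_batch["x"], train_batch["y"]):
        train_loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        train_grads = torch.autograd.grad(train_loss, params)
        # Dot product of train and test gradients:
        # positive => the example helps the test set, negative => it hurts.
        dot = sum((g1 * g2).sum() for g1, g2 in zip(train_grads, test_grads))
        scores.append(lr * dot.item())
    return scores  # rank examples by score; prune the most negative ones
```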
In practice I think using a trained reward model (as in RLHF), rather than fixed labels, is the way forward. The cost of acquiring the reward model is then the same as in RLHF; the difference is primarily that PHF typically needs many more calls to the reward model than RLHF does.
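For intuition on where those extra calls come from, here's a minimal sketch of annotating a pretraining corpus with a reward model for conditional training. It assumes an HF-style sequence-classification reward model; the control-token strings, the `threshold`, and the per-document (rather than per-sentence) granularity are all illustrative assumptions, not the paper's exact setup:

```python
import torch

GOOD, BAD = "<|good|>", "<|bad|>"  # hypothetical control tokens

@torch.no_grad()
def annotate_corpus(reward_model, tokenizer, documents, threshold=0.0):
    """Prepend a control token to each pretraining document based on a
    learned reward model's score. Note the reward model is called once
    per document, i.e. over the entire pretraining corpus -- far more
    calls than scoring rollouts during RLHF fine-tuning."""
    annotated = []
    for doc in documents:
        inputs = tokenizer(doc, return_tensors="pt", truncation=True)
        # Assumes a scalar-output sequence-classification head.
        score = reward_model(**inputs).logits.squeeze().item()
        tag = GOOD if score >= threshold else BAD
        annotated.append(tag + doc)
    return annotated
```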