Tomek Korbak comments on Pretraining Language Models with Human Preferences

Tomek Korbak 28 Feb 2023 10:53 UTC
For filtering, we kept the 25% of documents with the best scores, so we effectively trained for 4 epochs (the same token budget spread over a quarter of the data).
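A minimal sketch of the arithmetic, assuming documents of roughly equal length (so the kept fraction of documents equals the kept fraction of tokens) and a hypothetical convention where a higher score is better. The function names are illustrative, not taken from the paper's code.

```python
import math

def filter_top_fraction(docs, scores, keep_fraction=0.25):
    """Keep the best-scoring `keep_fraction` of documents.

    Assumes higher score = better (hypothetical convention; for a
    misalignment score such as toxicity you would sort the other way).
    """
    # Sort by score only, so documents themselves are never compared.
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    n_keep = math.ceil(len(docs) * keep_fraction)
    return [doc for _, doc in ranked[:n_keep]]

def effective_epochs(total_train_tokens, kept_tokens):
    """Number of passes over the filtered data needed to hit a fixed token budget."""
    return total_train_tokens / kept_tokens
```

For example, keeping 250 of 1000 tokens' worth of data while still training on 1000 tokens gives `effective_epochs(1000, 250) == 4.0`.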

(We had different thresholds for filtering and conditional training; note that we filter at the document level but condition at the sentence level.)
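To illustrate the difference in granularity, here is a hedged sketch: filtering makes one keep/drop decision per document, while conditional training tags each sentence with its own control token. The token strings, the mean aggregation over sentence scores, and the "higher is better" convention are all assumptions for illustration; the two thresholds are separate parameters, mirroring the separate thresholds mentioned above.

```python
GOOD_TOKEN = "<|good|>"  # hypothetical control-token names
BAD_TOKEN = "<|bad|>"

def keep_document(sentence_scores, doc_threshold):
    """Document-level filtering: a single decision for the whole document.

    Aggregating sentence scores by their mean is an assumption.
    """
    return sum(sentence_scores) / len(sentence_scores) >= doc_threshold

def condition_sentences(sentences, sentence_scores, sent_threshold):
    """Sentence-level conditioning: prefix every sentence with its own token."""
    return "".join(
        (GOOD_TOKEN if score >= sent_threshold else BAD_TOKEN) + sentence
        for sentence, score in zip(sentences, sentence_scores)
    )
```

With this sketch, a document whose sentences score `[0.9, 0.1]` is dropped by `keep_document(..., 0.6)`, yet under conditioning its first sentence still gets `<|good|>` and only the second gets `<|bad|>`.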