Maxime Riché comments on The Waluigi Effect (mega-post)

Maxime Riché 6 Mar 2023 13:42 UTC
Indeed, empirical results show that filtering the pretraining data helps quite a lot with aligning a model to some preferences: Pretraining Language Models with Human Preferences
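For concreteness, here is a minimal sketch of what "filtering the data" can mean: score each document for how well it matches the preference, and keep only the ones at or above a threshold. The scorer, the threshold value, and the function names below are hypothetical placeholders, not the paper's exact setup.

```python
from typing import Callable, Iterable, List


def filter_corpus(
    documents: Iterable[str],
    score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep only documents whose preference score meets the threshold.

    `score` is a hypothetical stand-in for whatever scorer encodes the
    preference (e.g. a classifier where higher means more preferred).
    """
    return [doc for doc in documents if score(doc) >= threshold]


# Toy example: a hypothetical scorer that penalizes a blocklisted word.
def toy_score(doc: str) -> float:
    return 0.0 if "badword" in doc.lower() else 1.0


corpus = ["A helpful answer.", "Some BADWORD text.", "Another clean line."]
kept = filter_corpus(corpus, toy_score, threshold=0.5)
print(kept)  # ['A helpful answer.', 'Another clean line.']
```

The model is then pretrained only on `kept`, so it never sees the dispreferred documents at all.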