John Schulman

Karma: 483

Scaling Laws for Reward Model Overoptimization

Oct 20, 2022, 12:20 AM
103 points
13 comments · 1 min read · LW link
(arxiv.org)