Ran W

Karma: 15

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W, 3 Feb 2024 4:00 UTC
14 points
0 comments, 5 min read