Ran W

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W, Feb 3, 2024, 4:00 AM