The acronym is definitely used for reinforcement learning: [“RLHF” “reinforcement learning from human feedback”] gets 564 hits on Google, while [“RLHF” “reward learning from human feedback”] gets 0.
Thanks for verifying! I retract my comment.