Reinforcement* learning from human feedback
Well, I thought about that, but I wasn't sure whether reinforcement learning from human feedback isn't just a strict subset of reward learning from human feedback. If "reinforcement" is indeed the strict definition then I concede, but I don't think it makes sense.
The acronym is definitely used for reinforcement learning. [“RLHF” “reinforcement learning from human feedback”] gets 564 hits on Google; [“RLHF” “reward learning from human feedback”] gets 0.
Thanks for verifying! I retract my comment.
I think historically "reinforcement" has been used more in that particular context (see e.g. the Deep RL from Human Preferences paper), but as I noted, I find "reward learning" more apt, since it points to the hard part being the reward learning, i.e. distilling human feedback into an objective, rather than the optimization of any given reward function (which technically need not involve reinforcement learning).
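To make the distinction concrete, here is a minimal sketch of the two stages (my own illustration; the toy responses, scores, and the best-of-n step are just placeholders): reward learning distills pairwise human preferences into a scalar objective via a Bradley-Terry style loss, and the learned reward can then be optimized without reinforcement learning at all, e.g. by best-of-n sampling.

```python
# Minimal sketch (illustrative only, not from any specific codebase) of the two
# stages: (1) reward learning -- distilling pairwise human preferences into a
# scalar objective -- and (2) optimizing that learned reward, here without RL.

import math
import random

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred response scores higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Stage 1: "reward learning" -- fit r_theta so preferred responses score higher.
# (Here r_theta is a lookup table; in practice it is a neural reward model.)
preferences = [("a good answer", "a bad answer"), ("a good answer", "a worse answer")]
r_theta = {resp: 0.0 for pair in preferences for resp in pair}
lr = 0.5
for _ in range(100):
    for chosen, rejected in preferences:
        p = 1.0 / (1.0 + math.exp(-(r_theta[chosen] - r_theta[rejected])))
        # gradient step on the Bradley-Terry loss w.r.t. the two scores
        r_theta[chosen] += lr * (1.0 - p)
        r_theta[rejected] -= lr * (1.0 - p)

# Stage 2: optimize against the learned reward *without* reinforcement learning,
# e.g. best-of-n sampling from a fixed policy.
def policy_sample() -> str:
    return random.choice(list(r_theta))

best = max((policy_sample() for _ in range(16)), key=lambda y: r_theta[y])
print("selected:", best, "| learned scores:", r_theta)
```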