tailcalled comments on Models Don’t “Get Reward”

tailcalled 30 Dec 2022 13:04 UTC
21 points
6
I would add that there are other implementations of reinforcement learning where the dog metaphor are closer to correct. The example you give is for the most basic kind of RL, called policy gradients.
- London L. 21 Feb 2023 23:54 UTC
  1 point
  0
  Parent
  When you say “the dog metaphor” do you mean the original one with the biscuit, or the later one with the killing and breeding?
  - tailcalled 22 Feb 2023 8:23 UTC
    3 points
    0
    Parent
    Biscuit