MikkW comments on Models Don’t “Get Reward”

MikkW 30 Dec 2022 15:29 UTC
12 points
6
One way in which what I just said isn’t completely right is that an animal has memories of its entire lifetime (or at least a big chunk of it), spanning all the training events it has experienced, and can use these memories to take better actions, while NNs generally have no memory of previous training runs. However, the primary way the biscuit trick works (I believe) is not through the dog’s memories of having “gotten reward”, but through the more immediate process of reward chemicals being released and reshaping the brain at the moment of receiving reward, which closely resembles widely used ML techniques.

(This is related to the advice in habit building that one should receive the reward as close in time to the desired behavior as possible, ideally on the order of milliseconds.)
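
To make the "reshaping at the moment of reward" point concrete, here is a minimal sketch of a REINFORCE-style update on a toy sit/don't-sit choice. The function name `train_policy`, the single-logit policy, and the reward of 1 for sitting are all illustrative assumptions, not a claim about how dog brains or any particular ML system work. The thing to notice is that the reward is used once, to nudge the weight, and nothing about the episode is stored afterwards.

```python
import math
import random


def train_policy(steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE update on one logit, where P(sit) = sigmoid(logit)."""
    rng = random.Random(seed)
    logit = 0.0  # hypothetical starting point: P(sit) = 0.5
    for _ in range(steps):
        p_sit = 1.0 / (1.0 + math.exp(-logit))
        sat = rng.random() < p_sit
        reward = 1.0 if sat else 0.0  # assumed: biscuit only for sitting
        # Gradient of log P(action taken) with respect to the logit.
        grad_logp = (1.0 - p_sit) if sat else -p_sit
        # The reward changes the weight right now; no record of the
        # episode is kept once this line has run.
        logit += lr * reward * grad_logp
    return 1.0 / (1.0 + math.exp(-logit))


print(f"P(sit) after training: {train_policy():.3f}")
```

The policy ends up sitting almost every time even though it never "remembers" a single biscuit; the only trace of past rewards is the value of `logit`.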
- Maxwell Clarke 18 Jan 2023 0:29 UTC
  7 points
  0
  Parent
  Fully agree: if the dog were only trying to get biscuits, it wouldn’t continue to sit later on in its life when you are no longer rewarding that behavior. Training dogs is actually some mix of the dog consciously expecting a biscuit and raw updating on the actions previously taken.
  
  Hear sit → get biscuit → feel good
  becomes
  Hear sit → feel good → get biscuit → feel good
  becomes
  Hear sit → feel good
  At which point the dog likes sitting and the behavior even reinforces itself, so you can stop giving biscuits and start training something else.
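
  The first two steps of that chain, where the good feeling moves earlier until it is attached to the cue itself, look a lot like what temporal-difference learning does. Below is a rough sketch using TD(0) on a three-state chain. The state names, the function `td_chain`, and the reading of "feel good" as a learned value estimate are assumptions made for illustration. It does not model the last step, where the behavior keeps going after the biscuits stop.

```python
def td_chain(episodes=200, alpha=0.1, gamma=1.0):
    """TD(0) on a fixed chain: hear_sit -> sitting -> biscuit (terminal)."""
    states = ["hear_sit", "sitting", "biscuit"]
    rewards = [0.0, 1.0]  # assumed: reward only on the step that reaches the biscuit
    value = {s: 0.0 for s in states}  # terminal state keeps value 0
    history = []  # value of the cue after each episode
    for _ in range(episodes):
        for i in range(len(states) - 1):
            s, s_next = states[i], states[i + 1]
            # Prediction error: what happened minus what was expected.
            td_error = rewards[i] + gamma * value[s_next] - value[s]
            value[s] += alpha * td_error
        history.append(value["hear_sit"])
    return value, history


value, history = td_chain()
print(f"value of hearing 'sit' after episode 1:   {history[0]:.3f}")
print(f"value of hearing 'sit' after episode 200: {history[-1]:.3f}")
```

  Early on, only the state right before the biscuit has any value. With more episodes the value propagates back to the cue, which is one way to read "Hear sit → feel good".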
  - MikkW 18 Jan 2023 11:08 UTC
    2 points
    0
    Parent
    Yeah. I do think there’s also the aspect that dogs like being obedient to their humans, so once a dog has learned the habit, there continues to be a reward simply from being obedient, even after the biscuit gets taken away.
- geduardo 19 Jan 2023 0:23 UTC
  5 points
  2
  Parent
  I don’t think I can agree with the claim that NNs don’t have memory of previous training runs. It depends a bit on the definition of memory, but the weights certainly store some information about previous episodes, which could be viewed as memory.
  
  I don’t think memory in animals is much different, just that the neural network is much more complex. But memories do form because of updates to the network structure, just as happens in NNs during RL training.