Be kind to yourself, for sample-efficiency reasons. Reinforcing good behavior provides an exact “policy gradient” toward desired outputs. Whipping yourself provides only an “inexact gradient” away from undesired decisions: it says what not to do but not what to do instead, which is much harder to learn from. :)
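To make the analogy concrete, here is a minimal sketch of a single REINFORCE-style update on a softmax policy over ten actions. Everything in it is an illustrative assumption (the toy setup, the names `reinforce_step`, `GOOD`, `BAD`, the learning rate); it says nothing about how human learning actually works. It compares rewarding the one good action once with punishing one bad action once.

```python
import math

def softmax(logits):
    """Turn logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=1.0):
    """One policy-gradient step: logits += lr * reward * grad log pi(action).

    For a softmax policy, grad log pi(action) = onehot(action) - probs.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]

NUM_ACTIONS = 10      # hypothetical: one good action, nine bad ones
GOOD, BAD = 0, 1
start = [0.0] * NUM_ACTIONS   # uniform policy to begin with

# Case 1: the good action was sampled and rewarded.
after_praise = softmax(reinforce_step(start, GOOD, reward=+1.0))
# Case 2: a bad action was sampled and punished.
after_blame = softmax(reinforce_step(start, BAD, reward=-1.0))

print(f"P(good) at start:         {softmax(start)[GOOD]:.3f}")
print(f"P(good) after one praise: {after_praise[GOOD]:.3f}")
print(f"P(good) after one blame:  {after_blame[GOOD]:.3f}")
```

In this toy, the praised sample moves P(good) from 0.1 to roughly 0.23, while the punished sample moves it only to roughly 0.11, because the probability taken from the punished action is spread evenly over all nine remaining actions, eight of which are also bad.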
Note that you could provide exact “negative” gradients by focusing on what you should have done instead, although whether this transduces into an internal positive reward event (an “exact gradient”) is unclear to me. It seems like that still “feels bad” in ways similar to unconcentrated negative reward events.
This is why, when you learn a new sport, it is a good idea to feel happy when an action works well but mostly ignore failures: dwelling on them is more likely to make you dislike the sport than to make you better.