A puzzling question is why your brain doesn’t get this right automatically. In particular, deciding whether to gather some food before sleeping is an issue mammals have faced in the EEA for millions of years.
Temporal difference learning seems so basic that brains ought to implement it reasonably accurately. Any idea why we might do the wrong thing in this case?
I’m guessing that it has to do with the kinds of “things” that are linked to a later consequence. For example, we seem to be pretty good at avoiding or frequenting the kinds of places where we tend to have negative or positive experiences. And we’re also good at linking physical items or concrete actions to their consequences—like in Roko’s example about the bills:
For example, suppose that you started off in life with a wandering mind and were punished a few times for failing to respond to official letters. Your TDL algorithm began to propagate the pain back to the moment you looked at an official letter or bill. As a result, you would be less effective than average at responding, so you got punished a few more times. Henceforth, when you received a bill, you got the pain before you even opened it, and it lay unpaid on the mantelpiece until a Big Bad Red late payment notice with a $25 fine arrived. More negative conditioning. Now even thinking about a bill, form or letter invokes the flinch response, and your lizard brain has fully cut you out.
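To make the "propagate the pain back" step concrete, here is a minimal tabular TD(0) sketch in Python. The state names, the penalty of -1, and the learning rate and discount factor are all assumptions picked for illustration; this is a toy model, not a claim about how brains actually implement it.

```python
# Toy TD(0) on a fixed chain of events. All names and numbers are illustrative.
STATES = ["see_envelope", "open_letter", "miss_deadline"]


def td0_chain(states, final_reward, episodes=50, alpha=0.5, gamma=0.9):
    """Walk the chain `episodes` times and return the learned state values.

    Only leaving the last state yields a reward (`final_reward`);
    every earlier transition yields 0.
    """
    values = {s: 0.0 for s in states}
    for _ in range(episodes):
        for i, state in enumerate(states):
            is_last = i == len(states) - 1
            reward = final_reward if is_last else 0.0
            next_value = 0.0 if is_last else values[states[i + 1]]
            # TD error: how much better or worse things went than predicted.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
    return values


# Compare the values after one bad experience and after many.
print(td0_chain(STATES, final_reward=-1.0, episodes=1))
print(td0_chain(STATES, final_reward=-1.0, episodes=200))
```

In this toy setup, a single bad experience only makes the last step aversive. It takes repeated episodes for the negative value to creep back to merely seeing the envelope, which is the pattern the quoted example describes.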
But “not going to the store results in hunger the next morning” seems like a more abstract thing. The fact that it’s the lack of an action, rather than the presence of one, seems particularly relevant. Neither the store nor the act of going there is something that’s directly associated with getting hungry. If anything it’s my earlier thought of possibly needing to go to the store… and I guess it’s possible that to the extent that anything gets negatively reinforced, it’s the act of me even considering it, since it’s the only concrete action that my brain can directly link to the consequence!
Also, if I do go to the store, there isn’t any clear reward that would reinforce my behavior. The reward is simply that I won’t be hungry the next morning… but that’s not something that would be very out of the ordinary, for not-being-hungry is just the normal state of being. And being in a neutral state doesn’t produce a reward. I guess that if I enjoyed food more, getting to eat could be more of a reward in itself.
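In TD terms, the point about the neutral state can be sketched like this. The function below is the standard TD error formula; the specific numbers (a baseline value of 0 for waking up fed, a reward of -1 for waking up hungry) are assumptions I made up for illustration.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Standard TD error: reward + gamma * V(next) - V(current)."""
    return reward + gamma * value_next - value_current


# Assumption: waking up not hungry is the normal, already-predicted state,
# so its expected value is 0.
expected_morning = 0.0

# Went to the store: the morning turns out as predicted (reward 0).
went_to_store = td_error(reward=0.0, value_next=0.0, value_current=expected_morning)

# Skipped the store: the morning is worse than predicted (reward -1).
skipped_store = td_error(reward=-1.0, value_next=0.0, value_current=expected_morning)

print(went_to_store, skipped_store)
```

Under these assumptions, going to the store produces a TD error of zero, so there is nothing to reinforce the trip, while skipping it produces a negative error that has to attach to whatever the brain registered the evening before.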
(I’m very sure that there exist mountains of literature on this very topic that could answer the question rather conclusively, but I don’t have the energy to go do a lit search right now.)
How is temporal difference learning basic? Do you think that giving my dog a treat every morning, provided he obeyed my command to sit the previous day, would teach him to sit? How would he connect those two events, out of all the events over the day?
A bird in the hand is worth two in the bush.
Until you’ve become comparatively good at predicting the future (entails good models, which entails cognitive effort, which necessitates a reasonably developed cognitive architecture), an immediate benefit will often outweigh some nebulous possible future reward (in OP’s parlance, value).