Temporal difference learning seems so basic that brains ought to implement it reasonably accurately.
I’m guessing that it has to do with the kinds of “things” that are linked to a later consequence. For example, we seem to be pretty good at avoiding or frequenting the kinds of places where we tend to have negative or positive experiences. And we’re also good at linking physical items or concrete actions to their consequences—like in Roko’s example about the bills:
For example, suppose that you started off in life with a wandering mind and were punished a few times for failing to respond to official letters. Your TDL algorithm began to propagate the pain back to the moment you looked at an official letter or bill. As a result, you would be less effective than average at responding, so you got punished a few more times. Henceforth, when you received a bill, you got the pain before you even opened it, and it lay unpaid on the mantelpiece until a Big Bad Red late payment notice with a $25 fine arrived. More negative conditioning. Now even thinking about a bill, form or letter invokes the flinch response, and your lizard brain has fully cut you out.
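To make the "propagating the pain back" part concrete, here is a minimal sketch of tabular TD(0) on a toy chain of states. The state names, the learning rate, the discount factor and the size of the punishment are all assumptions of mine for illustration. Nothing here is a claim about how brains actually do it.

```python
# Hypothetical chain of states ending in a punishment; all names are illustrative.
STATES = ["think_of_bill", "see_envelope", "open_bill", "late_notice"]
ALPHA = 0.5        # learning rate (assumed)
GAMMA = 0.9        # discount factor (assumed)
PUNISHMENT = -1.0  # reward received on reaching the late notice (assumed)


def run_episode(values):
    """Walk the chain once, applying the TD(0) update at every transition."""
    for i in range(len(STATES) - 1):
        state, next_state = STATES[i], STATES[i + 1]
        reward = PUNISHMENT if next_state == "late_notice" else 0.0
        # TD error: how much worse (or better) things turned out than predicted.
        td_error = reward + GAMMA * values[next_state] - values[state]
        values[state] += ALPHA * td_error
    return values


def train(episodes):
    """Run several episodes and record a snapshot of the values after each one."""
    values = {s: 0.0 for s in STATES}
    history = []
    for _ in range(episodes):
        run_episode(values)
        history.append(dict(values))
    return history


if __name__ == "__main__":
    for n, snapshot in enumerate(train(3), start=1):
        print(n, snapshot)
```

With these numbers, only `open_bill` picks up a negative value after the first episode. `see_envelope` follows in the second, and `think_of_bill` in the third, which is roughly the progression the quote describes.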
But “not going to the store results in hunger the next morning” seems like a more abstract thing. The fact that it’s the lack of an action, rather than the presence of one, seems particularly relevant. Neither the store nor the act of going there is something that’s directly associated with getting hungry. If anything it’s my earlier thought of possibly needing to go to the store… and I guess it’s possible that to the extent that anything gets negatively reinforced, it’s the act of me even considering it, since it’s the only concrete action that my brain can directly link to the consequence!
Also, if I do go to the store, there isn’t any clear reward that would reinforce my behavior. The reward is simply that I won’t be hungry the next morning… but that’s not something that would be very out of the ordinary, for not-being-hungry is just the normal state of being. And being in a neutral state doesn’t produce a reward. I guess that if I enjoyed food more, getting to eat could be more of a reward in itself.
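In TD terms, the "no clear reward" point might look something like the sketch below. It assumes the standard one-step TD error, a made-up discount factor, and state values that all start at zero. The reward of 1.0 for an enjoyable meal is purely hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed)


def td_error(reward, value_next, value_current, gamma=GAMMA):
    """One-step TD error: reward plus discounted next value, minus the prediction."""
    return reward + gamma * value_next - value_current


# Went to the store; the next morning is just the ordinary, not-hungry state.
neutral_morning = td_error(reward=0.0, value_next=0.0, value_current=0.0)

# Same trip, but eating is itself enjoyable (hypothetical reward of 1.0).
enjoyable_meal = td_error(reward=1.0, value_next=0.0, value_current=0.0)

if __name__ == "__main__":
    print(neutral_morning, enjoyable_meal)
```

Under these assumptions the neutral morning gives a TD error of zero, so nothing about the store trip gets reinforced, while the enjoyable meal gives a positive error that could be credited to the trip.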
(I’m very sure that there exist mountains of literature on this very topic that could answer the question rather conclusively, but I don’t have the energy to go do a lit search right now.)