I would absolutely expect internalized models to be a part of the thing (to be one of the abstractions or simplifications that your S1 uses to understand all of the data it’s ever experienced). I wouldn’t be surprised to find out that they’re the generator of a lot of the “this is serving my goals” or “this is threatening/dangerous” conclusions that lead to positive and negative pings. I would, however, be surprised to find out that they’re the only thing, or even the dominant one. I think we might disagree on type or hierarchy?
I’m positing that the social stuff you’re pointing out is like one of many “states” in the larger “nation” of brain-models-that-inform-the-brain’s-decision-to-punish-or-reward, whereas if I’m understanding you correctly you’re claiming either that the social modeling is the only model, or that the reward/punishment is always delivered through the social modeling channels (it always “comes from” some person-shaped thing in the head).
Please correct if I’ve misunderstood. I note that I wouldn’t be surprised if it’s like that for some people, but according to my introspection the social dynamic just doesn’t have that much power for me personally.
So, (I claim that) machine learning models provide a pretty good basis for comparison for the dopamine-moving-earlier thing: e.g., this is what you’d expect from a system that does a local reinforce-positive update on the policy net as soon as the value net starts predicting a higher future expected value. See any treatment of actor-critic methods, e.g. section 3.2.1 of this pdf. Because we’re starting from the prior that the brain is well enough designed to get pretty damn close to working, seeing the reward signal move earlier is not evidence that should update us away from models where the brain is doing correct temporal-difference learning (section 2.3.3 in that pdf).
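To make that concrete, here is a minimal sketch in Python of the critic half of that picture: a tabular TD(0) learner on a fixed cue-then-reward sequence. Everything here (state layout, parameters, the unpredictable-cue assumption) is an illustrative assumption of mine, not something taken from those pdf sections; the point is just that the prediction-error signal starts out spiking at the reward and, with training, shifts back to the cue that predicts it.

```python
import numpy as np

# Minimal tabular TD(0) sketch -- the "critic" half of an actor-critic setup --
# illustrating the dopamine-moving-earlier pattern. A cue (state 0) is followed
# a few steps later by a reward. Early in training the prediction error (delta)
# spikes at the reward; after training it spikes at the cue, because the cue
# now predicts the reward. All names and parameters are illustrative.

n_steps = 5          # state 0 is the cue; the last transition delivers reward
gamma = 0.9          # discount factor
alpha = 0.1          # learning rate
V = np.zeros(n_steps + 1)   # value per state; the extra terminal slot stays 0

for episode in range(301):
    # The cue arrives unpredictably, so the pre-cue expectation is fixed at 0
    # (the standard assumption behind the classic "signal moves to the cue" result).
    deltas = [gamma * V[0] - 0.0]                  # surprise at cue onset
    for s in range(n_steps):
        reward = 1.0 if s == n_steps - 1 else 0.0
        delta = reward + gamma * V[s + 1] - V[s]   # TD error ("surprise")
        V[s] += alpha * delta
        deltas.append(delta)
    if episode % 100 == 0:
        print(f"episode {episode:3d}  delta at cue: {deltas[0]:+.3f}  "
              f"delta at reward: {deltas[-1]:+.3f}")
```

Running it prints the prediction error at the cue and at the reward across training, so you can watch the spike migrate backward. If you bolted an actor on top (a policy updated in proportion to delta), the policy update would likewise start firing at the cue rather than at the reward, which is the "local reinforce-positive update as soon as the value net starts predicting a higher expected value" point above.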
The social thing I’m suggesting is that the expected value that the value function is predicting on seeing “oh, I gained weight” is a correct representation of future reward, even though it’s a very simple approximation. I don’t mean to say that I think a complicated, multi-step model is being run, just that the usual approximation is approximating a reasoning process that, if done in full using the verbal loop, would look something like the following (a toy sketch follows the list):
1. I have higher weight.
2. I now know that I have higher weight.
3. I now have less justified ability to claim high status.
4. When I next interact with someone, I will have less claim to be valuable in their eyes.
5. I will therefore expect them to express slightly less approval toward me, because I won’t be able to hide that I know I feel I have less justified ability to claim status.
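To be concrete about what I mean by “the usual approximation is approximating” that chain, here is a toy Python sketch. Every function, constant, and number in it is a made-up stand-in for illustration, not a model anyone has actually proposed: one function runs the five steps explicitly, the other is the cached one-step mapping that an S1-style evaluation would actually use, and on this toy setup they give the same answer.

```python
# Toy illustration of the claim above: the felt "ping" is a cached value that
# approximates what the full verbal-loop chain (steps 1-5) would conclude,
# without actually running the chain. All numbers here are invented.

def explicit_social_chain(weight_gain: float) -> float:
    """Run the multi-step reasoning in full and return expected approval."""
    knows_gain = weight_gain > 0                 # step 2: I know I gained weight
    status_claim = 1.0 - (0.3 * weight_gain if knows_gain else 0.0)  # step 3
    perceived_value = status_claim               # step 4: claim on others' esteem
    expected_approval = 0.8 * perceived_value    # step 5: slightly less approval
    return expected_approval

def cached_value(weight_gain: float) -> float:
    """The one-step S1-style approximation: observation -> expected value."""
    # Matches the chain exactly here only because the toy chain is linear.
    return 0.8 - 0.24 * weight_gain

for gain in (0.0, 0.5, 1.0):
    print(f"gain={gain}  chain={explicit_social_chain(gain):.2f}  "
          f"cached={cached_value(gain):.2f}")
```

The point of the sketch is only the shape of the claim: the value function’s output is a compressed stand-in for the conclusion of the explicit chain, so seeing the “ping” fire tells you nothing about whether the chain itself is ever consciously run.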
I am saying that I don’t think the implementation of TD-learning is the problem here.
Got it. That makes sense. I think I still disagree, but if I’ve understood you right I can agree that that hypothesis also clearly deserves to be in the mix.