This is a nitpick, but I think you’re using the word “pro-social” when you mean something more like “doing socially-endorsed things”. For example, if a bully is beating up a nerd, he’s impressing his (bully) friends, and he’s acting from social motivations, and he’s taking pride in his work, and he’s improving his self-image and popularity, but most people wouldn’t call bullying “pro-social behavior”, right?
Agreed.
Incidentally, I think your description is an overstatement. My claim is that “the valence our “best”/pro-social selves would ascribe” is very relevant to the valence of self-reflective thoughts, to a much greater extent than non-self-reflective thoughts. But they’re not decisive. That’s what I was suggesting by my §2.5.2 example of “Screw being ‘my best self’, I’m tired, I’m going to sleep”.
Also agreed.
Re your reply to my first question:
I think that makes sense iiuc. Does the following correction to my model seem correct?:
I was thinking of it like “self-reflective thoughts have some valence—causes---> model of homunculus gets described as wanting those things where self-reflective thoughts have positive valence”. But actually your model is like “there are beliefs about what the model of the homunculus wants—causes---> self-reflective thoughts get higher valence if they fit what the homunculus wants”. (Where I think for many people the “what the homunculus wants” is sorta a bit editable and changes in different situations depending on what subagents are in control.)
Re your reply to my second question:
So I’m not sure what your model is, but as far as I understand it seems like the model says “valence of S(X) heavily depends on what homunculus wants” and “what homunculus wants is determined by what goals there is sophisticated brainstorming towards, which are the goals where S(X) is positive valence”. And it’s possible that such a circularity is there, but that alone doesn’t explain to me why the homunculus’ preferences usually end up in the “socially-endorsed” attractor.
I mean another way to phrase the question might be “why is there a difference between ego-syntonic and positive valence? why not just one thing?”. And yeah it’s possible that the answer here doesn’t really require anything new and it’s just that the way valence naturally is coded in our brain is stupid and incoherent, and the homunculus-model has higher consistency pressure which straightens out the reflectively endorsed values to be more coherent and in particular neglects myopic high-valence urges. And that the homunculus-model ends up with socially-endorsed preferences because modelling what thoughts come up in the mind is pretty intertwined with language, and it makes sense that for language-related thoughts the “is this socially endorsed” thought assessors are particularly strong. Not sure whether that’s the whole story though.
(Also I think ego-dystonic goals can sometimes still cause decently sophisticated brainstorming, especially if it comes from urges that other parts try to suppress and thus learn to “hide their thoughts”. Possibly related is that people often rationalize about why to do something.)
Anyway, I think there’s an innate drive to impress the people who you like in turn. I’ve been calling it the drive to feel liked / admired. It is certainly there for evolutionary reasons, and I think that it’s very strong (in most people, definitely not everyone), and causes a substantial share of ego-syntonic desires, without people realizing it. It has strong self-reflective associations, in that “what the people I like would think of me” centrally involves “me” and what I’m doing, both right now and in general. It’s sufficiently strong that there tends to be a lot of overlap between “the version of myself that I would want others to see, especially whom I respect in turn” versus “the version of myself that I like best all things considered”.
I think that’s similar to what you’re talking about, right?
Yeah sorta. I think what I wanted to get at is that it seems to me that people often think of themselves as (wanting to be) nicer than their behavior would actually imply (though maybe I overestimated how strong that effect is) and I wanted to look for an explanation why.
(Also I generally want to get a great understanding of what values end up being reflectively endorsed and why—this seems very important for alignment.)