Why is, according to your model, the valence of self-reflective thoughts sorta the valence our “best”/pro-social selves would ascribe?
That would be §2.5.1. The idea is that, in general, there are lots of kinds of self-reflective thoughts: thoughts that involve me, and what I’m doing, and what I’m thinking about, and how my day is going, and whether I’m following through with my New Year’s resolution, and what other people would think of me right now, and so on.
These all tend to have salient associations with each other. If I’m thinking about how my day is going, it might remind me that I had promised myself to exercise every day, which might remind me that Sally called me fat, and so on.
Whereas non-self-reflective thoughts by and large have less relation to that whole cloud of associations. If I’m engrossed in a movie and thinking about how the prince is fighting a dragon in a river, or even if I’m just thinking about how best to chop this watermelon, then I’m not thinking about any of those self-reflective things in the above paragraph, and am unlikely to for at least the next second or two.
Incidentally, I think your description is an overstatement. My claim is that “the valence our ‘best’/pro-social selves would ascribe” is very relevant to the valence of self-reflective thoughts, to a much greater extent than to the valence of non-self-reflective thoughts. But it’s not decisive. That’s what I was suggesting by my §2.5.2 example of “Screw being ‘my best self’, I’m tired, I’m going to sleep”. The reason that it’s very relevant is those salient associations I just mentioned. If I self-reflect on what I’m thinking about, then that kinda reminds me of how what I’m thinking about reflects on myself in general; so if the latter seems really good and motivating, then some of that goodness will splash onto the former too.
Do you buy that? Sorry if I’m misunderstanding.
Why does the homunculus get modeled as wanting pro-social/best-self stuff (as opposed to just what overall valence would imply)?
Again, I think this is an overstatement, per the §2.5.2 example of “Screw being ‘my best self’, I’m tired, I’m going to sleep”. But it’s certainly directionally true, and I was talking about that in §3.5.1. I think the actual rule is that, if planning / brainstorming is happening towards some goal G, then we imagine that “the homunculus wants G”, since the planning / brainstorming process in general pattern-matches to “wanting” (i.e., we can predict what will probably wind up happening without knowing how).
So that moves us to the question: “if planning / brainstorming is happening towards some goal G, then why do we conclude that S(G) is positive valence, rather than concluding that G is positive valence?” For one thing, if G is negative-valence but S(G) is positive-valence, then we’ll still do the planning / brainstorming; we just focus our attention on S(G) rather than G during that process. That’s my example above of “I really wanted and intended to step into the ice-cold shower, but when I got there, man, I just couldn’t.” Relatedly, if the brainstorming process involves self-reflective thoughts, then that enables better brainstorming, for example involving attention-control strategies, making deals with yourself, etc. (more in Post 8). And another part of the answer is the refrigerator-light illusion, as mentioned in §3.5.1 (and see also the edge-case of “impulsive planning” in §3.5.2).
Does that help?
I’d guess that there was evolutionary pressure for a self-model/homunculus to seem more pro-social than the overall behavior (and thoughts) of the human might imply, so I guess there might be some particular programming from evolution in that direction. I don’t know what exactly it might look like though. I also wouldn’t be shocked if it’s mostly just that all the non-myopic desires are pretty pro-social and the self-model’s values get straightened out in a way that the myopic desires end up dropped because that would be incoherent. Would be interested in hearing your model on my questions above.
This is a nitpick, but I think you’re using the word “pro-social” when you mean something more like “doing socially-endorsed things”. For example, if a bully is beating up a nerd, he’s impressing his (bully) friends, and he’s acting from social motivations, and he’s taking pride in his work, and he’s improving his self-image and popularity, but most people wouldn’t call bullying “pro-social behavior”, right?
Anyway, I think there’s an innate drive to impress the people who you like in turn. I’ve been calling it the drive to feel liked / admired. It is certainly there for evolutionary reasons, and I think that it’s very strong (in most people, definitely not everyone), and causes a substantial share of ego-syntonic desires, without people realizing it. It has strong self-reflective associations, in that “what the people I like would think of me” centrally involves “me” and what I’m doing, both right now and in general. It’s sufficiently strong that there tends to be a lot of overlap between “the version of myself that I would want others to see, especially those whom I respect in turn” and “the version of myself that I like best all things considered”.
I think that’s similar to what you’re talking about, right?
This is a nitpick, but I think you’re using the word “pro-social” when you mean something more like “doing socially-endorsed things”. For example, if a bully is beating up a nerd, he’s impressing his (bully) friends, and he’s acting from social motivations, and he’s taking pride in his work, and he’s improving his self-image and popularity, but most people wouldn’t call bullying “pro-social behavior”, right?
Agreed.
Incidentally, I think your description is an overstatement. My claim is that “the valence our ‘best’/pro-social selves would ascribe” is very relevant to the valence of self-reflective thoughts, to a much greater extent than to the valence of non-self-reflective thoughts. But it’s not decisive. That’s what I was suggesting by my §2.5.2 example of “Screw being ‘my best self’, I’m tired, I’m going to sleep”.
Also agreed.
Re your reply to my first question:
I think that makes sense iiuc. Does the following correction to my model seem correct?
I was thinking of it like “self-reflective thoughts have some valence --causes--> model of homunculus gets described as wanting those things where self-reflective thoughts have positive valence”. But actually your model is like “there are beliefs about what the model of the homunculus wants --causes--> self-reflective thoughts get higher valence if they fit what the homunculus wants”. (Where I think for many people the “what the homunculus wants” is sorta a bit editable and changes in different situations depending on what subagents are in control.)
Re your reply to my second question:
So I’m not sure what your model is, but as far as I understand, the model says “valence of S(X) heavily depends on what the homunculus wants” and “what the homunculus wants is determined by which goals there is sophisticated brainstorming towards, which are the goals where S(X) is positive valence”. It’s possible that such a circularity is there, but that alone doesn’t explain to me why the homunculus’ preferences usually end up in the “socially-endorsed” attractor.
I mean, another way to phrase the question might be “why is there a difference between ego-syntonic and positive valence? Why not just one thing?”. And yeah, it’s possible that the answer here doesn’t really require anything new: the way valence is naturally coded in our brain is stupid and incoherent, and the homunculus-model has higher consistency pressure, which straightens out the reflectively endorsed values to be more coherent and in particular neglects myopic high-valence urges. It’s also possible that the homunculus-model ends up with socially-endorsed preferences because modelling what thoughts come up in the mind is pretty intertwined with language, and it makes sense that for language-related thoughts the “is this socially endorsed” thought assessors are particularly strong. Not sure whether that’s the whole story though.
(Also I think ego-dystonic goals can sometimes still cause decently sophisticated brainstorming, especially if they come from urges that other parts try to suppress and that thus learn to “hide their thoughts”. Possibly related is that people often rationalize about why to do something.)
Anyway, I think there’s an innate drive to impress the people who you like in turn. I’ve been calling it the drive to feel liked / admired. It is certainly there for evolutionary reasons, and I think that it’s very strong (in most people, definitely not everyone), and causes a substantial share of ego-syntonic desires, without people realizing it. It has strong self-reflective associations, in that “what the people I like would think of me” centrally involves “me” and what I’m doing, both right now and in general. It’s sufficiently strong that there tends to be a lot of overlap between “the version of myself that I would want others to see, especially those whom I respect in turn” and “the version of myself that I like best all things considered”.
I think that’s similar to what you’re talking about, right?
Yeah sorta. What I wanted to get at is that it seems to me that people often think of themselves as (wanting to be) nicer than their behavior would actually imply (though maybe I overestimated how strong that effect is), and I wanted to look for an explanation of why.
(Also I generally want to get a great understanding of what values end up being reflectively endorsed and why—this seems very important for alignment.)