Thanks for writing this!! Great post, strong endorse. Here are some nitpicks / elaborations.
I think the question of functional hemispheric lateralization is a fruitful and fascinating one that Steve tends to emphasize less in his models, I suspect because of his sympathies to “neocortical blank-slate-ism.”
The cortex has a rather complicated neural architecture, with allegedly 180 distinguishable regions, which have different types and densities of connections with each other and with other parts of the brain, different “hyperparameters”, etc. I want to say that cortical hemispherical specialization is a special case of this more general phenomenon of cortical specialization. So I would say: “I haven’t blogged about the differences between hemispheres” is in the same category as “I haven’t blogged about the difference between the mid-insular cortex and the posterior insular cortex”. Of course there are interesting differences; it just hasn’t come up. :-P As it happens, I do have strong opinions about the roles of mid-insular cortex vs posterior insular cortex, even if I haven’t written about them. By contrast, I’m pretty ignorant about hemispherical differences, with a few exceptions. I haven’t read Master & Emissary. It’s possible that I’m missing something important. :)
I also have found that the phrase “blank slate” gives people the wrong idea, and switched to “learning from scratch” with the definition here.
self-referential misalignment
I agree with this part. We certainly don't want an AGI that has aligned object-level motivations but regards those motivations as ego-dystonic :-P There's a sense in which misaligned self-reflective thoughts and misaligned object-level thoughts are "all just part of the alignment problem", but I think the misaligned self-reflective thoughts are a sufficiently impactful and probable failure mode that they're worth thinking about separately.
ToM is basically just inverse reinforcement learning (IRL) through Bayesian inference.
Sure. We can construct a compositional generative model of a person and fit it to the data using Bayesian inference, just as we can construct a compositional generative model of a car engine and fit it to the data using Bayesian inference. In fact, I talked to a couple people with autism, and they both independently described learning to interact with, understand, and predict people as feeling similar to gaining an understanding of how car engines work, etc. (If I understood them correctly.) They had excellent ToM, by the way; they would have no problem whatsoever with Jessica's red box. I think it's an interesting sign that neurotypical people probably wouldn't describe learning-to-socialize in that way, and it suggests that the IRL part is at most just a piece of the puzzle. (Which I guess is consistent with what you said.)
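As a minimal sketch of the "ToM ≈ IRL through Bayesian inference" framing, here's a toy goal-inference loop; all the goals, actions, and likelihoods are made up for illustration, not taken from any actual model:

```python
import numpy as np

# Toy "ToM as Bayesian IRL": infer which goal best explains an agent's
# observed actions. All goals, actions, and likelihoods below are
# made-up placeholders.

GOALS = ["wants_coffee", "wants_tea"]

# P(action | goal): how likely each observed action is under each goal.
LIKELIHOOD = {
    "wants_coffee": {"walks_to_cafe": 0.8, "boils_kettle": 0.2},
    "wants_tea":    {"walks_to_cafe": 0.3, "boils_kettle": 0.7},
}

def infer_goal(observed_actions, prior=None):
    """Posterior over goals given a sequence of observed actions."""
    posterior = dict(prior or {g: 1.0 / len(GOALS) for g in GOALS})
    for action in observed_actions:
        for g in GOALS:
            posterior[g] *= LIKELIHOOD[g][action]
        z = sum(posterior.values())  # renormalize after each observation
        posterior = {g: p / z for g, p in posterior.items()}
    return posterior

print(infer_goal(["walks_to_cafe", "walks_to_cafe"]))
# -> posterior concentrates on "wants_coffee"
```

The same fit-a-generative-model-to-observations loop works whether the hidden variables are goals in a head or states in an engine; that's the sense of the analogy.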
affective empathy, which adds to the “I understand how you’re feeling...” of affective ToM: “...and now I feel this way, too!”.
Sure, that is a thing. But there’s also a thing where, knowing how somebody’s feeling induces a reaction, but the reaction is not in the direction of feeling more similar to them. For example, if I’m suffering, and I see that you’re laughing at me, it does NOT make me feel more like you feel (playful and safe and high-status), but rather it makes me feel mad as hell. Or if you’re sad, maybe I’ll feel sad, but also maybe I’ll feel schadenfreude. I assume that both the “sad” reaction and the “schadenfreude” reaction are innate reactions. Somehow the genome has encoded into the brain a system for deciding whether “sad” or “schadenfreude” is the correct reaction in any given situation. I’m confused what that system is / how it’s built.
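I don't know what that system actually is, but just to make the shape of the question concrete, here's a deliberately toy sketch of what such a "reaction selector's" inputs and outputs might look like; every input, threshold, and rule below is a made-up placeholder, not a claim about the real mechanism:

```python
# Toy sketch of the question (not an answer): what might the genome's
# innate "reaction selector" condition on? All inputs and rules here
# are made-up placeholders.

def innate_reaction(their_state: str,
                    relationship: float,
                    they_caused_my_suffering: bool) -> str:
    """relationship ranges from -1.0 (rival) to +1.0 (close ally)."""
    if their_state == "laughing" and they_caused_my_suffering:
        return "anger"  # they're laughing at me; I don't mirror their playfulness
    if their_state == "sad":
        return "sad" if relationship > 0 else "schadenfreude"
    return "neutral"

print(innate_reaction("sad", relationship=0.8, they_caused_my_suffering=False))   # sad
print(innate_reaction("sad", relationship=-0.8, they_caused_my_suffering=False))  # schadenfreude
```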
Thank you! I think these are all good/important points.
In regards to functional specialization between the hemispheres, I think whether this difference is at the same level as mid-insular cortex vs posterior insular cortex would depend on whether the hemispheric differences can account for certain lower-order distinctions of this sort or not. For example, let’s say that there are relevant functional differences between left ACC and right ACC, left vmPFC and right vmPFC, and left insular cortex and right insular cortex—and that these differences all have something in common (i.e., there is something characteristic about the kinds of computations that differentiate left-hemispheric ACC, vmPFC, insula from right-hemispheric ACC, vmPFC, insula). Then, you might have a case for the hemispheric difference being more fundamental or important than, say, the distinction between mid-insular cortex vs posterior insular cortex. But that’s only if these conditions hold (i.e., that there are functional differences and these differences have intra-hemispheric commonalities). I think there’s a good chance something like this might be true, but I obviously haven’t put forward an argument for this yet, so I don’t blame anyone for not taking my word for it!
I’m not fully grasping the autism/ToM/IRL point yet. My understanding of people on the autism spectrum is that they typically lack ordinary ToM, though I’m certainly not saying that I don’t believe the people you’ve spoken with; maybe only that they might be the exception rather than the rule (there are accounts that emphasize things other than ToM, though, to your point). If it is true that (1) autistic people use mechanisms other than ToM/IRL to understand people (i.e., modeling people like car engines), and (2) autistic people have social deficits, then I’m not yet seeing how this demonstrates that IRL is ‘at most’ just a piece of the puzzle. (FWIW, I would be surprised if IRL were the only piece of the puzzle; I’m just not yet grasping how this argument shows this.) I can tell I’m missing something.
And I agree with the sad vs. schadenfreude point. I think in an earlier exchange you made the point that this sort of thing could conceivably be modulated by in-group-style dynamics. More specifically, I think that to the extent that I can look at a person, their situation, the outcome, etc., and notice (probably implicitly) that I could end up in a similar situation, it’s adaptive for me to “simulate” what it is probably like for them to be in this position, so I can learn from their experience without having to go through the experience myself. As you note, there are exceptions to this; I think this happens particularly when we are looking at people more as “objects” (i.e., complex external variables in our environments) than “subjects” (other agents with internal states, goals, etc., just like me). I think this is well-demonstrated by the following examples (with a toy sketch of the simulation trade-off after them).
1: lion-as-subject: I go to the zoo and see a lion. “Ooh, aah! Super majestic.” Suddenly, a huge branch falls onto the lion, trapping it. It yelps loudly. I audibly wince, and I really hope the lion is okay. (Bonus subjects: other people around the enclosure also demonstrate they’re upset/disturbed by what just happened, which makes me even more upset/disturbed!)
2: lion-as-object: I go on a safari alone and my car breaks down, so I need to walk to the nearest station to get help. As I’m doing this, a lion starts stalking and chasing me. Oh crap. Suddenly, a huge branch falls onto the lion, trapping it. It yelps loudly. “Thank goodness. That was almost really bad.”
Very different reactions to the same narrow event. So this kind of thing inclines me to make stronger claims about affective empathy in those situations where we’re looking at other agents in our environment as subjects, not objects. I think in eusocial creatures like humans, the subject-perspective is probably far more common than the object-perspective, though one could certainly come up with lots of examples of both. So definitely more to think about here, but I really like this kind of challenge to an overly simplistic picture of affective empathy wherein someone else feeling way X automatically and context-independently makes me feel way X. This, to your point, just seems wrong.
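As promised, here's a toy sketch of that adaptive-simulation trade-off: simulate another agent's experience when the expected vicarious lesson outweighs the cost of simulating. Every quantity below is a made-up placeholder:

```python
# Toy decision rule for "when is it adaptive to simulate someone else's
# experience?" All numbers are made-up placeholders for illustration.

def should_simulate(p_similar_situation: float,
                    value_of_lesson: float,
                    simulation_cost: float) -> bool:
    """Simulate iff the expected value of the vicarious lesson exceeds its cost."""
    return p_similar_situation * value_of_lesson > simulation_cost

# Lion-as-subject (zoo): attention to spare, so simulation is cheap.
print(should_simulate(p_similar_situation=0.2, value_of_lesson=10.0, simulation_cost=1.0))  # True
# Lion-as-object (chase): attention is needed for escape, so simulation is costly.
print(should_simulate(p_similar_situation=0.2, value_of_lesson=10.0, simulation_cost=5.0))  # False
```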
My understanding of people on the autism spectrum is that they typically lackordinary ToM
The link says “high-functioning adults with ASD…can easily pass the false belief task when explicitly asked to”. So there you go! Perfectly good ToM, right?
The paper also says they “do not show spontaneous false belief attribution”. But if you look at Figure 3, they “fail” the test by looking equally at the incorrect window and correct window, not by looking disproportionately at the incorrect window. So I would suggest that the most likely explanation is not that the ASD adults are screwing up the ToM task, but rather that they’re taking no interest in the ToM task! Remember, the subjects were never asked to pay any attention to the person! Maybe they just didn’t! So I say this is a case of motivation, not capability. Maybe they were sitting there during the test, thinking to themselves “Gee, that’s a neat diorama, I wonder how the experimenters glued it together!” :-P That would also be consistent with the eye-tracking results mentioned in the book excerpt here. (I recall also a Temple Grandin anecdote (I can’t immediately find it) about getting fMRI’d, and she said she basically ignored the movie she was nominally supposed to be looking at, because she was so interested in some aspect of how the scientists had set up the experiment.) Anyway, the paper you link doesn’t report (AFAICT) what fraction of the time the subjects are looking at neither window—they effectively just throw those trials away I think—which to me seems like discarding the most interesting data!
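To make that complaint concrete, here's the breakdown I'd want the paper to report, run on hypothetical fixation data; the three-way split including "neither" is exactly the part that (AFAICT) gets discarded:

```python
from collections import Counter

# Hypothetical per-trial anticipatory-fixation labels. The interesting
# question is what fraction of looks land on *neither* window, which
# (AFAICT) the paper doesn't report.
fixations = ["correct", "neither", "incorrect", "neither",
             "correct", "neither", "neither", "incorrect"]

counts = Counter(fixations)
total = len(fixations)
for window in ("correct", "incorrect", "neither"):
    print(f"{window}: {counts[window] / total:.0%}")
# correct: 25%, incorrect: 25%, neither: 50%
# Equal correct/incorrect looks plus a large "neither" share would fit
# "uninterested in the person," not "failing at ToM."
```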
If it is true that (1) autistic people use mechanisms other than ToM/IRL to understand people (i.e., modeling people like car engines)
I think you misunderstood me here. I’m suggesting that maybe:
ToM ≈ IRL ≈ building a good generative model that explains observations of humans
“understanding car engines” ≈ building a good generative model that explains observations of car engines.
I guess you’re assuming that a good generative model of a mind must contain special ingredients that a good generative model of a car engine does not need? I don’t currently think that. Well, more specifically, I think “the particular general-purpose toolkit that a human brain uses for building generative models” is sufficient for both modeling minds and modeling car engines. (I can imagine other generative-model-building toolkits that are not.) For example, the thought “Sally believes the sky is green” seems to me to be of similar construction to the thought “The engine is not painted green, but if it were, the paint would quickly rub off and contaminate the engine fluid”. Both kinda involve invoking and manipulating a counterfactual world and relating it to the real world. I could be wrong, but anyway that’s what I meant.
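To gesture at that "same general-purpose toolkit, two domains" point in toy form, here's one counterfactual helper applied to both a belief attribution and an engine hypothetical; all the facts and the causal rule are made up:

```python
# Toy gesture at "one generative-model toolkit, two domains": the same
# counterfactual machinery handles a belief attribution and an engine
# hypothetical. All facts and rules below are illustrative.

real_world = {"sky_color": "blue", "engine_paint": "none"}

def counterfactual(world: dict, **changes) -> dict:
    """Copy the world model, override some facts, keep the rest intact."""
    return {**world, **changes}

# "Sally believes the sky is green": Sally's model is a counterfactual
# variant of mine, queryable without being confused with reality.
sallys_world = counterfactual(real_world, sky_color="green")
print("Sally thinks:", sallys_world["sky_color"], "| actually:", real_world["sky_color"])

# "If the engine were painted green, the paint would rub off and
# contaminate the fluid": the same move, applied to an engine.
painted = counterfactual(real_world, engine_paint="green")
painted["fluid_contaminated"] = painted["engine_paint"] != "none"  # toy causal rule
print("Counterfactual engine:", painted)
```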