Hmm. Maybe here’s an analogy. Suppose somebody said:
There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.
On the one hand, I would defend this passage as basically true. On the other hand, there are clearly a lot of connotations and associations of the word “cold” that go way beyond the natural generalization of things that trigger this thermoreceptor. “Concepts are clusters in thingspace”, as the saying goes, and thus things that go along with coldness often enough kinda get roped in as a connotation or aspect of the coldness concept itself. And then all those aspects of coldness can in turn get analogized into other domains, and now here we are talking about cold personalities and cold starts and cold cases and cold symptoms and the Cold War and on and on.
By the same token, I’m happy to defend a claim along the lines of “intrinsic unpredictability is the seed / core at the center of concepts like animation, vitality, agency, etc.”, but I acknowledge that intrinsic unpredictability in and of itself is not the entirety of those terms and their various connotations and associations.
(This is a helpful discussion for me, thanks.)
By the same token, I’m happy to defend a claim along the lines of “intrinsic unpredictability is the seed / core at the center of concepts like animation, vitality, agency, etc.”
Well I don’t think that intrinsic unpredictability explains the sense of lifeforce or whatever.
(What seems possible is that something like hard-to-predict (and purposeful?) behavior triggers human minds to model an object as interfaced to an invisible mind/soul/spirit, and the way humans model such souls is particular in some way which explains the sense of lifeforce.)
I think when humans model other minds (which includes animals (and gods)) they start from a pre-built template (potentially from mirroring part of their own cognitive machinery) with properties goals/desires, emotions, memory and beliefs.
When your dog dies, the appearance of the lifeforce disappearing might be caused by seeing that the dead body is now very predictable, but the explanation isn’t that the sense of unpredictability went away, but rather something to do with the fact that your whole model of your dog’s mind stopped having any predictive power. (I don’t know yet what exactly might cause the sense of life force.)
Tbc what I’m imagining when saying “intrinsic unpredictability” is a reductionist model of how some machinery in the mind works, sorta like the model that explains frequentist[1] intuitions that a coin has an inherent 50% probability to come up heads. (I do NOT mean that an “intrinsic unpredictability” tag gets attached to the object which then needs to get interpreted by some abstract-modelling machinery.)
As an example of a reductionist explanation, consider the frequentist intuition that it is a fact about the world that a coin comes up heads with 50% probability. This can be explained by saying that such agents model the world as a probabilistic environment with P(coin=heads)=50%. (As opposed to modelling worlds as deterministic environments where the outcome of an experiment is fixed, and then having probabilistic uncertainty about which world one is in.)
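To make that distinction concrete, here’s a toy sketch in Python (everything here is invented, and “which deterministic world” is collapsed down to “which bias”, just to show the behavioral difference):

# Toy contrast between two ways of modeling a coin. Illustrative only.

# Model A ("intrinsic randomness"): the coin itself is modeled as a
# probabilistic environment with a fixed P(heads) = 0.5.
# Observations never change the prediction.
def model_a_predict(observed_flips):
    return 0.5

# Model B ("the world is fixed, I'm just uncertain which world I'm in"):
# each candidate world has a fixed propensity; we keep a posterior over
# worlds, so the prediction shifts as evidence comes in.
def model_b_predict(observed_flips, worlds=(0.1, 0.5, 0.9)):
    posterior = {w: 1.0 / len(worlds) for w in worlds}  # uniform prior
    for flip in observed_flips:  # flip is 1 for heads, 0 for tails
        for w in worlds:
            posterior[w] *= w if flip == 1 else (1 - w)
        total = sum(posterior.values())
        posterior = {w: p / total for w, p in posterior.items()}
    # predictive probability of heads on the next flip
    return sum(w * p for w, p in posterior.items())

flips = [1, 1, 1, 1, 1, 1]  # six heads in a row
print(model_a_predict(flips))  # 0.5, no matter what was observed
print(model_b_predict(flips))  # close to 0.9: belief has concentrated on the biased world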
I don’t know precisely how to model “intrinsic unpredictability” yet, but if I’m looking for the part that explains why it seems unintuitive to us to think of ourselves (and others?) as deterministic, it could be that we model minds as “intrinsically probabilistic” just like in the coin case, or it might be a bit different, e.g. a part that predicts that the model of the vitalistic object will get constantly updated as we observe it. (I previously didn’t think about it clearly and had a slightly different guess about how it might be implemented, but it wasn’t a fully coherent picture that made sense.)
In case you do think that “intrinsic unpredictability” explains the sense of lifeforce, I think this is a mysterious answer.
Harry gasped for breath, “but what is going on?”
“Magic,” said Professor McGonagall.
“That’s just a word! Even after you tell me that, I can’t make any new predictions! It’s exactly like saying ‘phlogiston’ or ‘elan vital’ or ‘emergence’ or ‘complexity’!”
(chapter 6, HPMoR) (To clarify: Even though it is magic, I think Harry is correct here that it’s not an explanation.)
Also, I think you’re aware of this, but nothing is inherently meaningful; meaning can only arise through how something is relative to something else. In the cold case (where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects)), the meaning of “cold” comes from the cluster of sensations it refers to and how it affects considerations. If you just had the information “type-ABC (aka ‘cold’) sensors fired at position-XYZ”, the rest of the mind wouldn’t know what to do with that information on its own; it needs some circuitry to relate the information to other events. So I wouldn’t say what you wrote explains cold, but maybe you didn’t think it did.
I might not be being fair to frequentists, and I don’t really know their models. I just don’t know what else to call it, because it seems some people like Eliezer might not have had such intuitions.
Hmm. I don’t think I’m invoking any mysterious answers. I think I’m suggesting a nuts-and-bolts model—a particular prediction about the behavior of a particular kind of algorithm given a particular type of input data. I’m trying to figure out why you disagree.
Like IMO it’s important to recognize that saying “inherent-surprisingness/vitalistic-force my mind paints on objects explains my sense of animals having life-force” is not actually a mechanistic hypothesis—I would not advance-predict a sense of life-force from thinking that minds project their continuous surprise about an object as a property on the object itself. Not sure whether you’re making this mistake though.
Again I think it’s a mechanistic hypothesis. Let me walk through it in more detail; see where you disagree:
Any concept or property in your conscious experience is a piece (latent variable or whatever) in a generative model built by a predictive (self-supervised) learning algorithm on sensory data.
Some of that sensory data is interoceptive, including things like sense of one’s own physiological arousal, temperature, confusion, valence (goodness / badness), physical attraction, etc.
The “mind projection fallacy” applies to these interoceptive sensations (§3.3.2). Why? Because the learning algorithm is finding generative models that predict sensory data, and mind-projection-fallacy generative models are simple and effective at predicting interoceptive sensory data. For example, whenever I look at the shirt, I reliably get white-derived visual sensations, therefore I wind up with a generative model that says that there’s a shirt in the world, and it’s white. Likewise, whenever I think about capitalism, I reliably get an interoceptive sensation of negative valence, therefore I wind up with a generative model that says that there’s a thing “capitalism” in the world, and that thing is “bad”.
Every interoceptive sensation spawns a mind-projection-fallacy conscious concept / property that applies to things in the outside world. And surprise is one such sensation. So a priori we strongly expect every adult human to feel like there’s a surprise-derived intuitive property of things in the world. (But I haven’t yet said which intuitive property it is.)
Meanwhile, in our everyday experience, we all have an intuitive sense of animation / agency. I think the word “vitalistic force” is a good way to point to this recognizable intuition.
And then my substantive claim is that the previous two bullets should be equated: the surprise-derived intuitive property in adult humans is the intuitive sense of animation / agency.
Alternatively, suppose we didn’t have our subjective experience, but were told that there exist predictive learning algorithms blah blah as in Post 1. We should predict that these algorithms will build generative models containing a surprise-derived property of things in the world. And then we could look around the “training environment” (human world), try to figure out what would generate surprise (things that are both unpredictable and un-ignorable), and we’d predict that this intuitive property would get painted first and foremost onto things that are alive, but also onto cartoon characters and so on, and also onto certain self-reflective things (i.e., aspects of the brain algorithm itself). When we do this kind of analysis well, we’ll wind up describing every aspect of our actual everyday intuitions around animation / agency / alive-ness, and predicting all the items in §3.3. But we’d be doing all that purely from first-principles reasoning about algorithms and biology. And then that “prediction” would be “tested” by noticing that humans have exactly those intuitions. As it happens, it’s not really a “prediction” because we already know what intuitions are typical in human adults. But nevertheless I think the reasoning is sound and tight and locally-valid, not just special pleading because we already know the answer. See what I mean?
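If it helps, here’s a toy caricature of the kind of algorithm I have in mind (Python; every name and number is invented, and it’s wildly oversimplified): a learner that explains an interoceptive surprise signal by attributing a persistent scalar property to whatever object it is currently attending to. Objects that reliably co-occur with surprise end up carrying a high value of that property, which is the mind-projection-fallacy move from the bullets above.

import random
from collections import defaultdict

# Toy caricature: explain an interoceptive "surprise" signal by attributing
# a persistent scalar property to the attended object (mind projection).
# All names and numbers are made up for illustration.

def surprise_signal(obj):
    # stand-in for the interoceptive sensation: animate things are
    # hard to predict moment-to-moment, rocks and chairs are not
    return random.uniform(0.6, 1.0) if obj in ("dog", "cockroach") else random.uniform(0.0, 0.1)

attributed = defaultdict(float)   # the learned per-object "paint"
lr = 0.1

for _ in range(2000):
    obj = random.choice(["dog", "cockroach", "rock", "chair"])
    s = surprise_signal(obj)
    # generative model: predicted surprise = property attributed to the object;
    # learning nudges the attributed property to reduce prediction error
    attributed[obj] += lr * (s - attributed[obj])

print({k: round(v, 2) for k, v in attributed.items()})
# e.g. {'dog': 0.8, 'cockroach': 0.8, 'rock': 0.05, 'chair': 0.05}
# The surprise signal ends up encoded as if it were an intrinsic
# property of the objects themselves.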
I think when humans model other minds (which includes animals (and gods)) they start from a pre-built template (potentially from mirroring part of their own cognitive machinery) with properties goals/desires, emotions, memory and beliefs.
I think that when an average person sees a cockroach running across the floor, they think of it as having goals but probably not emotions or memories or beliefs. As a scientific matter, cockroaches do have memories, but I think at least some people feel kinda surprised and impressed when they see a cockroach doing something that demonstrates memory, which suggests that their intuitive model did not already include cockroach memory. But everyone thinks of the cockroach as being alive / animate, and also, nobody would be surprised or impressed to see a cockroach demonstrate “wanting” / goal-seeking by going around a trivial barrier to get into a hiding place.
That goes well with my theory that “vitalistic force” (derived from surprise) and “wanting” (derived from a pattern where I can make medium-term predictions despite short-term surprise) are two widely-used core intuitions in our generative model space, which strongly tend to go together. And then other aspects of modeling minds are optional add-ons. (Just like “has frost on it” is an optional add-on to an object being “cold”.)
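Here’s a toy sketch of how those two intuitions could come apart in principle, even though they usually co-occur (Python; the trajectory generators and numbers are all made up): step-by-step prediction error is high for both a random walk and a noisy goal-seeker, but only the goal-seeker’s endpoint is predictable.

import random

# Toy illustration of "surprising step-by-step" vs
# "surprising step-by-step BUT the endpoint is predictable anyway".

def random_walk(steps=50):
    x, traj = 0.0, []
    for _ in range(steps):
        x += random.uniform(-1, 1)
        traj.append(x)
    return traj

def goal_seeker(goal=10.0, steps=50):
    x, traj = 0.0, []
    for _ in range(steps):
        x += 0.3 * (goal - x) + random.uniform(-1, 1)  # noisy but goal-directed
        traj.append(x)
    return traj

def short_term_error(traj):
    # error of the naive prediction "next position = current position"
    return sum(abs(b - a) for a, b in zip(traj, traj[1:])) / (len(traj) - 1)

def endpoint_error(traj, guessed_endpoint):
    return abs(traj[-1] - guessed_endpoint)

rw, gs = random_walk(), goal_seeker()
print(short_term_error(rw), endpoint_error(rw, 0.0))   # surprising each step, endpoint typically far from any guess
print(short_term_error(gs), endpoint_error(gs, 10.0))  # also surprising each step, yet endpoint lands near 10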
Also, I think you’re aware of this, but nothing is inherently meaningful; meaning can only arise through how something is relative to something else. In the cold case (where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects)), the meaning of “cold” comes from the cluster of sensations it refers to and how it affects considerations. If you just had the information “type-ABC (aka ‘cold’) sensors fired at position-XYZ”, the rest of the mind wouldn’t know what to do with that information on its own; it needs some circuitry to relate the information to other events. So I wouldn’t say what you wrote explains cold, but maybe you didn’t think it did.
My claim is: there’s a predictive learning algorithm that sculpts generative models that can explain incoming sensory data. (See Post 1.) When I look at a clock, the sensory data involves retinal cells firing, while the generative model involves the concept “clock” (among other things).
The concept “cold”, like “clock”, is a concept in our intuitive models. This is “meaningful” in the same way any other intuitive concept is meaningful. It fits into our web-of-knowledge / world-model / “map” / generative model space, it has relations to other concepts, it helps make sense of the world, etc.
If an adult has a concept in their intuitive models, then that concept must be doing some work: it must be directly or indirectly helping to predict some kind of sensory input data. Otherwise it would not be in the generative models in the first place—that’s how the predictive learning algorithm works. For example, the concept “clock” is doing lots of work in different contexts, including helping explain visual input data when I happen to be looking at a clock. Thus we can ask by analogy: what’s the concept “cold” doing? The obvious answer is: the concept “cold” is mainly helping explain sensory input data involving the signals coming from blah blah type of thermoreceptor in the peripheral nervous system.
The point I was making before was that the concept “cold” starts from that important role. But by adulthood it winds up being invoked by analogy in things like “cold comfort”, and getting all these other connotations that are not superficially related to predicting the sensory signals coming from blah blah type of thermoreceptor. …But nevertheless, I think it’s fair to say that the central role of the “cold” concept, even in adults, is to enable generative models to correctly predict (many of) the signals coming from blah blah type of thermoreceptor.
And in a similar way, I’m claiming that the central role of the intuitive “vitalistic force” / “animation” concept is to enable generative models to correctly predict many of the sensory signals coming from the interoceptive sensation of surprise. (But it’s still true that this concept winds up with other connotations and extensions-by-analogy too.)
Does that help? Thanks for patient engagement and feedback.
I agree that memory and beliefs are in some sense optional add-ons. I don’t understand precisely enough yet how we model animals.
On your section on cold:
First, I’m still not sure which of the two interpretations of “cold” you’re using, as I indicated here: “(where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects))”.
But in either case I mostly just mean that a full reductionist explanation of e.g. cold is an extremely high standard, which ought to fulfill the following criteria:
You can replace the word “cold” and other related abstract words with some other token-sequences/made-up-words, and someone who had a sufficiently good understanding would still be able to figure out that the new made-up-word corresponds to the concept we call “cold”.
(Where I don’t think your explanation had anything in it that would stop you from just replacing “cold” with “heat” or “redness” (except redness wouldn’t work if we allow “thermoreceptor”, but I’d also want to rename that to “receptor-type-abc”).)
You can sorta write code for a relevant part of what’s happening in the mind when e.g. the freezing emotion/sensation is triggered.
(Like, you would not need to describe a fully conscious program, but rather the function that triggers how muscles contract, the sensation of wanting to curl up, the skin shivering, a negative hedonic tone, as well as instantiating a subgoal of getting the thermoreceptors to report higher temperature or something. I’d count this description as a weak reductionist hypothesis (which makes progress on unpacking the “cold” concept but where there are more levels of unpacking to do), though it might be very incomplete and partially wrong.)
Like, I’m not sure we disagree much here. I think everything you said is correct, but I feel like emphasizing that there are still more layers of understanding that need to get unpacked, and that saying “it’s a concept that’s useful to predict sensory data” still leaves open questions about what exactly the information is that the concept is able to communicate, and about how the concept relates to other concepts.
Hmm, I still might not be following, but I’ll write something anyway. :)
Take some “concept” in your world-model, operationalized as a particular cluster C of neurons in some part of your cortex that tend to activate together.
How might we figure out what C “means”?
One part of the answer is entirely within the cortex world-model: C has particular relationships to other things in the cortex world-model, which in turn have relationships to still other things, etc. Clusters of neurons related to “bird” have some connection to clusters of neurons related to “flying”. That by itself might already be enough to pin down the “meanings” of different things, just because there’s so much structure there, and we can try to match it up with structures in the world, by analogy with unsupervised machine translation. But if not…
The other part of the answer is about how the cortex world-model relates to the real world. Maybe C directly predicts some particular pattern in low-level sensory inputs. Maybe C directly activates some particular pattern in motor output. Or maybe the connection is less direct—a certain abstract pattern in the space of abstract patterns in the space of abstract patterns in the space of low-level sensory inputs, or whatever. If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.
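To make the first part (pinning down meaning from relational structure alone) a bit more concrete, here’s a toy brute-force sketch in Python (the graphs and names are hand-invented, and real world-models are astronomically bigger):

from itertools import permutations

# Toy version of "pin down meanings by matching relational structure",
# in the spirit of unsupervised machine translation. Hand-made example.

# Relations among anonymous cortical clusters C0..C3 (who links to whom)
cluster_links = {("C0", "C1"), ("C0", "C2"), ("C2", "C3")}

# Relations among world concepts we already understand
world_links = {("bird", "flying"), ("bird", "nest"), ("nest", "tree")}
world_concepts = ["bird", "flying", "nest", "tree"]

def score(mapping):
    # how many cluster links land on actual world links under this mapping
    return sum((mapping[a], mapping[b]) in world_links for a, b in cluster_links)

best = max(
    (dict(zip(["C0", "C1", "C2", "C3"], perm)) for perm in permutations(world_concepts)),
    key=score,
)
print(best)  # {'C0': 'bird', 'C1': 'flying', 'C2': 'nest', 'C3': 'tree'}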
So, how about “cold”? Our body has a couple relevant sensors: peripheral nerves that express TRPM8 (“cold and menthol receptor 1”), hypothalamus neurons that detect blood temperature via TRPV1, etc. (I’m not an expert on the details.) As usual, these sensory signals are processed in two areas in parallel. In the hypothalamus & brainstem (“Steering Subsystem”), they trigger innate reactions like shivering, unpleasant feelings / desire to warm up, and so on. And in the cortex, they’re treated as just so many more channels of unlabeled input data that the world-model needs to predict.
In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.
I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output. And we can explain those latter associations via the structure of the world and body in conjunction with a learning algorithm running throughout your life experience.
You can sorta write code for a relevant part of what’s happening in the mind when e.g. the freezing emotion/sensation is triggered.
I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.), the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.
Of course, we can chat about the trained model level to some extent. Why is “cold” associated with shivering? Because in the training environment of life experience, those two things have tended to go together, such that each provides nonzero Bayesian evidence that the other should be active, or will be soon. Ditto with the connection between cold and ice cream, and everything else. So we can chat about it, but it would take forever to directly write code for all those things. Hence the learning algorithm. Does that help?
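As a toy illustration of that “nonzero Bayesian evidence” point (Python, with invented episode data, nothing more):

# If two concepts co-occur in the training environment more than chance,
# each one's activation raises the probability of the other. Invented data.

episodes = [
    {"cold", "shivering", "snow"},
    {"cold", "shivering"},
    {"cold", "ice cream"},
    {"warm", "beach"},
    {"warm", "ice cream"},
    {"cold", "shivering", "ice"},
]

def p(event, given=None):
    pool = [e for e in episodes if given is None or given in e]
    return sum(event in e for e in pool) / len(pool)

prior = p("shivering")                    # base rate of shivering: 0.5
posterior = p("shivering", given="cold")  # once "cold" is active: 0.75
print(prior, posterior)  # seeing "cold" shifts the probability of "shivering" upward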
Thanks for communicating your model well again!
I think we might mostly agree, but let’s clarify.
I agree with all of:
In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.
I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output.
I also basically agree with:
I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.), the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.
I agree that fully writing code would be quite a daunting task. I think my phrasing of “write code” was not great. But it’s already some reductionist progress if you have something like:
if coldness concept gets more activated: increase activation of shivering anticipation; weakly increase activation of snow concept; ...
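…for instance, a throwaway Python sketch along those lines (the concepts, weights, and update rule are all invented):

# Activation spreading along learned association weights. All numbers invented.

weights = {
    "cold": {"shivering-anticipation": 0.8, "snow": 0.2, "ice cream": 0.1},
}

activation = {"cold": 1.0, "shivering-anticipation": 0.0, "snow": 0.0, "ice cream": 0.0}

def step(activation, weights):
    updated = dict(activation)
    for src, links in weights.items():
        for dst, w in links.items():
            updated[dst] += w * activation[src]
    return updated

print(step(activation, weights))
# {'cold': 1.0, 'shivering-anticipation': 0.8, 'snow': 0.2, 'ice cream': 0.1}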
I don’t think it’s a worthwhile exercise to get very precise.
An important point I wanted to make here is just that the meaning of “cold” comes from the interactions with other concepts, and there’s no such thing as an inherent independent meaning of the word “cold”. (So when I hear ‘If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.’ this seems a bit off to me, though not too bad.)
I guess I’d best try to explain why I felt some unease with your initial description of the cold example:
Suppose somebody said:
There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.
On the one hand, I would defend this passage as basically true.
Basically I think that some people—though a priori not you—would think that something like “I feel cold because the cold-thermoreceptors activate the corresponding cold concept” explains their sense of cold. However, if you just take this hypothesis, which basically is “some sensors activate some concept”, without anything else, then the concept would be completely shapeless and uninterpretable—unrelated to anything known.
I now think you probably didn’t mean it in nearly that bad a way, but I’m not sure.
(But some parts of what you write seem to me like you have slightly weaker sensors about “how does a hypothesis actually constrain my anticipations / concentrate probability mass” or “what would this hypothesis predict if I didn’t already know how I perceive it”, and I do think those sensors are useful.)
(I also think that there is some hypothalamus-or-so business logic for which responses to trigger (e.g. shivers) from significant cold input signals, which would need to be figured out if you want to get a good model of freezing/feeling-uncomfortably-cold, but that’s about freezing in particular and not temperature as a property we model on objects.)
Thanks for being so wonderfully precise to make it easy for me to reply!
The part where you lose me is here:
Meanwhile, in our everyday experience, we all have an intuitive sense of animation / agency.
Where does this sense of agency come from? Likewise:
When we do this kind of analysis well, we’ll wind up describing every aspect of our actual everyday intuitions around animation / agency / alive-ness, and predicting all the items in §3.3.
How do we get from something seeming inherently surprising to something seeming agentic or imbued with life-force?
EDITED TO ADD: Tbc, I think you can explain agency (though not life-force, and you need to be careful to only interpret agency in this limited sense) through being able to predict outcomes without trajectories (as you also seem to have realized, as in “(derived from a pattern where I can make medium-term predictions despite short-term surprise)”). I wouldn’t equate agency with inherent surprisingness though, although they often occur together.
Yeah, I think the §3.3.1 pattern (intrinsic surprisingness) is narrower than the §3.3.4 pattern (intrinsic surprisingness but with an ability to make medium-term predictions).
But they tend to go together so much in practice (life experience) that when we see the former we generally kinda assume the latter. An exception might be, umm, a person spasming, or having a seizure? Or a drunkard wandering about randomly? Hmm, maybe those don’t count because there are still some desires, e.g. the drunkard wants to remain standing.
I agree that agency / life-force has a strong connotation of the §3.3.4 thing, not just the §3.3.1 thing. Or at least, it seems to have that connotation in my own intuitions. ¯\_(ツ)_/¯
I feel like life-force seems like a sensation that’s different from what I’d expect from just having a thing in the world model with inherent surprisingness and ends-without-trajectory-predictions / “optimizerness” attached. (“Life-force” sounds more like “as if the thing had a soul” to me. I do not understand where this comes from, but I don’t see how I’d predict such a sensation in advance given just the inherent-surprisingness + optimizerness hypothesis.)