I think this podcast has an interesting discussion of self-awareness and selfhood: Jan Kulveit—Understanding agency
I feel it needs to be expanded on a bit to be more complete. After listening to the episode, here’s my take building on their discussion. They discuss
Three components of self-perception
Observation self: consistent localization of an input device, and the ability to identify that input device within observations (e.g. the ‘mirror test’ of noticing one’s own body and noticing changes to it).
- If a predictive model were trained on images from a camera that got carried around in the world, I would expect the model to develop some abstract concept of that camera as its ‘self’. That viewpoint represents a persistent and omnipresent factor in the data which it makes sense to model. A hand coming towards the camera to adjust it suggests that the model should anticipate a change in viewing angle. A rough sketch of this setup follows.
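To make that concrete, here is a minimal sketch, assuming a simple next-frame prediction setup of my own invention (the architecture, image size, and data are placeholders, not anything from the podcast). The only point is that minimizing next-frame error pressures the latent to encode where the camera is and how it is about to move.

```python
# Minimal sketch (assumed setup): a next-frame predictor trained on egocentric
# camera footage. The camera's pose is a persistent latent factor, so
# minimizing prediction error pushes the model to represent "where the camera
# is and how it is about to move", i.e. a rudimentary 'observation self'.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        # Encoder: current frame -> latent state (the camera's viewpoint is one
        # factor the latent is implicitly pushed to capture).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, latent_dim),
        )
        # Decoder: latent state -> predicted next frame (64x64 toy resolution).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frame))

model = NextFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Stand-in for a real stream of (frame_t, frame_t+1) pairs from the carried camera.
frames_t = torch.rand(8, 3, 64, 64)
frames_t1 = torch.rand(8, 3, 64, 64)

prediction = model(frames_t)
loss = loss_fn(prediction, frames_t1)
loss.backward()
optimizer.step()
```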
Action self: persistent localization of an effector device, i.e. the ability to take actions in the world, with those actions originating from a particular source.
Oddly, LLMs are currently in a strange position: their ‘observation->prediction->action’ loop is completed in deployment, but they are only in inference mode during that time and thus unable to learn from it. Their pre-training consists simply of ‘observation->prediction’, with no ability to act to influence future observations. I would expect that an LLM which was continually trained on its interactions with the world would develop a sense of ‘action self’.
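Here is a minimal sketch of what ‘closing the loop with learning’ could look like, assuming a toy environment and a toy language model of my own devising (nothing here is a real deployment setup). The key contrast with today’s inference-only deployment is that the transcript the model helps generate flows back into gradient updates.

```python
# Sketch (my own illustration): closing the observation -> prediction -> action
# loop with weight updates. The environment and "tokens" are toy placeholders.
import torch
import torch.nn as nn

VOCAB = 32  # toy token vocabulary

class TinyLM(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # next-token logits at every position

def environment_step(action_tokens: torch.Tensor) -> torch.Tensor:
    # Placeholder world: returns an 'observation' influenced by the action.
    return (action_tokens + 1) % VOCAB

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

observation = torch.randint(0, VOCAB, (1, 8))  # initial context
for step in range(100):
    # Act: sample the model's continuation of the current context.
    with torch.no_grad():
        logits = model(observation)[:, -1, :]
        action = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=4).view(1, -1)

    # Observe: the world responds to the action.
    new_observation = environment_step(action)

    # Learn: train on the transcript the model itself helped create, so its own
    # actions and their consequences become part of its training distribution.
    transcript = torch.cat([observation, action, new_observation], dim=1)
    logits = model(transcript[:, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), transcript[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    observation = transcript[:, -8:]  # rolling context window
```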
Valence self: valenced impact of events upon a particular object. For example, feeling pain or pleasure in the body. Correlation between events happening to an object and the associated feelings of pain or pleasure reported in the brain leads to a perception of that object as self.
see: The Rubber Hand Illusion—Horizon: Is Seeing Believing? - BBC Two
I would expect that giving a model a special input channel for valence, and associating that valence input with things occurring to a simulated body during training, would give the model a sense of ‘valence self’ even if the other aspects were lacking. That’s a weird separation to imagine for a human, but imagine that your body were completely numb, you never felt hungry or thirsty, and in your field of view there was a voodoo doll. Every time someone touched the voodoo doll, you felt that touch (pleasant or unpleasant). With enough experience of this situation, I expect you would develop a sense of ‘valence self’ centered on the voodoo doll. Thus, I think this qualifies as a different sort of self-perception from the ‘observation self’. In this case, your ‘observation self’ would still be associated with your own eyes and ears; only your ‘valence self’ would have moved to the voodoo doll.
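As a sketch of the kind of training setup I have in mind, here is a toy model with a separate valence input channel, where the valence signal is tied to events happening to one simulated ‘body’. The architecture and the placeholder environment are my own assumptions, not anything from the podcast.

```python
# Sketch (assumed architecture): a model with a separate scalar 'valence' input
# channel whose value is determined by events happening to one particular
# simulated body. Learning to predict upcoming valence from observations
# pressures the model to single out that body, i.e. a rudimentary 'valence self'.
import torch
import torch.nn as nn

class ValencePredictor(nn.Module):
    def __init__(self, obs_dim: int = 32, hidden: int = 64):
        super().__init__()
        # Observation and valence enter through separate channels and are fused.
        self.obs_encoder = nn.Linear(obs_dim, hidden)
        self.valence_encoder = nn.Linear(1, hidden)
        self.core = nn.GRUCell(hidden, hidden)
        self.valence_head = nn.Linear(hidden, 1)  # predicted next valence

    def forward(self, obs, valence, state):
        fused = torch.relu(self.obs_encoder(obs) + self.valence_encoder(valence))
        state = self.core(fused, state)
        return self.valence_head(state), state

def simulated_body_event(obs: torch.Tensor) -> torch.Tensor:
    # Placeholder: valence is a fixed function of what happens to one specific
    # "body" slice of the observation (e.g. contact with a simulated limb).
    return obs[:, :4].sum(dim=1, keepdim=True)

model = ValencePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
state = torch.zeros(1, 64)
valence = torch.zeros(1, 1)

for step in range(1000):
    obs = torch.randn(1, 32)                  # next observation from the world
    next_valence = simulated_body_event(obs)  # valence caused by events to the body
    predicted, state = model(obs, valence, state.detach())
    loss = nn.functional.mse_loss(predicted, next_valence)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    valence = next_valence
```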
Note that these senses of self are tied to our bodies by the nature of our physical existence, not by logical necessity. It is the data we are trained on that creates these results. We almost certainly also have biological priors which push us towards learning these things, but I don’t believe those priors are necessary for the effects (just helpful). Consider the ways these perceptions of self extend beyond our own bodies in everyday experience. For example, the valence self of a mother who deeply loves her infant expands to include that infant. Anything bad happening to the infant deeply affects her, just as if it had happened to her. I would call that an expansion of the ‘valence self’. But she can’t control the infant’s limbs with just her mind, nor can she see through the infant’s eyes.
Consider a skilled backhoe operator. They can manipulate the arm of the backhoe with extreme precision, as if it were a part of their body. This I would consider an expansion of the ‘action self’.
Consider the biohacker who implants a magnet into their fingertip, which vibrates in the presence of magnetic fields. This is in some way an expansion of the ‘observation self’ to include an additional sensory modality delivered through an existing channel. The correlations in the data between that fingertip and touch sensations remain, but a new correlation with magnetic field strength has been gained. This new correlation will separate out and become distinct, since it carries distinctly different meanings about the world.
Consider the First Person View (FPV) drone pilot. Engaged in an intense race, they see from the drone’s point of view, and their movements of the joysticks control the drone’s motions. Crashing the drone will upset them and cause them to lose the race. They have, temporarily at least, expanded all their senses of self to include the drone. These senses of self can therefore be learned as modular, able to be turned on or off at will. If we could voluntarily turn off a part of our body (no longer experiencing control over it or sensation from it), and had the experience of turning that body part on and off many times in our lives, we’d probably feel a more ‘optional’ attachment to that body part.
My current best guess for what consciousness is, is that it is an internal perception of self. By internal, I mean within the mind: you can have perception of your thoughts, control over your thoughts, and valence associated with your thoughts. Thus, you associate all three senses of self with your own internal cognitive processes. I think that giving an AI model consciousness would be as simple as adding these three aspects. So a model which has been trained only on web text probably does not have consciousness, but one which has been fine-tuned to perform chain-of-thought may have some rudimentary form of it. Note that prompting a model to perform chain-of-thought would be much less meaningful than actually fine-tuning it, since prompting doesn’t change the weights of the model.
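To illustrate that last distinction, here is a toy sketch (with a stand-in linear model, not an actual LLM) of why prompting and fine-tuning differ in kind: one leaves the weights untouched, the other updates them.

```python
# Sketch (my own illustration): prompting changes only the input tokens, while
# fine-tuning changes the weights. Model and data here are toy placeholders.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for an LLM's parameters
before = model.weight.detach().clone()

# "Prompting": we only change the input; the parameters are untouched.
prompt = torch.randn(1, 16)  # e.g. a "let's think step by step" instruction
with torch.no_grad():
    _ = model(prompt)
assert torch.equal(model.weight, before)  # weights identical

# "Fine-tuning" on chain-of-thought-style targets: the parameters move.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(4, 16), torch.randn(4, 16)
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()
assert not torch.equal(model.weight, before)  # weights have changed
```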