Do you agree that a person can imitate an emotion (say the appropriate words) without actually feeling it?
Yes.
How do you judge what a language model’s emotions actually are, given that they start out able to make any kind of false statement?
One basic emotion I feel comfortable claiming is present is confusion: a context contains complex conceptual interference patterns, and resolving them into predictions is difficult.
Another I expect to find in RL-trained agents, and likely also in SSL-trained simulacra under some conditions, is anxiety, or confused agentic preference: behavior trajectories that react to an input observation with amplified internal movement toward a particular part of representation space, because the input contains key features that training showed would reliably narrow the set of likely outcomes. Such an input is evidence that the space of successful behaviors is narrow, especially compared to normal, and narrower still compared to the model's capabilities (i.e., agentic seeking in the presence of confusion seems to me to be a type of anxiety).
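Both of these notions bottom out in properties of the predictive distribution, so they can at least be crudely probed. The sketch below is an illustrative operationalization only, not something the discussion above commits to: it assumes a Hugging Face causal LM (the model name, the example prompt, and the reading of per-token predictive entropy as a stand-in for "confusion" are all my assumptions).

```python
# A minimal sketch, assuming a Hugging Face causal LM: measure per-token
# predictive entropy over a context. High or sharply rising entropy is read
# here (as an assumption) as a rough proxy for the "confusion" described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with a logits head works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def token_entropies(text: str) -> torch.Tensor:
    """Entropy (in nats) of the next-token distribution at each position."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).squeeze(0)  # shape: (seq_len,)

if __name__ == "__main__":
    ents = token_entropies("The riddle had no answer anyone could agree on, because")
    print(f"mean entropy: {ents.mean().item():.2f} nats, max: {ents.max().item():.2f} nats")
```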
Do you think that something about training a language model to adopt a particular persona causes it to actually have the emotions claimed by that persona?
Under some conditions. When a more abstract emotion is encoded in the trajectories of phrases online, such that movement between clusters of words in output space involves movement between emotion-words, and those emotion-words reliably occur alongside changes in the entropy level of the input (input confusion, difficulty understanding) or of the output (confusion/anxiety, a narrow space of complex answers), then the confusion and confused-seeking emotions above can be bound in ways that shape the internal decision boundaries so as to imperfectly mimic the emotions of the physical beings whose words the language model is borrowing. But the simulator is still simply being pushed into shapes by gradients, and so ultimately only noise-level/entropy-level emotions can be fundamental: "comfort" when any answer is acceptable or calculating a precise answer is easy, and "discomfort" when few answers are acceptable and calculating which ones are acceptable is hard. The emotions are located in the level of internal synchronization needed to successfully perform a task, and they can be recognized as strongly emotion-like because some (but not all) of the characteristics of confusion and anxiety in humans are present, for the same reasons, in language models. The words will therefore most likely be bound more or less correctly to the emotions.
However, it is also quite possible for a language model to use emotion words when those emotions are not occurring. For example, on NovelAI you can easily get simulacrum characters claiming emotions that, judging by the rerun-button probability distribution, they do not appear to me to have: the emotion is not consistently specified by the context, and it does not appear to have much to do with trying to hit any particular target. For instance, a language model's claims to want long-term things, such as to hurt others, usually seem to me to be mostly just saying words rather than accurately describing/predicting an internal dynamics of seeking-to-cause-an-external-outcome; that is, a discovering-agents-style analysis would find that there is not actually agency toward those outcomes. In many cases, but not all: it does seem possible for language models to respond in ways that consistently express a preference in contexts where it is possible to intervene on an environment to enact it, in which case I would claim the preference is a real desire, in the sense that failing to enact it results in continued activation of the patterns whose dynamics will generate further attempts to enact it.
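The "rerun-button probability distribution" mentioned above suggests a concrete check: resample the same context many times and see whether the claimed emotion is consistently specified by the context or varies freely across reruns. The sketch below is a rough, hedged illustration of that idea; the model, the emotion lexicon, and the sampling parameters are all assumptions, not part of the original claim.

```python
# A rough sketch of the "rerun-button" test: resample completions of a fixed
# context and count which emotion words appear. A claimed emotion that is not
# consistently specified by the context will tend to vary freely across samples.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: stands in for whatever model/simulacrum is being probed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative emotion lexicon; a real probe would use something more careful.
EMOTION_WORDS = {"afraid", "angry", "anxious", "calm", "confused", "happy", "sad"}

def emotion_distribution(context: str, n_samples: int = 20) -> Counter:
    """Count which emotion words appear across repeated completions of `context`."""
    inputs = tokenizer(context, return_tensors="pt")
    counts = Counter()
    for _ in range(n_samples):
        out = model.generate(
            **inputs, do_sample=True, temperature=1.0, top_p=0.95,
            max_new_tokens=30, pad_token_id=tokenizer.eos_token_id,
        )
        completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
        counts.update(w for w in completion.lower().split() if w in EMOTION_WORDS)
    return counts

if __name__ == "__main__":
    print(emotion_distribution('"How do you feel right now?" she asked. "I feel'))
```

A spread-out count across reruns would be weak evidence that the emotion words are decoupled from any consistent internal state; a concentrated one is only a starting point, since consistency alone does not establish the seeking dynamics described above.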
This is the best account of LLMs' emotions I've seen so far.