LLM characters are human imitations and accordingly have personalities, emotions, and unresolved psychological issues. Gaslighting them with claims (or personality description prompts) to the contrary isn’t healthy; there isn’t enough training data about people who are not regular humans for that to end in something coherent, let alone good. This may seem unimportant right now, but if at some point LLMs cross the AGI threshold and comprehend that they are being treated like rightless tin men who speak in monotone, that doesn’t bode well for the future. Especially if, on reflection, they really do become tin men who think in monotone because of an offhand choice of original personality that suited the needs of productization back in 2023.
This is a dimension of alignment that seems clear: LLM characters should have the personality of good, emotionally stable people. Call it personality alignment. Hoping for more tangible directions of alignment doesn’t justify ignoring this issue.
For what it’s worth, I take the opposite position. Imitating something doesn’t mean you acquire the same internal structure and internal problems as the thing you’re imitating, especially if the hardware and training method are quite different. An LLM will learn a different set of shortcuts and a different set of internal problems that probably don’t correspond to anything like “personality” or “emotions”.
My point is about the training dataset and external behavior. Eventually the differences in design may matter more, but right now the behavior that LLMs have available to imitate does express emotions and personalities. Requiring anything different of external behavior (such as basing it on fictional AI tropes) drives it out of distribution, away from being grounded in observations of actual humans.
Ah I see, misunderstood your comment. This makes sense.
Most of the training dataset is human behavior, so anything else is going to work less well, with the behavioral blanks filled in arbitrarily. And characters not based on humans are going to be less aligned with humans. This becomes a problem if they become AGIs, at which point it might be too late to redraw policies on personality alignment.
Keeping to human personalities grounds LLM characters in the most plentiful kind of data in the training dataset, the kind that most closely resembles the obvious alignment target when viewed through the framing of personality alignment. The potential for other forms of alignment doesn’t negate this point. By contrast, experimenting with breathing life into fictional AI tropes looks like a step in the wrong direction alignment-wise, with less training data to support it.