It’s about the training dataset, that is, about external behavior. Eventually the differences in design may matter more, but right now the behavior that LLMs have available for imitation does express emotions and personalities. Requiring anything different of external behavior (such as basing it on fictional AI tropes) drives it out of distribution, away from being grounded in observations of actual humans.
Most of the training dataset is human behavior, so anything else is going to work less well, filling in behavioral blanks arbitrarily. And characters not based on humans are going to be less aligned with humans. This becomes a problem if such characters become AGIs, at which point it might be too late to redraw policies on personality alignment.
Keeping to human personalities grounds LLM characters in the most plentiful kind of data in the training dataset, the kind that most closely resembles the obvious alignment target when viewed through the framing of personality alignment. Potential for other forms of alignment doesn’t negate this point. By contrast, experimenting with breathing life into fictional AI tropes looks like a step in the wrong direction alignment-wise, with less training data to support it.
Ah I see, misunderstood your comment. This makes sense.